It is very rare for interesting code to contain no bugs at first writing. Hopefully the existence of these bugs will be detected by the code's tests, but why the bug occurs will still require work to establish: debugging.
All debugging involves a cycle of gathering data on the problem, formulating hypotheses about what might be causing the problem and then gathering further data to confirm or refute the hypotheses. Most people's first instinct is to do this by inserting print statements into the code to output the required data. This is only rarely the optimally time-efficient approach, and it is often better to use debugging tools.
These notes cover some tools for debugging R code, and C/C++ code called from R, in particular Mark Bravington's 'debug' package, the 'valgrind' memory error checker and 'gdb' used alone or via 'emacs' or 'DDD'. See also Writing R extensions (section 4).
The notes assume you are running Linux or similar. I wrote them using Ubuntu 16.04 (Sept 2019), and modified them slightly after some problems with emacs debugging on Ubuntu 20.04.
R provides functions 'debug', 'debugcall', 'trace' and 'traceback' for debugging R functions. See their help files for further information. RStudio also offers debugging facilities. Personally I prefer Mark Bravington's 'debug' package.
To install it, first add its repository with
options( repos= c( "https://markbravington.github.io/Rmvb-repo", getOption( "repos")))
and then
install.packages("debug")
. Load it with
require(debug)
within R.
mtrace(lm)
lm(rnorm(10)~runif(10))
bp(101)
sets a breakpoint at line 101 of the current function, while
bp(101,FALSE)
turns it off again. The locations of breakpoints show up in the source code window.
bp(12,i>6,foo)
sets a breakpoint at line 12 of function foo. Execution will halt there if the condition 'i>6' is met.
?bp
for more.
go()
runs to completion, an error, or the next breakpoint.
mtrace
it or set a breakpoint in it.
qqq()
quits the debugger.
lm
.
There are a few minor issues which you may encounter.
mtrace(foo$bar)
will fail. An easy fix is to make a copy of 'bar', e.g.
foobar <- foo$bar
then
mtrace(foobar)
and simply call 'foobar' from the
D(1)>
prompt using the arguments that would have been supplied to
foo$bar
.
go()
ing to it.
<ctrl> c
causes the source window to stop working properly. The following commands fix this without having to restart R:
detach("package:debug",unload=TRUE)
detach("package:tcltk",unload=TRUE)
library(debug)
When writing C code to be called from R it is very rare to get it completely right first time (at least if the code is doing something interesting). Typically even if the code runs first time it will fail some of the tests you use to check it, and you will then need to figure out why. As already mentioned, all debugging involves a cycle of gathering data on the problem, formulating hypotheses about what the problem's cause might be, and then gathering further data to confirm or refute the hypotheses. The temptation is often to gather all data by insertion of
Rprintf
calls into the code. This often works in simple cases, but can be hugely inefficient for more complicated bugs (to the point of insanity in the case of segmentation faults).
The following sections provide information on how to get started with debugging tools for C code called from R. 'valgrind' is used for pinpointing memory errors, while 'gdb' is used for examining running code to help figure out where it is going wrong. 'gdb' can be used from within 'emacs' or in text based mode. The former is generally more useful (unless you are the sort of person who prefers 'vi' to 'emacs'), but takes a little more setting up. Another nice graphical interface for 'gdb' is 'DDD' - this is slightly less effort to set up than gdb via emacs, but does not offer source editing.
There are other debuggers that can be used to debug C called from R. My favourite from an ease of use perspective was 'nemiver', but unfortunately it seems to be no longer maintained and is effectively broken on my system.
If you write C code, you should use valgrind. C's enormous flexibility allows you to accidentally produce serious memory corruption. valgrind is usually the fastest way to find such problems (by several orders of magnitude). It works by emulating your CPU with a bunch of added instrumentation that enables it to spot memory errors (such as reading from or writing to unallocated memory). If your code is segfaulting then try running it again using valgrind. To do this start R using
R -d "valgrind"
(you can pass options to valgrind within the quotes). Then run your code from R and wait for it to report errors. Here is an example error report (the first error for this programme, which is obviously the one to fix first).
You can see that I loaded a shared object file 'spardisc.so' into R and then issued the command
.Call("sXWXd",m,w,lt,rt)
to call one of its routines. This routine spat out some output before valgrind detected an error produced by function 'tri_to_cs' at line 98 of 'spardisc.c', which had been in turn called from line 345 in function 'sWXWdij' etc. It seems an integer value has been written where it should not have been.
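To make this kind of error concrete, here is a minimal sketch of an out-of-bounds integer write. It is not taken from 'spardisc.c': the function name 'fill_index' and the macro 'DEMO_BUG' are hypothetical, chosen only for illustration. Compiled as shown, the loop bound is correct. Compiled with '-DDEMO_BUG', the loop writes one integer past the end of the allocated block, which is the sort of invalid write that valgrind is designed to report, and which may well run without any visible symptom otherwise.

```c
#include <stdlib.h>

/* Hypothetical routine, not from spardisc.c: allocates an integer array
   of length n, fills it with 0, 1, ..., n-1 and returns it.
   The caller must free() the result. */
int *fill_index(int n) {
  int *a = (int *)calloc((size_t)n, sizeof(int));
  if (a == NULL) return NULL;
#ifdef DEMO_BUG
  /* Off-by-one bound: the final pass writes a[n], one int past the block. */
  for (int i = 0; i <= n; i++) a[i] = i;
#else
  /* Correct bound: only a[0] to a[n-1] are written. */
  for (int i = 0; i < n; i++) a[i] = i;
#endif
  return a;
}
```

The point of the sketch is only that a one character slip in a loop bound is enough to corrupt memory silently, which is why a tool that checks every read and write is so much faster than guessing.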
It is usually good practice to use R's memory allocation and freeing functions (such as 'R_chk_calloc') for memory allocation in code to be called from R (running out of memory is then handled gracefully, for example). However this can sometimes make memory errors harder to find. A good option is to
#define CALLOC
as
R_chk_calloc
or
calloc
and allocate memory only with
CALLOC
, so that you can readily switch the memory allocation routines for debugging (same for
FREE
of course).
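A minimal sketch of this switching idea follows. The macro 'USE_R_ALLOC' and the helpers 'zero_vector' and 'drop_vector' are hypothetical names introduced here for illustration; 'R_chk_calloc' and 'R_chk_free' are R's own routines. The sketch assumes that plain 'calloc' and 'free' are the default, so that it compiles without R's headers, and that R's versions are selected by defining 'USE_R_ALLOC' at compile time. You may prefer the opposite default.

```c
#include <stddef.h>

#ifdef USE_R_ALLOC
/* Assumed to be built against R's headers, e.g. via R CMD SHLIB. */
#include <R.h>
#define CALLOC R_chk_calloc
#define FREE R_chk_free
#else
/* Plain C allocation, convenient when hunting memory errors. */
#include <stdlib.h>
#define CALLOC calloc
#define FREE free
#endif

/* Hypothetical helper: returns a zero-filled double array of length n,
   allocated only through CALLOC. */
double *zero_vector(size_t n) {
  return (double *)CALLOC(n, sizeof(double));
}

/* Hypothetical helper: releases memory obtained from zero_vector. */
void drop_vector(double *x) {
  FREE(x);
}
```

Because every allocation goes through the two macros, switching allocators is a one line change to the compiler flags rather than an edit to each call site.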
It is good practice to check newly written C code with valgrind even if it appears to be running without problem.
The advantage of this approach is the graphical setting of breakpoints combined with the ability to move around your code as you debug it in a convenient way. Once you know how to do this it is very quick to set up, but the first time through it is easy to go wrong, so I have provided very detailed instructions below. The instructions assume that you are producing a shared library and loading it explicitly using 'dyn.load' in R. If you are debugging a package things are basically the same, but you need to install the package from its source tree (i.e. not from a .tar.gz), start emacs from within the 'src' directory and set the symbolic link to R from there too.
R CMD SHLIB foo.c
to produce 'foo.so'. To turn on debugging information and turn off optimization, you probably first want to create 'Makevars' in the '.R' directory of your home directory, containing the line
CFLAGS=-g -O0 -Wall -pedantic
(the last two options are turning on maximum compiler wingeyness). You can just comment out this line with
#
to turn the options off again.
commandArgs()[[1]]
. For me this gave "/usr/local/lib/R/bin/exec/R".
R.home()
. For me this gave "/usr/local/lib/R". Check that 'libRblas.so' is in the 'lib' subdirectory of this path, and locate it if it isn't.
ln -s /usr/local/lib/R/bin/exec/R
. You can use
rm
to remove this later. This step just makes things a bit more convenient.
export LD_LIBRARY_PATH="/usr/local/lib/R/lib:$LD_LIBRARY_PATH"
where the path given is the one to 'libRblas.so' - this is not always needed, but on some setups R will fail to start within emacs without it.
export R_HOME="/usr/local/lib/R"
where the path is the one found above. Alternatively you can wait and issue the command
set env R_HOME /usr/local/lib/R
at the
(gdb)
prompt.
emacs foo.c&
.
<esc>-x gdb
(
<esc>
is the emacs meta key referred to as
M
in the emacs documentation).
gdb -i=mi R
within emacs (it will likely suggest something slightly different related to your source file - just edit its suggestion).
-i=mi
is just setting
gdb
up to communicate in the way emacs expects.
(gdb)
prompt I might type
break sXWXd
to break on entry to the routine 'sXWXd', or
break foo.c:1066
to break at line 1066 of 'foo.c'. The breakpoint may be pending at this point. That's fine.
dyn.load("foo.so")
and the C calling code --- in my case
.Call("sXWXd",m,w,lt,rt)
Here are a few tips for using the debugger.
break foo.c:398 if i>11
sets a breakpoint at line 398 of 'foo.c' and tells the debugger to stop there only once 'i' exceeds 11. (emacs displays line numbers, or you can set and clear a breakpoint manually to find them.)
p
at the '(gdb)' prompt. For example:
p x
prints the value of 'x'.
p *x
prints the value pointed to by 'x'.
p *(x+2)@3
prints the 3 values in array 'x' starting at 'x[2]'.
set foo = 1
resets variable 'foo' to 1.
set *(foo + 3) = 1.3
sets
foo[3]
to 1.3.
set *(foo + 3)@3 = { 5, 4, 1 }
sets
foo[3:5]
to the given values.
kill
at the '(gdb)' prompt will kill R. The 'Run' button can then be used to restart it.
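If you want something small to practise these commands on, the following sketch may help. The routine 'running_sum' is hypothetical and is not part of the code discussed above. It is written for R's '.C' interface rather than '.Call', purely so that it needs no R headers: all arguments are pointers. Assuming it is saved as 'foo.c', built with 'R CMD SHLIB foo.c' and loaded with 'dyn.load("foo.so")', it could be called from R as '.C("running_sum", x = as.double(1:20), n = 20L)'.

```c
/* Hypothetical practice routine: replaces x[0], ..., x[n-1] by its
   running sum. Arguments are pointers, as the .C interface requires. */
void running_sum(double *x, int *n) {
  int i;
  for (i = 1; i < *n; i++) {
    x[i] += x[i - 1]; /* a natural line for a conditional breakpoint */
  }
}
```

With a breakpoint set by 'break running_sum', 'p *n' shows the length and 'p *(x+2)@3' shows 'x[2]' to 'x[4]'. A conditional breakpoint of the form 'break foo.c:398 if i>11', with the line number replaced by that of the marked line in your own copy, stops only in the later passes of the loop.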
The instructions for this are much shorter, but the debugging facilities are entirely text based.
R -d gdb
and then type
R
to start R.
dyn.load("foo.so")
or load your package.
<ctrl> c
when you are ready to set breakpoints.
break function_name
or
break foo.c:123
(to stop at line 123 of 'foo.c').
signal 0
to continue in R.
break 432
to break at line 432 of the current file.
clear 432
clears the preceding breakpoint.
n
to execute next line.
p
for printing as in previous section.
DDD is a graphical front end for gdb (and other debuggers). Use is pretty straightforward.
R -d "ddd"
.
A few use hints: