
.C(..., DUP=FALSE) memory costs depending on input size?

5 messages · MarcelK, William Dunlap, Jeff Ryan

#
Hello,

I'm trying to create my own C code for use within R. While optimizing the
code I've noticed that even while only using pointers to get my data to C
the time needed still depends on data (vector) size.

To test this, I've created an empty C function to which I've sent vectors
of various sizes. The time needed for each call is measured and plotted.
I would expect a flat line (a little above y=0), since the only things sent
are pointers. What I do not expect is a line that climbs linearly as the
vector size increases. Initializing the vectors isn't being measured, only
the '.C' call to an empty C function, see below.

Is there anything I'm missing that can explain this input-size dependent
latency? The only reason I can think of is that these vectors are being
copied along the way.

What follows is both the R and C code which I use only for testing and a
plot of both measurements with DUP=TRUE and DUP=FALSE:

(RED: DUP=FALSE, GREEN: DUP=TRUE)
http://www.nabble.com/file/p20368695/CandR.png 


R code:
----------
# sequence from 512 to 2^23 with 2^17 stepsize
a <- seq(512, 2^23, 2^17)
# storage for wall time
h <- numeric(length(a)); j <- numeric(length(a))
for (i in seq_along(a)) {
        x <- as.double(1:a[i])
        y <- as.double(x)
        # system.time()[3] is elapsed (wall-clock) time
        h[i] <- system.time(.C("commTest", x, y, DUP=FALSE))[3]
        j[i] <- system.time(.C("commTest", x, y, DUP=TRUE))[3]
        x <- 0
        y <- 0
}
# plot:
plot(a, h, type="l", col="red", xlab="Vector Size -->",
     ylab="Time in Seconds -->"); lines(a, j, col="green")


C code: 
-----------
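#include <R.h>
/* extern "C" is only valid when this file is compiled as C++;
 * drop the wrapper when compiling as plain C. */
extern "C" {
	void commTest(double* a, double* b);
}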

/*
* Empty function
* Just testing communication costs between R --> C
*/
void commTest(double* a, double* b) {
  /* Do ab-so-lute-ly-nothing.. */
}

System Details:
---------------------
Linux gpu 2.6.18-6-amd64 #1 SMP Thu May 8 06:49:39 UTC 2008 x86_64 GNU/Linux
R version 2.7.1 (2008-06-23)
#
Sorry for spamming, but the legend for the plot is wrong:

RED: DUP = TRUE
GREEN: DUP = FALSE

It's pretty clear from the plot itself, but it's wrong both in the plot
header and in the plot code (just swap 'h' and 'j').
#
Does using NAOK=TRUE in the .C() help?  That would avoid
an NA-scan of the input vectors.

Bill Dunlap
TIBCO Spotfire Inc
wdunlap tibco.com
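
For context on the suggestion above: with NAOK=FALSE (the default), .C()
checks each numeric argument for NA, NaN and Inf before calling the C
function, which costs one linear pass per vector even when no copy is made.
The following is only a minimal sketch in plain C of what such a scan
amounts to. The name all_finite is hypothetical and this is not R's actual
implementation.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not part of R's API): returns 1 if every element
 * of x is finite, 0 as soon as an NA/NaN or +/-Inf is found.
 * R's NA for doubles is a NaN with a special payload, so isfinite()
 * rejects it too. The loop touches each element once, so the cost
 * grows linearly with n. */
int all_finite(const double *x, size_t n) {
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!isfinite(x[i])) {
            return 0;
        }
    }
    return 1;
}
```

A scan like this over a clean vector has to visit every element before it
can report success, which would be consistent with call time growing with
vector size when the check is enabled.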
#
Marcel,

If you are writing the C code from scratch, take a look at either
.Call or .External, as both make no copies of the input objects, and
require no explicit conversion to the underlying storage type
(numeric/integer/etc) within the function call.

An even greater benefit is that you will also have access to the
actual R objects within C.

Jeff
On Thu, Nov 6, 2008 at 2:05 PM, MarcelK <m_kempenaar at planet.nl> wrote:

  
    
#
Thank you for your answer William.

I've tried out the NAOK parameter and this is the result:

(RED: NAOK=FALSE, GREEN: NAOK=TRUE)
http://www.nabble.com/file/p20376259/RandC_2.png 

As you can see, the green line is almost flat now, or at least a lot better
than with NAOK=FALSE.

Still, something is keeping the line from being completely flat. I suspect
some of this is due to the timing function (is there any other, more
accurate, timing function in R besides system.time()?), but the time still
appears to depend on vector size.

Numeric output:

NAOK=FALSE:
----------------
 [1] 0.000 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.011 0.012
[13] 0.012 0.014 0.014 0.015 0.017 0.018 0.019 0.020 0.021 0.021 0.023 0.024
[25] 0.025 0.026 0.027 0.028 0.029 0.030 0.031 0.032 0.033 0.034 0.035 0.036
[37] 0.038 0.038 0.039 0.041 0.042 0.043 0.044 0.045 0.046 0.046 0.048 0.049
[49] 0.050 0.050 0.052 0.053 0.054 0.055 0.056 0.057 0.058 0.060 0.060 0.061
[61] 0.063 0.063 0.065 0.065

NAOK=TRUE:
---------------
 [1] 0.000 0.000 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001
[13] 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.003 0.002 0.003 0.002 0.002
[25] 0.003 0.003 0.003 0.003 0.003 0.003 0.004 0.003 0.004 0.003 0.004 0.004
[37] 0.004 0.004 0.004 0.004 0.004 0.004 0.005 0.005 0.005 0.005 0.005 0.005
[49] 0.005 0.006 0.006 0.005 0.006 0.006 0.006 0.006 0.006 0.006 0.006 0.007
[61] 0.006 0.007 0.007 0.008

Input sizes:
-------------
 [1]     512  131584  262656  393728  524800  655872  786944  918016 1049088
[10] 1180160 1311232 1442304 1573376 1704448 1835520 1966592 2097664 2228736
[19] 2359808 2490880 2621952 2753024 2884096 3015168 3146240 3277312 3408384
[28] 3539456 3670528 3801600 3932672 4063744 4194816 4325888 4456960 4588032
[37] 4719104 4850176 4981248 5112320 5243392 5374464 5505536 5636608 5767680
[46] 5898752 6029824 6160896 6291968 6423040 6554112 6685184 6816256 6947328
[55] 7078400 7209472 7340544 7471616 7602688 7733760 7864832 7995904 8126976
[64] 8258048

Thanks again for the advice, helped me a lot.

And yes, I will be looking at using .Call() or .External(), but since the
learning curve is a bit steep and I'm currently only making a proof of
concept, .C() will do fine for now. 

(P.S. I posted this using Nabble, so layout in e-mail might be a bit
awkward..)
William Dunlap wrote: