Skip to content

Is R the right choice for simulating first passage times of random walks?

1 message · Paul Menzel

#
Dear Dennis and Steve,


Am Sonntag, den 31.07.2011, 23:32 -0400 schrieb Steve Lianoglou:

[?]
thank you very much for your suggestions.

This is indeed a nice speed.

1. I first had that implemented in FORTRAN (and Python) too, but turned
to R for two reasons. First I wanted to use also other distributions
later on and thought that it would be easier with R and that R would
have that implemented as fast as possible. Secondly I thought that R
would also operate faster having the right vectorization and using
`csum()`. But I guess it is difficult to find a good model to use the
advantages of R.

Especially looking at `top` when running this example CPU is used 100 %
but memory only 40 MB from 2 GB. So if one could use another data
structure maybe the calculations could be done on more walks at once.

2. It is indeed possible that the walk never returns to zero, so I
should make sure, that I abort the while loop after a certain length.

3. Looking at the data types I am wondering if some integer overflow(?)
could happen. I could make the length variable unsigned I suppose [1].
But still `csum` could go from `-len` to 0 and for the normal random
walk unsigned should not be a problem too besides that the logic/checks
have to be adapted.

For integrated random walks, `ccsum += csum`, `ccsum` would go from
-(ccsum**2)/2 up to 0. So later on I should use probably the 64 bit data
type (unsigned) `long` for `ccsum`, `csum` and `length` to avoid those
problems. Memory does not seem to be a problem. Also I need to add an
additional check for the height and length in the while loop like the
following.

	(csum < 0) && (csum > -ULONG_MAX) && (len =< ULONG_MAX)

So I came up with the following and to use unsigned I only consider that
the random walk stays positive instead of negative.

-------- 8< -------- code -------- >8 --------
library(inline)
inc <- "
#include <climits>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
"

f9 <-cxxfunction(includes=inc, plugin="Rcpp", body="
  unsigned long len = 1;

  if ((rand() % 2 ) == 0) {
    return wrap(len);
  }

  unsigned long x = 1;

  for (unsigned long csum = x; csum > 0; csum = ((rand() % 2 ) == 0) ? csum + 1: csum - 1) {
    len++;
    if ((csum == ULONG_MAX) && (len == ULONG_MAX)) {
      return wrap(len);
    }
  }

  return wrap(len);
")
-------- 8< -------- code -------- >8 --------

I do not know if the compiler would have optimized it that way anyway
and if there is any difference (besides the overflow checks).
User      System verstrichen 
      0.076       0.004       0.084
[1]       1 1449034
[1] 1000


Thanks,

Paul


[1] https://secure.wikimedia.org/wikipedia/en/wiki/Integer_(computer_science)#Common_integral_data_types
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110802/378b9b6b/attachment.bin>