Is R the right choice for simulating first passage times of random walks?

Mon, Aug 1, 2011 5:49 AM

Am Sonntag, den 31.07.2011, 23:32 -0500 schrieb R. Michael Weylandt :

I will try Dennis? solution right away but looked at your suggestions
first. Thank you very much.

That is a nice suggestion. For a symmetric random walk this is indeed
possible and equivalent to looking when the walk first hits zero.

That is the same idea as from Bill [1]. The problem is, when the walk
never returns to zero in a sample, `which.max(?everything FALSE)`
returns 1 [2]. That is no problem though, when we do not have to worry
about a walk starting with a positive value and adding 1 (+1) can be
omitted when we count the epochs of first hitting 0 instead of the time
of how long the walk stayed negative, which is always one less.

Additionally my check `(x>=0) & (xLag <0)` is redundant when we know we
start with a negative value. `(x>=0)` should be good enough in this
case.

The line above seems to look the columns instead of rows. I think the
following is correct since after the `apply()` above the random walks
are in the columns.

	R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)]

Welcome to my world. I would have never thought that simulating random
walks with a length of say a million would create that much data and
push common desktop systems with let us say 4 GB of RAM to their limits.

I left that one out, because as written above the check can be
shortened.

That is true. But to learn some optimization techniques on a simple
example is much appreciated and will hopefully help me later on for the
iterated random walk cases.

Nice!

I have to write those translations down. On first sight though we need
again to handle the case where it stays negative the whole time. `D`
then has length 0 and we have to count that for a walk longer than
`BIGNUMBER`.

Just testing for 0 for the iterated cases will not be enough for
iterated random walks since an iterated random walk can go from negative
to non-negative without being zero at this time/epoch.

I implemented all your suggestions and got the following.

-------- 8< -------- code -------- >8 --------
f4 <- function(n = 100000, # number of simulations
               length = 100000) # length of iterated sum
{
	R = matrix(sample(c(-1L,1L),length*n,replace=T),nrow=n)
	R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make your life INFINITELY better
	fTemp <- function(x) {
		if (x[1] >= 0 ) {
			return(1)
		}

		for (i in 1:length-1) {
			if (x[i] < 0 && x[i + 1] >= 0) {
				return(as.integer(i/2 + 2))
			}
		}
	}
	countNegative = apply(R,2,fTemp)
	tabulate(as.vector(countNegative), length)
}

f5 <- function(n = 100000, # number of simulations
               length = 100000) # length of iterated sum
{
	R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)

	R = apply(R,1,cumsum)
	R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)] # If the first element in the row is positive, flip the entire row

	R <- R>=0
	countNegative = apply(R,2,which.max)
	tabulate(as.vector(countNegative), length)
}

f6 <- function(n = 100000, # number of simulations
               length = 100000) # length of iterated sum
{
	x = cumsum(sample(c(-1L,1L), length*n,replace=T))
	D = diff(which(c(0, x) == 0))
	tabulate(D, max(D))
}
-------- 8< -------- code -------- >8 --------

The timings differ quite much which is expected though.

User      System verstrichen
      2.700       0.008       2.729

User      System verstrichen 
      4.457       0.004       4.475

User      System verstrichen
      8.033       0.380       8.739

User      System verstrichen
      9.640       0.812      10.588

User      System verstrichen
      4.208       0.328       4.606

So `f6` seems to be the most efficient setting right now and even is
slightly faster than `f1` with the for loops. But we have to keep in
mind that both operate on different data sets although `set.seed(1)` is
used and `f6` treats the problem totally different.

One other thought is that when reusing the walks starting with a positiv
term and flipping those we can probably also take the backward/reverse
walk (dual problem). I will try that too.


Thank you very much,

Paul


[1] https://stat.ethz.ch/pipermail/r-help/2011-July/285015.html
[2] https://stat.ethz.ch/pipermail/r-help/2011-July/285396.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110801/d4411d79/attachment.bin>

Is R the right choice for simulating first passage times of random walks?

Thread (3 messages)