Skip to content

Is R the right choice for simulating first passage times of random walks?

3 messages · Paul Menzel, michael.weylandt at gmail.com (R. Michael Weylandt

#
Am Sonntag, den 31.07.2011, 23:32 -0500 schrieb R. Michael Weylandt :
I will try Dennis? solution right away but looked at your suggestions
first. Thank you very much.
That is a nice suggestion. For a symmetric random walk this is indeed
possible and equivalent to looking when the walk first hits zero.
That is the same idea as from Bill [1]. The problem is, when the walk
never returns to zero in a sample, `which.max(?everything FALSE)`
returns 1 [2]. That is no problem though, when we do not have to worry
about a walk starting with a positive value and adding 1 (+1) can be
omitted when we count the epochs of first hitting 0 instead of the time
of how long the walk stayed negative, which is always one less.

Additionally my check `(x>=0) & (xLag <0)` is redundant when we know we
start with a negative value. `(x>=0)` should be good enough in this
case.
The line above seems to look the columns instead of rows. I think the
following is correct since after the `apply()` above the random walks
are in the columns.

	R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)]
Welcome to my world. I would have never thought that simulating random
walks with a length of say a million would create that much data and
push common desktop systems with let us say 4 GB of RAM to their limits.
I left that one out, because as written above the check can be
shortened.
That is true. But to learn some optimization techniques on a simple
example is much appreciated and will hopefully help me later on for the
iterated random walk cases.
Nice!
I have to write those translations down. On first sight though we need
again to handle the case where it stays negative the whole time. `D`
then has length 0 and we have to count that for a walk longer than
`BIGNUMBER`.
Just testing for 0 for the iterated cases will not be enough for
iterated random walks since an iterated random walk can go from negative
to non-negative without being zero at this time/epoch.

I implemented all your suggestions and got the following.

-------- 8< -------- code -------- >8 --------
f4 <- function(n = 100000, # number of simulations
               length = 100000) # length of iterated sum
{
	R = matrix(sample(c(-1L,1L),length*n,replace=T),nrow=n)
	R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make your life INFINITELY better
	fTemp <- function(x) {
		if (x[1] >= 0 ) {
			return(1)
		}

		for (i in 1:length-1) {
			if (x[i] < 0 && x[i + 1] >= 0) {
				return(as.integer(i/2 + 2))
			}
		}
	}
	countNegative = apply(R,2,fTemp)
	tabulate(as.vector(countNegative), length)
}

f5 <- function(n = 100000, # number of simulations
               length = 100000) # length of iterated sum
{
	R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)

	R = apply(R,1,cumsum)
	R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)] # If the first element in the row is positive, flip the entire row

	R <- R>=0
	countNegative = apply(R,2,which.max)
	tabulate(as.vector(countNegative), length)
}

f6 <- function(n = 100000, # number of simulations
               length = 100000) # length of iterated sum
{
	x = cumsum(sample(c(-1L,1L), length*n,replace=T))
	D = diff(which(c(0, x) == 0))
	tabulate(D, max(D))
}
-------- 8< -------- code -------- >8 --------

The timings differ quite much which is expected though.
User      System verstrichen
      2.700       0.008       2.729
User      System verstrichen 
      4.457       0.004       4.475
User      System verstrichen
      8.033       0.380       8.739
User      System verstrichen
      9.640       0.812      10.588
User      System verstrichen
      4.208       0.328       4.606

So `f6` seems to be the most efficient setting right now and even is
slightly faster than `f1` with the for loops. But we have to keep in
mind that both operate on different data sets although `set.seed(1)` is
used and `f6` treats the problem totally different.

One other thought is that when reusing the walks starting with a positiv
term and flipping those we can probably also take the backward/reverse
walk (dual problem). I will try that too.


Thank you very much,

Paul


[1] https://stat.ethz.ch/pipermail/r-help/2011-July/285015.html
[2] https://stat.ethz.ch/pipermail/r-help/2011-July/285396.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110801/d4411d79/attachment.bin>
I've only got a 20 minute layover, but three quick remarks:

1) Do a sanity check on your data size: if you want a million walks of a thousand steps, that already gets you to a billion integers to store--even at a very low bound of one byte each, thats already 1GB for the data and you still have to process it all and run the OS. If you bump this to walks of length 10k, you are in big trouble. 

Considered like that, it shouldn't surprise you that you are getting near memory limits. 

If you really do need such a large simulation and are willing to make the time/space tradeoff, it may be worth doing simulations in smaller batches (say 50-100) and aggregating the needed stats for analysis. Also, consider direct use of the rm() function for memory management. 

2) If you know that which.max()==1 can't happen for your data, might this trick be easier than forcing it through some tricky logic inside the which.max()

X=which.max(...)
if(X[1]==1) X=Inf # or whatever value

3) I dont have any texts at hand to confirm this but isn't the expected value of the first hit time of a RW infinite? I think a  handwaving proof can be squeezed out of the optional stopping theorem with T=min(T_a,T_b) for a<0<b and let a -> -Inf. 

If I remember right, this suggests you are trying to calculate a CI for a distribution with no finite moments, a difficult task to say the least. 

Hope these help and I'll write a more detailed reply to your notes below later,

Michael Weylandt

PS - what's an iterated RW? This is all outside my field (hence my spitball on #2 above)

PS2 - sorry about the row/column mix-up: I usually think of sample paths as rows...
On Aug 1, 2011, at 8:49 AM, Paul Menzel <paulepanter at users.sourceforge.net> wrote:

            
#
Am Montag, den 01.08.2011, 12:43 -0400 schrieb R. Michael Weylandt :
I already did that, saved the result and ran it again. I also found [1]
and will look to do these things in parallel, since the simulations do
not depend on each other. I hope I can avoid the matrix then and use
just `replicate()`.
I was looking for such a feature the last days and read to set the
variables to `NULL` to delete them somewhere. Now I found the correct
command. Thank you!
Noted for when I need that again.
That is indeed correct. The generating function of the first hitting
time of zero T? is (g_T?)(s) ? 1/s (1 - \sqrt(1 - s). Therefore

(g_T?)?(s) ? 1/s? (1 - \sqrt(1 - s) + 1/s (2(1 - s))^(-?) ? ? for s ? 1.
Apparently there are several ways to prove that.
I do not know. It scares me. ;-) For the symmetric simple random walk
S_n ? ?_{i=0}^n X_i I want to verify (1).

(1)	n^(-?) ~ p_n ? P(max_{1 ? k ? n} S_n < 0)

a(x) ~ b(x) means that the quotient converges to 1 for x ? ?.

[?]
I am sorry, I meant *integrated* random walk [3][4]. Basically that is
the integral (?area? ? can be negative).

	A_n ? ?_{i=0}^n S_i = ?_{i=0}^n (n - i + 1) X_i

Being 0, S? and X? can always be omitted. So I basically just need to do
one `cumsum()` more over the walks.
No problem at all. I already was confused that it looked differently
(transposed) after the first `apply()`. But it made sense.


Thanks,

Paul


[1] http://www.bioconductor.org/help/course-materials/2010/BioC2010/EfficientRProgramming.pdf
[2] http://www.steinsaltz.me.uk/probA/ProbALN13.pdf
[3] http://www-stat.stanford.edu/~amir/preprints/irw.ps
[4] http://arxiv.org/abs/0911.5456
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110802/9ef937bf/attachment.bin>