Am Sonntag, den 31.07.2011, 23:32 -0500 schrieb R. Michael Weylandt :
Glad to help -- I haven't taken a look at Dennis' solution (which may be far
better than mine), but if you do want to keep going down the path outlined
below you might consider the following:
I will try Dennis? solution right away but looked at your suggestions
first. Thank you very much.
Instead of throwing away a simulation if something starts negative, why not
just multiply the entire sample by -1: that lets you still use the sample
and saves you some computations: of course you'll have to remember to adjust
your final results accordingly.
That is a nice suggestion. For a symmetric random walk this is indeed
possible and equivalent to looking when the walk first hits zero.
This might avoid the loop:
x = ## Whatever x is.
xLag = c(0,x[-length(x)]) # 'lag' x by 1 step.
which.max((x>=0) & (xLag <0)) + 1 # Depending on how you've decided to count
things, this +1 may be extraneous.
The inner expression sets a 0 except where there is a switch from negative
to positive and a one there: the which.max function returns the location of
the first maximum, which is the first 1, in the vector. If you are
guaranteed the run starts negative, then the location of the first positive
should give you the length of the negative run.
That is the same idea as from Bill [1]. The problem is, when the walk
never returns to zero in a sample, `which.max(?everything FALSE)`
returns 1 [2]. That is no problem though, when we do not have to worry
about a walk starting with a positive value and adding 1 (+1) can be
omitted when we count the epochs of first hitting 0 instead of the time
of how long the walk stayed negative, which is always one less.
Additionally my check `(x>=0) & (xLag <0)` is redundant when we know we
start with a negative value. `(x>=0)` should be good enough in this
case.
This all gives you,
f4 <- function(n = 100000, # number of simulations
length = 100000) # length of iterated sum
{
R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first element in the row is positive, flip the entire row
The line above seems to look the columns instead of rows. I think the
following is correct since after the `apply()` above the random walks
are in the columns.
R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)]
xLag = c(0,x[-length(x)])
return(which.max((x>=0) & (xLag <0))+1)
countNegative = apply(R,2,fTemp)
tabulate(as.vector(countNegative), length)
}
That just crashed my computer though, so I wouldn't recommend it for large
n,length.
Welcome to my world. I would have never thought that simulating random
walks with a length of say a million would create that much data and
push common desktop systems with let us say 4 GB of RAM to their limits.
Instead, you can help a little by combining the lagging and the &
all in one.
f4 <- function(n = 100000, llength = 100000)
{
R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
R = apply(R,1,cumsum)
R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first element in the row is positive, flip the entire row
R = (cbind(rep(0,NROW(R)),R)<0)&(cbind(R,rep(0,NROW(R)))>=0)
countNegative = apply(R,1,which.max) + 1
return (tabulate(as.vector(countNegative), length) )
I left that one out, because as written above the check can be
shortened.
Of course, this is all starting to approach a very specific question that
could actually be approached much more efficiently if it's your end goal
(though I think I remember from your first email a different end goal):
That is true. But to learn some optimization techniques on a simple
example is much appreciated and will hopefully help me later on for the
iterated random walk cases.
We can use the symmetry and "restart"ability of RW to do the following:
x = cumsum(sample(c(-1L,1L),BIGNUMBER,replace=T)
D = diff(which(x == 0))
This will give you a vector of how long x stays positive or negative at a
time. Thinking through some simple translations lets you see that this set
has the same distribution as how long a RW that starts negative stays
negative.
I have to write those translations down. On first sight though we need
again to handle the case where it stays negative the whole time. `D`
then has length 0 and we have to count that for a walk longer than
`BIGNUMBER`.
Again, this is only good for answering a very specific question
about random walks and may not be useful if you have other more complicated
questions in sight.
Just testing for 0 for the iterated cases will not be enough for
iterated random walks since an iterated random walk can go from negative
to non-negative without being zero at this time/epoch.
I implemented all your suggestions and got the following.
-------- 8< -------- code -------- >8 --------
f4 <- function(n = 100000, # number of simulations
length = 100000) # length of iterated sum
{
R = matrix(sample(c(-1L,1L),length*n,replace=T),nrow=n)
R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make your life INFINITELY better
fTemp <- function(x) {
if (x[1] >= 0 ) {
return(1)
}
for (i in 1:length-1) {
if (x[i] < 0 && x[i + 1] >= 0) {
return(as.integer(i/2 + 2))
}
}
}
countNegative = apply(R,2,fTemp)
tabulate(as.vector(countNegative), length)
}
f5 <- function(n = 100000, # number of simulations
length = 100000) # length of iterated sum
{
R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
R = apply(R,1,cumsum)
R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)] # If the first element in the row is positive, flip the entire row
R <- R>=0
countNegative = apply(R,2,which.max)
tabulate(as.vector(countNegative), length)
}
f6 <- function(n = 100000, # number of simulations
length = 100000) # length of iterated sum
{
x = cumsum(sample(c(-1L,1L), length*n,replace=T))
D = diff(which(c(0, x) == 0))
tabulate(D, max(D))
}
-------- 8< -------- code -------- >8 --------
The timings differ quite much which is expected though.
# f1 is using only for loops but only does half the calculations
# and does not yet flip random walks starting with a positive value.
set.seed(1) ; system.time( z1 <- f1(300, 1e5) )
User System verstrichen
2.700 0.008 2.729
# f1 adapted with flips
set.seed(1) ; system.time( z1f <- f1withflip(300, 1e5) )
User System verstrichen
4.457 0.004 4.475
set.seed(1) ; system.time( z4 <- f4(300, 1e5) )
User System verstrichen
8.033 0.380 8.739
set.seed(1) ; system.time( z5 <- f5(300, 1e5) )
User System verstrichen
9.640 0.812 10.588
set.seed(1) ; system.time( z6 <- f6(300, 1e5) )
User System verstrichen
4.208 0.328 4.606
So `f6` seems to be the most efficient setting right now and even is
slightly faster than `f1` with the for loops. But we have to keep in
mind that both operate on different data sets although `set.seed(1)` is
used and `f6` treats the problem totally different.
One other thought is that when reusing the walks starting with a positiv
term and flipping those we can probably also take the backward/reverse
walk (dual problem). I will try that too.
Thank you very much,
Paul
[1] https://stat.ethz.ch/pipermail/r-help/2011-July/285015.html
[2] https://stat.ethz.ch/pipermail/r-help/2011-July/285396.html