Speed up or alternative to 'For' loop

Hello,

One way to speed it up is to use a matrix instead of a data.frame. Since
data.frames can hold data of all classes, access to their elements
is slow. Your data is all numeric, so it can be held in a matrix. The
second way below gave me a speed-up by a factor of 50.

system.time({
for (i in 2:nrow(df))
  {if(df$TreeID[i]==df$TreeID[i-1])
   {df$HeightGrowth[i] <- df$Height[i]-df$Height[i-1]
   }
  }
})

system.time({
df2 <- data.matrix(df)
for(i in seq_len(nrow(df2))[-1]){
	if(df2[i, "TreeID"] == df2[i - 1, "TreeID"])
		df2[i, "HeightGrowth"] <- df2[i, "Height"] - df2[i - 1, "Height"]
}
})

all.equal(df, as.data.frame(df2))  # TRUE
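
To try the comparison on your own machine, the two loops can be wrapped in functions and checked against each other. This is only a sketch: the helper names (make_trees, growth_df_loop, growth_matrix_loop) are made up for illustration, the data set is smaller than the one in the question, and the timings will differ from the factor of 50 reported above.

```r
# Hypothetical helpers, not part of the original post.
make_trees <- function(n_trees = 50, n_ages = 20) {
  d <- data.frame(TreeID = rep(seq_len(n_trees), each = n_ages),
                  Age = rep(seq_len(n_ages), n_trees))
  d$Height <- exp(-0.1 + 0.2 * d$Age)
  d$HeightGrowth <- NA_real_
  d
}

# Row-by-row loop on the data.frame, as in the question.
growth_df_loop <- function(d) {
  for (i in seq_len(nrow(d))[-1]) {
    if (d$TreeID[i] == d$TreeID[i - 1])
      d$HeightGrowth[i] <- d$Height[i] - d$Height[i - 1]
  }
  d
}

# Same loop, but on a numeric matrix built with data.matrix().
growth_matrix_loop <- function(d) {
  m <- data.matrix(d)
  for (i in seq_len(nrow(m))[-1]) {
    if (m[i, "TreeID"] == m[i - 1, "TreeID"])
      m[i, "HeightGrowth"] <- m[i, "Height"] - m[i - 1, "Height"]
  }
  as.data.frame(m)
}

trees <- make_trees()
res_df <- growth_df_loop(trees)
res_mat <- growth_matrix_loop(trees)

# Both versions should give the same growth column.
same_result <- isTRUE(all.equal(res_df$HeightGrowth, res_mat$HeightGrowth))

# Timings depend on the machine, so no expected values are shown.
system.time(growth_df_loop(trees))
system.time(growth_matrix_loop(trees))
```

Note that data.matrix() only makes sense here because every column is numeric; with character or factor columns the conversion would change the data.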

Hope this helps,

Rui Barradas

On 10-06-2013 18:28, Trevor Walker wrote:
I have a For loop that is quite slow and am wondering if there is a faster
option:

df <- data.frame(TreeID=rep(1:500,each=20), Age=rep(seq(1,20,1),500))
df$Height <- exp(-0.1 + 0.2*df$Age)
df$HeightGrowth <- NA   # initialize with NA
for (i in 2:nrow(df))
  {if(df$TreeID[i]==df$TreeID[i-1])
   {df$HeightGrowth[i] <- df$Height[i]-df$Height[i-1]
   }
  }

Trevor Walker
Email: trevordaviswalker at gmail.com

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

How about

for (ir in unique(df$TreeID)) {
  in.ir <- df$TreeID == ir
  df$HeightGrowth[in.ir] <- cumsum(df$Height[in.ir])
}

Seemed fast enough to me.

In R, it is generally good to look for ways to operate on entire vectors
or arrays, rather than element by element within them. The cumsum()
function does that in this example.

-Don
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

Avoid tests with if(){} else {}. Use vectorized code, possibly with 'ifelse', but in this case you need a function that does calculations within groups.

The ave() function with diff() will do it compactly and efficiently:
df <- data.frame(TreeID=rep(1:5,each=4), Age=rep(seq(1,4,1),5))
df$Height <- exp(-0.1 + 0.2*df$Age)
df$HeightGrowth <- NA   # initialize with NA
df$HeightGrowth <- ave(df$Height, df$TreeID, FUN= function(vec) c(NA, diff(vec)))
df
   TreeID Age   Height HeightGrowth
1       1   1 1.105171           NA
2       1   2 1.349859    0.2446879
3       1   3 1.648721    0.2988625
4       1   4 2.013753    0.3650314
5       2   1 1.105171           NA
6       2   2 1.349859    0.2446879
7       2   3 1.648721    0.2988625
8       2   4 2.013753    0.3650314
9       3   1 1.105171           NA
10      3   2 1.349859    0.2446879
11      3   3 1.648721    0.2988625
12      3   4 2.013753    0.3650314
13      4   1 1.105171           NA
14      4   2 1.349859    0.2446879
15      4   3 1.648721    0.2988625
16      4   4 2.013753    0.3650314
17      5   1 1.105171           NA
18      5   2 1.349859    0.2446879
19      5   3 1.648721    0.2988625
20      5   4 2.013753    0.3650314

(On my machine it was over six times as fast as the if-based code from Arun.)

David Winsemius
Alameda, CA, USA
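
A small illustration of what "calculations within groups" means for ave() may help. The vectors below are invented for the example. The point is that ave() returns a result of the same length and in the original row order, even when the rows of a group are not adjacent, which the row-by-row loop from the question would not handle.

```r
# Made-up data: two trees whose measurements are interleaved.
height <- c(1, 10, 3, 14, 6, 20)
tree   <- c("a", "b", "a", "b", "a", "b")

# For each tree, NA for its first measurement, then successive differences.
growth <- ave(height, tree, FUN = function(v) c(NA, diff(v)))

# Tree "a" has heights 1, 3, 6   -> NA, 2, 3
# Tree "b" has heights 10, 14, 20 -> NA, 4, 6
growth
# [1] NA NA  2  4  3  6
```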
Sorry, it looks like I was hasty.
Absent another dumb mistake, the following should do it.

The request was for differences, i.e., the amount of growth from one
period to the next, separately for each tree.

for (ir in unique(df$TreeID)) {
  in.ir <- df$TreeID == ir
  df$HeightGrowth[in.ir] <- c(NA, diff(df$Height[in.ir]))
}

And this gives the same result as Rui Barradas' previous response.
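
For anyone skimming the thread, the difference between the two functions is easy to see on a short made-up vector: cumsum() gives running totals, while diff() gives the change from one element to the next, which is what the original question asked for.

```r
# Invented heights for a single tree.
h <- c(2, 5, 9, 14)

running_total <- cumsum(h)     # 2 7 16 30
growth <- c(NA, diff(h))       # NA 3 4 5

running_total
growth
```

The leading NA is needed because diff() returns one element fewer than its input.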

-Don

Well, speaking of hasty...

This will also do it, provided that each tree's initial height is less
than the previous tree's final height. In principle, not a safe
assumption, but might be ok depending on where the data came from.

df$delta <- c(NA,diff(df$Height))
df$delta[df$delta < 0] <- NA
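
To make the caveat concrete, here is a sketch with invented data in which the second tree starts taller than the first tree ended, so the shortcut silently reports a growth value across the tree boundary. The alternative shown after it marks boundaries from TreeID instead of from the sign of the difference; it assumes the rows are sorted so that each tree's rows are contiguous and in age order. The names d, delta, new_tree and safe are only for this example.

```r
# Made-up data: tree 2 starts at 5, above tree 1's final height of 3.
d <- data.frame(TreeID = c(1, 1, 1, 2, 2, 2),
                Height = c(1, 2, 3, 5, 6, 8))

# Shortcut: relies on a negative jump at every tree boundary.
delta <- c(NA, diff(d$Height))
delta[delta < 0] <- NA
delta
# [1] NA  1  1  2  1  2    (the 2 in row 4 mixes tree 1 and tree 2)

# Alternative: flag the first row of each tree from TreeID itself.
new_tree <- c(TRUE, d$TreeID[-1] != d$TreeID[-nrow(d)])
safe <- c(NA, diff(d$Height))
safe[new_tree] <- NA
safe
# [1] NA  1  1 NA  1  2
```

The shortcut would also turn a genuine negative change within a tree into NA, so it is only suitable when heights never decrease.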

-Don

Hi,
Some speed comparisons:

df <- data.frame(TreeID=rep(1:6000,each=20), Age=rep(seq(1,20,1),6000))
df$Height <- exp(-0.1 + 0.2*df$Age)
df1<- df
df3<-df
library(data.table)
dt1<- data.table(df)
df$HeightGrowth <- NA 

system.time({  #Rui's 2nd function
df2 <- data.matrix(df)
for(i in seq_len(nrow(df2))[-1]){
    if(df2[i, "TreeID"] == df2[i - 1, "TreeID"])
        df2[i, "HeightGrowth"] <- df2[i, "Height"] - df2[i - 1, "Height"]
}
})
#   user  system elapsed
#  1.108   0.000   1.109

system.time({for (ir in unique(df$TreeID)) {   #Don's first function
  in.ir <- df$TreeID == ir
  df$HeightGrowth[in.ir] <- c(NA, diff(df$Height[in.ir]))
}})
#   user  system elapsed
#100.004   0.704 100.903

system.time({df3$delta <- c(NA,diff(df3$Height)) ##Don's 2nd function
df3$delta[df3$delta < 0] <- NA}) #####winner
#   user  system elapsed
#  0.016   0.000   0.014

system.time(df1$HeightGrowth <- ave(df1$Height, df1$TreeID, FUN= function(vec) c(NA, diff(vec)))) #David's
#   user  system elapsed
#  0.136   0.000   0.137

system.time(dt1[,HeightGrowth:=c(NA,diff(Height)),by=TreeID])
#   user  system elapsed
#  0.076   0.000   0.079

identical(df1,as.data.frame(dt1))
#[1] TRUE
identical(df1,df)
#[1] TRUE

head(df1,2)
#  TreeID Age   Height HeightGrowth
#1      1   1 1.105171           NA
#2      1   2 1.349859    0.2446879
head(df2,2)
#     TreeID Age   Height HeightGrowth
#[1,]      1   1 1.105171           NA
#[2,]      1   2 1.349859    0.2446879

A.K.
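
The identical() checks above cover df, df1 and dt1, but the fastest version stores its result in a separate column (delta), so it is not part of those checks. A base-R sketch of how one might confirm that it agrees with the others on this simulated data is below. It uses a smaller data set than the timings above, and the names by_ave, by_delta, by_loop and agree are made up for the example.

```r
# Same simulated structure as in the thread, with fewer trees.
df <- data.frame(TreeID = rep(1:100, each = 20), Age = rep(1:20, 100))
df$Height <- exp(-0.1 + 0.2 * df$Age)

# Grouped differences with ave().
by_ave <- ave(df$Height, df$TreeID, FUN = function(v) c(NA, diff(v)))

# Whole-vector diff, blanking the negative jump at each tree boundary.
by_delta <- c(NA, diff(df$Height))
by_delta[by_delta < 0] <- NA

# Loop over trees.
by_loop <- rep(NA_real_, nrow(df))
for (id in unique(df$TreeID)) {
  rows <- df$TreeID == id
  by_loop[rows] <- c(NA, diff(df$Height[rows]))
}

# TRUE if all three approaches give the same values.
agree <- isTRUE(all.equal(by_ave, by_delta)) &&
  isTRUE(all.equal(by_ave, by_loop))
agree
```

Agreement here only reflects this particular data, where every tree starts below the previous tree's final height; it does not make the whole-vector shortcut safe in general.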
