zoo:rollapply by multiple grouping factors - R-help

Mark Novak

Sun, Apr 3, 2011 8:58 AM #

# Hi there,
# I am trying to apply a function over a moving window for a large
# number of multivariate time series that are grouped in a nested set of
# factors.  I have spent a few days searching for solutions with no luck,
# so any suggestions are much appreciated.

# The data I have are for the abundance dynamics of multiple species
# observed in multiple fixed plots at multiple sites.  (In total I have 7
# sites, ~3-5 plots/site, ~150 species/plot, for 60 time steps each.)  So
# my data look something like this:

dat<-data.frame(Site=rep(1), Plot=rep(c(rep(1,8),rep(2,8),rep(3,8)),1), 
Time=rep(c(1,1,2,2,3,3,4,4)), Sp=rep(1:2), Count=sample(24))
dat

# Let the function I want to apply over a right-aligned window of w=2
# time steps be:
cv<-function(x){sd(x)/mean(x)}
w<-2

# The final output I want would look something like this
# (the CV values are random placeholders, one block of 8 per plot):
Out<-data.frame(dat,CV=round(c(replicate(3,c(NA,NA,runif(6,0,1)))),2))

# I could reshape and apply zoo::rollapply() to a given plot at a given
# site, and reshape again as follows:
library(zoo)
a<-subset(dat,Site==1&Plot==1)
b<-reshape(a[-c(1,2)],v.names='Count',idvar='Time',timevar='Sp',direction='wide')
d<-zoo(b[,-1],b[,1])
d
out<-rollapply(d, w, cv, na.pad=TRUE, align='right')
out

# I would thereby have to loop through all my sites and plots which,
# although it deals with all species at once, still seems exceedingly
# inefficient.

# So the question is: how do I use something like aggregate.zoo, tapply,
# or even lapply to apply rollapply to each species' time series?

# The closest I've come is the following two approaches:

# First let:
datx<-list(Site=dat$Site,Plot=dat$Plot,Sp=dat$Sp)
daty<-dat$Count

# Method 1.
out1<-tapply(seq(along=daty),datx,function(i,x=daty){
rollapply(zoo(x[i]), w, cv, na.pad=TRUE, align='right') })
out1
out1[,,1]

# Which "works" in that it gives me the right answers, but in a format
# that I can't figure out how to convert back into the format I want.

# Method 2.
fun<-function(x){y<-zoo(x);coredata(rollapply(y, w,
cv,na.pad=TRUE,align='right'))}
out2<-aggregate(daty,by=datx,fun)
out2

# Which superficially "works" better, but again only in a format I can't
# figure out how to use because the output seems to be a mix of data.frame
# and lists.
out2[1,4]
out2[1,5]
is.data.frame(out2)
is.list(out2)

# The situation is made more problematic by the fact that the time point
# of the first survey can differ between plots (e.g., site1-plot3 may only
# start at time point 3).  As in...
dat2<-dat
dat2<-dat2[-which(dat2$Plot==3 & dat2$Time<3),]
dat2

# I must therefore ensure that I'm keeping track of the true time
# associated with each value, not just the order of their occurrences.
# This information is (seemingly) lost by both methods.
datx<-list(Site=dat2$Site,Plot=dat2$Plot,Sp=dat2$Sp)
daty<-dat2$Count

# Method 1.
out3<-tapply(seq(along=daty),datx,function(i,x=daty){
rollapply(zoo(x[i]), w, cv, na.pad=TRUE, align='right') })
out3
out3[1,3,1]
time(out3[1,3,1])

# Method 2
out4<-aggregate(daty,by=datx,fun)
out4
time(out4[3,4])


# Am I going about this all wrong?  Is there a different package to
# try?  Any thoughts and suggestions are much appreciated!

# R 2.12.2 GUI 1.36 Leopard build 32-bit (5691); zoo 1.6-4

# Thanks!
# -mark

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--
Ecology & Evolutionary Biology
University of California, Santa Cruz
Long Marine Laboratory
100 Shaffer Road
Santa Cruz, CA 95060-5730
Ph: 773-256-8645
Fax: 831-459-3383
http://people.ucsc.edu/~mnovak1/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--

Gabor Grothendieck

Sun, Apr 3, 2011 3:27 PM #

Try ave:

dat$cv <- ave(dat$Count, dat[c("Site", "Plot", "Sp")], FUN =
function(x) rollapply(zoo(x), 2, cv, na.pad = TRUE, align = "right"))
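
For readers on a newer zoo than the 1.6-4 used in this thread: `na.pad` was later deprecated in favour of `fill = NA`. A minimal self-contained sketch of the same idea follows. The `set.seed()` call, the `coredata()` wrapper, the `roll_cv` helper name, and the explicit `order()` step are assumptions added for illustration, not part of the reply above. Because `ave` writes each result back into the rows it came from, the `Time` column stays attached to every value, provided the rows are in time order within each group.

```r
library(zoo)

set.seed(1)  # assumption: fixed seed so the illustration is reproducible
dat <- data.frame(Site = 1,
                  Plot = rep(1:3, each = 8),
                  Time = rep(rep(1:4, each = 2), times = 3),
                  Sp = rep(1:2, times = 12),
                  Count = sample(24))

# Plot 3 is first surveyed at time 3, as in dat2 from the question
dat2 <- dat[!(dat$Plot == 3 & dat$Time < 3), ]

cv <- function(x) sd(x) / mean(x)
w <- 2

# ave() works on row order, so sort by time within each group first
dat2 <- dat2[order(dat2$Site, dat2$Plot, dat2$Sp, dat2$Time), ]

# Hypothetical helper: rolling CV over a right-aligned window, NA-padded
roll_cv <- function(x) {
  coredata(rollapply(zoo(x), w, cv, fill = NA, align = "right"))
}
dat2$cv <- ave(dat2$Count, dat2[c("Site", "Plot", "Sp")], FUN = roll_cv)
dat2
```

In this sketch each Site/Plot/Sp group gets an NA in its first row, including the Plot 3 groups, whose first row is time 3 rather than time 1.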

Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Mark Novak

Mon, Apr 4, 2011 12:40 PM #

Thank you very much Gabor!  It looks like that's gonna work 
wonderfully.  I didn't even know 'ave' existed.

For others out there:  I only needed to add a comma:   dat[,c("Site", 
"Plot", "Sp")]

Small follow up Q:  Is there any reason to use 'aggregate' vs. 'ave' in 
general?

-mark

Gabor Grothendieck

Mon, Apr 4, 2011 12:54 PM #

Actually, if dd is a data frame, dd[, ix] and dd[ix] give the same result, e.g.

[1] TRUE
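
A check of this kind can be sketched as follows. The data frame `dd` and index `ix` here are made up for illustration and are not taken from the original message. The second comparison shows the one case where the two forms differ: with a single column, `dd[, ix]` drops to a plain vector by default.

```r
# Hypothetical data frame and column index, for illustration only
dd <- data.frame(Site = 1, Plot = c(1, 1, 2), Sp = c(1, 2, 1),
                 Count = c(5, 3, 8))
ix <- c("Site", "Plot", "Sp")

# Several columns: both forms return a data frame
same_multi <- identical(dd[, ix], dd[ix])

# One column: dd[, "Sp"] is a vector, dd["Sp"] is a data frame
same_single <- identical(dd[, "Sp"], dd["Sp"])

same_multi
same_single
```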

aggregate reduces the data to fewer rows (one per group). ave returns a
value for every original row, so its result can be added as an
additional column to the original data.
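
A small made-up example makes the contrast concrete (the names `dd`, `g`, `y`, and `group_mean` are illustrative, not from the thread):

```r
# Hypothetical data, for illustration only
dd <- data.frame(g = c("a", "a", "b", "b", "b"), y = c(1, 3, 2, 4, 6))

# aggregate(): one summary row per group
agg <- aggregate(y ~ g, data = dd, FUN = mean)

# ave(): one value per original row, aligned with dd
dd$group_mean <- ave(dd$y, dd$g, FUN = mean)

nrow(agg)  # number of groups
nrow(dd)   # number of rows is unchanged
```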