Skip to content

Counting occurrences in a moving window

5 messages · mdvaan, Gabor Grothendieck

#
Hi list, based on the following data.frame I would like to create a variable
that indicates the number of occurrences of A in the 3 years prior to the
current year:

DF = data.frame(read.table(textConnection("  A  B
8025  1995
8026  1995
8029  1995
8026  1996
8025  1997
8026  1997
8025  1997
8027  1997
8026  1999
8027  1999
8028  1995
8029  1998
8025  1997
8027  1997
8026  1999
8027  1999
8028  1995
8029  1998"),head=TRUE,stringsAsFactors=FALSE))

becomes:

A            B      C
8025  1995  0  
8026  1995  0
8029  1995  0
8026  1996  1
8025  1997  1
8026  1997  2
8025  1997  1
8027  1997  0
8026  1999  2
8027  1999  2
8028  1995  0
8029  1998  1
8025  1997  1
8027  1997  0
8026  1999  2
8027  1999  2 
8028  1995  0
8029  2000  1

So 8026 in 1997 = 2 because 8026 can be found in 1995 and 1996 which are
both within the appropriate window (1996 - 1994).

Any ideas? I looked at the rollapply vignette, but couldn't figure out how
to apply it to my data.

Thanks a lot!




--
View this message in context: http://r.789695.n4.nabble.com/Counting-occurrences-in-a-moving-window-tp3568658p3568658.html
Sent from the R help mailing list archive at Nabble.com.
#
Would it be possible to use the sqldf package and the ave function to simply
run ave over a limited set? So something like:

DF = data.frame(read.table(textConnection("  A  B 
8025  1995 
8026  1995 
8029  1995 
8026  1996 
8025  1997 
8026  1997 
8025  1997 
8027  1997 
8026  1999 
8027  1999 
8028  1995 
8029  1998 
8025  1997 
8027  1997 
8026  1999 
8027  1999 
8028  1995 
8029  1998"),head=TRUE,stringsAsFactors=FALSE)) 

library(sqldf)
years<-c(1995:1999)
for (t in 1:length(years))
	{
	year = as.numeric(years[t])
	m<-sqldf('select * from DF where B between $year-1 AND $year-4')
	n<-ave(m$A,m$A,FUN = length)
	}

How do I get the correct values in DF$C? Thanks!!


--
View this message in context: http://r.789695.n4.nabble.com/Counting-occurrences-in-a-moving-window-tp3568658p3570652.html
Sent from the R help mailing list archive at Nabble.com.
#
On Thu, Jun 2, 2011 at 11:41 AM, mdvaan <mathijsdevaan at gmail.com> wrote:
Try this:
+ sum(DF$B < DF$B[i] & DF$B >= DF$B[i]-3 & DF$A[i] == DF$A))
A    B C
1  8025 1995 0
2  8026 1995 0
3  8029 1995 0
4  8026 1996 1
5  8025 1997 1
6  8026 1997 2
7  8025 1997 1
8  8027 1997 0
9  8026 1999 2
10 8027 1999 2
11 8028 1995 0
12 8029 1998 1
13 8025 1997 1
14 8027 1997 0
15 8026 1999 2
16 8027 1999 2
17 8028 1995 0
18 8029 1998 1
#
On Fri, Jun 3, 2011 at 8:11 AM, mdvaan <mathijsdevaan at gmail.com> wrote:
In sqldf it would be like this:

sqldf("select x.*, sum(x.A = y.A and y.B < x.B and y.B >= x.B-3) C
from DF x, DF y group by x.rowid")
#
Thank you very much! I really liked the first solution, it worked great for
my larger dataset.

M
Gabor Grothendieck wrote:
--
View this message in context: http://r.789695.n4.nabble.com/Counting-occurrences-in-a-moving-window-tp3568658p3571916.html
Sent from the R help mailing list archive at Nabble.com.