Message-ID: <BANLkTimceoeyjRK35=bZJrhH+pG4VdZ2AA@mail.gmail.com>
Date: 2011-06-03T15:28:08Z
From: Gabor Grothendieck
Subject: Counting occurrences in a moving window
In-Reply-To: <1307029289752-3568658.post@n4.nabble.com>
On Thu, Jun 2, 2011 at 11:41 AM, mdvaan <mathijsdevaan at gmail.com> wrote:
> Hi list, based on the following data.frame I would like to create a variable
> that indicates the number of occurrences of A in the 3 years prior to the
> current year:
>
> DF = data.frame(read.table(textConnection("  A  B
> 8025  1995
> 8026  1995
> 8029  1995
> 8026  1996
> 8025  1997
> 8026  1997
> 8025  1997
> 8027  1997
> 8026  1999
> 8027  1999
> 8028  1995
> 8029  1998
> 8025  1997
> 8027  1997
> 8026  1999
> 8027  1999
> 8028  1995
> 8029  1998"),head=TRUE,stringsAsFactors=FALSE))
>
> becomes:
>
> A     B     C
> 8025  1995  0
> 8026  1995  0
> 8029  1995  0
> 8026  1996  1
> 8025  1997  1
> 8026  1997  2
> 8025  1997  1
> 8027  1997  0
> 8026  1999  2
> 8027  1999  2
> 8028  1995  0
> 8029  1998  1
> 8025  1997  1
> 8027  1997  0
> 8026  1999  2
> 8027  1999  2
> 8028  1995  0
> 8029  2000  1
>
> So 8026 in 1997 = 2 because 8026 can be found in 1995 and 1996 which are
> both within the appropriate window (1996 - 1994).
>
> Any ideas? I looked at the rollapply vignette, but couldn't figure out how
> to apply it to my data.
>
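Try this. For each row i it counts the rows that have the same A and a
B in the three years before DF$B[i] (that is, from DF$B[i]-3 up to
DF$B[i]-1):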
> DF$C <- sapply(1:nrow(DF), function(i)
+ sum(DF$B < DF$B[i] & DF$B >= DF$B[i]-3 & DF$A[i] == DF$A))
> DF
      A    B C
1  8025 1995 0
2  8026 1995 0
3  8029 1995 0
4  8026 1996 1
5  8025 1997 1
6  8026 1997 2
7  8025 1997 1
8  8027 1997 0
9  8026 1999 2
10 8027 1999 2
11 8028 1995 0
12 8029 1998 1
13 8025 1997 1
14 8027 1997 0
15 8026 1999 2
16 8027 1999 2
17 8028 1995 0
18 8029 1998 1
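
If the window width needs to vary, the same test can be wrapped in a
small function. The following is only a sketch: count_prior and its
argument names (id, year, width) are made up for illustration and are
not from any package. It uses outer() to compare every row with every
other row at once, so it builds n x n matrices and is only suitable
for moderately sized data.

```r
# Hypothetical helper (illustration only): for each row, count the rows
# with the same id whose year lies in [year - width, year - 1].
count_prior <- function(id, year, width = 3) {
  same_id <- outer(id, id, "==")     # same_id[i, j] is TRUE when id[i] == id[j]
  gap     <- outer(year, year, "-")  # gap[i, j] is year[i] - year[j]
  # A row j counts for row i when it is 1..width years earlier and has the same id
  rowSums(same_id & gap >= 1 & gap <= width)
}

# Same data as in the question, entered directly
DF <- data.frame(
  A = c(8025, 8026, 8029, 8026, 8025, 8026, 8025, 8027, 8026,
        8027, 8028, 8029, 8025, 8027, 8026, 8027, 8028, 8029),
  B = c(1995, 1995, 1995, 1996, 1997, 1997, 1997, 1997, 1999,
        1999, 1995, 1998, 1997, 1997, 1999, 1999, 1995, 1998)
)
DF$C <- count_prior(DF$A, DF$B)
```

With width = 3 this should give the same C column as the sapply
version above; changing width gives a different look-back period.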
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com