
Using cumsum with 'group by' ?

9 messages · TheRealJimShady, arun, Pete Brecknock +1 more

#
Hi,
No problem.
One more method if you wanted to try:
library(data.table)
dat2<-data.table(dat1)
dat2[,list(x,time,Cumsum=cumsum(x)),list(id)]
#    id  x  time Cumsum
# 1:  1  5 12:01      5
# 2:  1 14 12:02     19
# 3:  1  6 12:03     25
# 4:  1  3 12:04     28
# 5:  2 98 12:01     98
# 6:  2 23 12:02    121
# 7:  2  1 12:03    122
# 8:  2  4 12:04    126
# 9:  3  5 12:01      5
#10:  3 65 12:02     70
#11:  3 23 12:03     93
#12:  3 23 12:04    116
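The same grouped cumulative sum can also be added by reference with data.table's := operator and the by argument, which avoids building a new table (a sketch, assuming dat1 has the id, x and time columns shown above):

```r
library(data.table)
dat2 <- data.table(dat1)
# := adds the Cumsum column in place, computed within each id group
dat2[, Cumsum := cumsum(x), by = id]
```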


A.K.



----- Original Message -----
From: TheRealJimShady <james.david.smith at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Thursday, November 22, 2012 12:27 PM
Subject: Re: [R] Using cumsum with 'group by' ?

Thank you very much, I will try these tomorrow morning.

On 22 November 2012 17:25, arun kirshna [via R]
<ml-node+s789695n4650459h55 at n4.nabble.com> wrote:
--
View this message in context: http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650461.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
On Nov 23, 2012, at 12:04 , TheRealJimShady wrote:

            
Did you miss an 'e' on the first 'not'? Otherwise, I'm confused.

Why do people always forget about ave()? I'd try

newdata <- transform(mydata, csum=ave(x, id, as.Date(date), FUN=cumsum))

You still need to sort, of course, at least by date (incl. time). Check carefully whether time zone is an issue for as.Date --- you may need the tz argument.

Also notice that I shy away from using an R function name as a variable name. This is mostly superstition these days, but better safe than sorry.

  
    
#
Hi,

If that is the case, this should work:
dat1<-read.table(text="
id, x, date
1, 5, 2012-06-05 12:01
1, 10, 2012-06-05 12:02
1, 45, 2012-06-05 12:03
2, 5, 2012-06-05 12:01
2, 3, 2012-06-05 12:03
2, 2, 2012-06-05 12:05
3, 5, 2012-06-05 12:03
3, 5, 2012-06-05 12:04
3, 8, 2012-06-05 12:05
1, 5, 2012-06-08 13:01
1, 9, 2012-06-08 13:02
1, 3, 2012-06-08 13:03
2, 0, 2012-06-08 13:15
2, 1, 2012-06-08 13:18
2, 8, 2012-06-08 13:20
2, 4, 2012-06-08 13:21
3, 6, 2012-06-08 13:15
3, 2, 2012-06-08 13:16
3, 7, 2012-06-08 13:17
3, 2, 2012-06-08 13:18
",sep=",",header=TRUE,stringsAsFactors=FALSE)
dat1$date<-as.Date(dat1$date,format="%Y-%m-%d %H:%M")
dat2<-dat1[order(dat1[,1],dat1[,3]),]
dat2$Cumsum<-ave(dat2$x,list(dat2$id,dat2$date),FUN=cumsum)

head(dat2)
#   id  x       date Cumsum
#1   1  5 2012-06-05      5
#2   1 10 2012-06-05     15
#3   1 45 2012-06-05     60
#10  1  5 2012-06-08      5
#11  1  9 2012-06-08     14
#12  1  3 2012-06-08     17
#or
with(dat2,aggregate(x,by=list(id=id,date=date),cumsum))
#  id       date            x
#1  1 2012-06-05    5, 15, 60
#2  2 2012-06-05     5, 8, 10
#3  3 2012-06-05    5, 10, 18
#4  1 2012-06-08    5, 14, 17
#5  2 2012-06-08  0, 1, 9, 13
#6  3 2012-06-08 6, 8, 15, 17
A.K.



----- Original Message -----
From: TheRealJimShady <james.david.smith at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Friday, November 23, 2012 6:04 AM
Subject: Re: [R] Using cumsum with 'group by' ?

Hi Arun & everyone,

Thank you very much for your helpful suggestions. I've been working
through them, but have realised that my data is a little more
complicated than I said and that the solutions you've kindly provided
don't work. The problem is that there is more than one day of data for
each person. It looks like this:

id   x    date
1    5    2012-06-05 12:01
1    10   2012-06-05 12:02
1    45   2012-06-05 12:03
2    5    2012-06-05 12:01
2    3    2012-06-05 12:03
2    2    2012-06-05 12:05
3    5    2012-06-05 12:03
3    5    2012-06-05 12:04
3    8    2012-06-05 12:05
1    5    2012-06-08 13:01
1    9    2012-06-08 13:02
1    3    2012-06-08 13:03
2    0    2012-06-08 13:15
2    1    2012-06-08 13:18
2    8    2012-06-08 13:20
2    4    2012-06-08 13:21
3    6    2012-06-08 13:15
3    2    2012-06-08 13:16
3    7    2012-06-08 13:17
3    2    2012-06-08 13:18

So what I need to do is something like this (in pseudo code anyway):

- Order the data by the id field and then the date field
- add a new variable called cumsum
- calculate this variable as the cumulative value of x, grouping
by the id and the date (just the date, not the date and time).
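A sketch of those steps in base R (assuming the data above is in a data frame dat with a POSIXct date column; the new column is called csum rather than cumsum, to avoid masking the function name):

```r
# order by id, then by date/time
dat <- dat[order(dat$id, dat$date), ]
# cumulative sum of x within each (id, calendar day) group
dat$csum <- ave(dat$x, dat$id, as.Date(dat$date), FUN = cumsum)
```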

Thank you

James





On 23 November 2012 03:54, arun kirshna [via R]
<ml-node+s789695n4650505h81 at n4.nabble.com> wrote:
--
View this message in context: http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650538.html
Sent from the R help mailing list archive at Nabble.com.
#
TheRealJimShady wrote:
Peter Dalgaard's suggestion works for me ...

lines<-"id x date time
1 5 2012-06-05 12:01 
1 10 2012-06-05 12:02 
1 45 2012-06-05 12:03 
2 5 2012-06-05 12:01 
2 3 2012-06-05 12:03 
2 2 2012-06-05 12:05 
3 5 2012-06-05 12:03 
3 5 2012-06-05 12:04 
3 8 2012-06-05 12:05 
1 5 2012-06-08 13:01 
1 9 2012-06-08 13:02 
1 3 2012-06-08 13:03 
2 0 2012-06-08 13:15 
2 1 2012-06-08 13:18 
2 8 2012-06-08 13:20 
2 4 2012-06-08 13:21 
3 6 2012-06-08 13:15 
3 2 2012-06-08 13:16 
3 7 2012-06-08 13:17 
3 2 2012-06-08 13:18 
"

# read in data
dat1<-read.table(textConnection(lines), header=TRUE,stringsAsFactors=FALSE) 

# build csum variable
newdata <- transform(dat1, csum=ave(x, id, as.Date(date), FUN=cumsum)) 

# order data (not really necessary)
newdata.ord <-newdata[order(dat1[,"id"],dat1[,"date"],dat1[,"time"]),]

Or have I misinterpreted your request?

HTH

Pete (B not D)



--
View this message in context: http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650556.html
Sent from the R help mailing list archive at Nabble.com.
3 days later
#
On Nov 27, 2012, at 14:30 , TheRealJimShady wrote:

            
OK. ave() is originally designed to replace observations by their per-group average. It works by splitting the input vector into groups according to one or more factors, performing FUN on each group, and putting the result back in the original positions. The default for FUN is mean, but it doesn't need to be a scalar function; it also works with a FUN that returns a vector the same length as the input, such as FUN=cumsum.

The specification does not seem to allow a list() of grouping factors. It does work, but as it is unauthorised, it might not keep working.
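The split/apply/put-back behaviour described above can be seen directly: ave(x, g, FUN = cumsum) is equivalent to this sketch built from split(), lapply() and unsplit():

```r
x <- c(5, 10, 45, 5, 3, 2)
g <- c(1, 1, 1, 2, 2, 2)
# split x by group, apply cumsum within each group,
# then put the results back in the original positions
res <- unsplit(lapply(split(x, g), cumsum), g)
res                        # 5 15 60  5  8 10
ave(x, g, FUN = cumsum)    # identical
```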

Elaborating on the help page example (warpbreaks: 9 observations per wool/tension combination):

> ave(warpbreaks$breaks, warpbreaks$wool, warpbreaks$tension)
 [1] 44.55556 44.55556 44.55556 44.55556 44.55556 44.55556 44.55556 44.55556
 [9] 44.55556 24.00000 24.00000 24.00000 24.00000 24.00000 24.00000 24.00000
[17] 24.00000 24.00000 24.55556 24.55556 24.55556 24.55556 24.55556 24.55556
[25] 24.55556 24.55556 24.55556 28.22222 28.22222 28.22222 28.22222 28.22222
[33] 28.22222 28.22222 28.22222 28.22222 28.77778 28.77778 28.77778 28.77778
[41] 28.77778 28.77778 28.77778 28.77778 28.77778 18.77778 18.77778 18.77778
[49] 18.77778 18.77778 18.77778 18.77778 18.77778 18.77778

> ave(warpbreaks$breaks, warpbreaks$wool, warpbreaks$tension, FUN = sum)
 [1] 401 401 401 401 401 401 401 401 401 216 216 216 216 216 216 216 216 216 221
[20] 221 221 221 221 221 221 221 221 254 254 254 254 254 254 254 254 254 259 259
[39] 259 259 259 259 259 259 259 169 169 169 169 169 169 169 169 169

> ave(warpbreaks$breaks, warpbreaks$wool, warpbreaks$tension, FUN = cumsum)
 [1]  26  56 110 135 205 257 308 334 401  18  39  68  85  97 115 150 180 216  36
[20]  57  81  99 109 152 180 195 221  27  41  70  89 118 149 190 210 254  42  68
[39]  87 103 142 170 191 230 259  20  41  65  82  95 110 125 141 169

> ave(warpbreaks$breaks, list(warpbreaks$wool, warpbreaks$tension), FUN = cumsum)
 [1]  26  56 110 135 205 257 308 334 401  18  39  68  85  97 115 150 180 216  36
[20]  57  81  99 109 152 180 195 221  27  41  70  89 118 149 190 210 254  42  68
[39]  87 103 142 170 191 230 259  20  41  65  82  95 110 125 141 169