Dear R-users,

I have a dataset with categories and numbers. I would like to compute cumulative sums within groups and add them to the dataset. I do not understand the structure of the by(...) or tapply(...) output well enough to handle it.

Here is a small example:
--------------
d <- expand.grid(a=1:5, b=1:3, c=1:2)
d$n <- 10 * d$a + d$b + 0.1 * d$c
Sn <- by(d$n, list(d$a, d$c), cumsum)
str(Sn)
---------
List of 10
 $ : num [1:3] 11.1 23.2 36.3
 $ : num [1:3] 21.1 43.2 66.3
 $ : num [1:3] 31.1 63.2 96.3
 $ : num [1:3] 41.1 83.2 126.3
 $ : num [1:3] 51.1 103.2 156.3
 $ : num [1:3] 11.2 23.4 36.6
 $ : num [1:3] 21.2 43.4 66.6
 $ : num [1:3] 31.2 63.4 96.6
 $ : num [1:3] 41.2 83.4 126.6
 $ : num [1:3] 51.2 103.4 156.6
 - attr(*, "dim")= int [1:2] 5 2
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:5] "1" "2" "3" "4" ...
  ..$ : chr [1:2] "1" "2"
 - attr(*, "call")= language by.default(data = d$n, INDICES = list(d$a, d$c), FUN = cumsum)
 - attr(*, "class")= chr "by"
---------
# these give list(s) of one numerical vector each
Sn[5,2]
Sn[cbind(d$a,d$c)]
# how to access the individual cumsum values?
# and assign them to d$Sn?
--------------

Thanks,
Gerrit.

---
Gerrit Draisma
Department of Public Health
Erasmus MC, University Medical Center Rotterdam
Room AE-235
P.O. Box 2040
3000 CA Rotterdam
The Netherlands
Phone: +31 10 7043787
Fax: +31 10 7038474
http://mgzlx4.erasmusmc.nl/pwp/?gdraisma
understanding output of tapply/by cumsum
3 messages · Gerrit Draisma, jim holtman
Maybe 'ave' is what you were looking for:
d$cum <- ave(d$n, d$a, d$c, FUN = cumsum)
d

   a b c    n   cum
1  1 1 1 11.1  11.1
2  2 1 1 21.1  21.1
3  3 1 1 31.1  31.1
4  4 1 1 41.1  41.1
5  5 1 1 51.1  51.1
6  1 2 1 12.1  23.2
7  2 2 1 22.1  43.2
8  3 2 1 32.1  63.2
9  4 2 1 42.1  83.2
10 5 2 1 52.1 103.2
11 1 3 1 13.1  36.3
12 2 3 1 23.1  66.3
13 3 3 1 33.1  96.3
14 4 3 1 43.1 126.3
15 5 3 1 53.1 156.3
16 1 1 2 11.2  11.2
17 2 1 2 21.2  21.2
18 3 1 2 31.2  31.2
19 4 1 2 41.2  41.2
20 5 1 2 51.2  51.2
21 1 2 2 12.2  23.4
22 2 2 2 22.2  43.4
23 3 2 2 32.2  63.4
24 4 2 2 42.2  83.4
25 5 2 2 52.2 103.4
26 1 3 2 13.2  36.6
27 2 3 2 23.2  66.6
28 3 3 2 33.2  96.6
29 4 3 2 43.2 126.6
30 5 3 2 53.2 156.6
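On the original question of getting at the individual values: a minimal sketch, assuming the grouped result is a list-matrix like the one shown by str(Sn). It uses tapply() rather than by() (the name Tn is only for illustration); the same double-bracket indexing should carry over to the by() result, though that is not checked here.

```r
# Rebuild the example data from the original post
d <- expand.grid(a = 1:5, b = 1:3, c = 1:2)
d$n <- 10 * d$a + d$b + 0.1 * d$c

# tapply() with a non-scalar FUN returns a 5 x 2 list-matrix:
# each cell holds the cumsum vector for one (a, c) group
Tn <- tapply(d$n, list(d$a, d$c), cumsum)

# Single brackets return a list of length one;
# double brackets extract the numeric vector itself
cell <- Tn[[5, 2]]
print(cell)

# ave() computes the same per-group cumsum but returns a vector
# aligned with the rows of d, so it can be assigned directly
d$cum <- ave(d$n, d$a, d$c, FUN = cumsum)

# The rows of the (a = 5, c = 2) group agree with the extracted cell
print(all.equal(d$cum[d$a == 5 & d$c == 2], cell))
```

The design point is that ave() takes care of putting the group results back in the original row order, which is the awkward part when working from the list-matrix by hand.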
On Tue, Dec 7, 2010 at 6:39 AM, Gerrit Draisma <gdraisma at xs4all.nl> wrote:
> # how to access the individual cumsum values?
> # and assign them to d$Sn?
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
You can also use 'split' to separate each group:
split(d, list(d$a, d$c))
$`1.1`
   a b c    n  cum
1  1 1 1 11.1 11.1
6  1 2 1 12.1 23.2
11 1 3 1 13.1 36.3

$`2.1`
   a b c    n  cum
2  2 1 1 21.1 21.1
7  2 2 1 22.1 43.2
12 2 3 1 23.1 66.3

$`3.1`
   a b c    n  cum
3  3 1 1 31.1 31.1
8  3 2 1 32.1 63.2
13 3 3 1 33.1 96.3

$`4.1`
   a b c    n   cum
4  4 1 1 41.1  41.1
9  4 2 1 42.1  83.2
14 4 3 1 43.1 126.3

$`5.1`
   a b c    n   cum
5  5 1 1 51.1  51.1
10 5 2 1 52.1 103.2
15 5 3 1 53.1 156.3

$`1.2`
   a b c    n  cum
16 1 1 2 11.2 11.2
21 1 2 2 12.2 23.4
26 1 3 2 13.2 36.6

$`2.2`
   a b c    n  cum
17 2 1 2 21.2 21.2
22 2 2 2 22.2 43.4
27 2 3 2 23.2 66.6

$`3.2`
   a b c    n  cum
18 3 1 2 31.2 31.2
23 3 2 2 32.2 63.4
28 3 3 2 33.2 96.6

$`4.2`
   a b c    n   cum
19 4 1 2 41.2  41.2
24 4 2 2 42.2  83.4
29 4 3 2 43.2 126.6

$`5.2`
   a b c    n   cum
20 5 1 2 51.2  51.2
25 5 2 2 52.2 103.4
30 5 3 2 53.2 156.6
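If the goal is to get the per-group results back into the data frame, split() can be paired with unsplit(). This is only a sketch of that route, not something posted in the thread; the names g, pieces and cums are made up for illustration, and the column name d$Sn is taken from the original question.

```r
# Rebuild the example data from the original post
d <- expand.grid(a = 1:5, b = 1:3, c = 1:2)
d$n <- 10 * d$a + d$b + 0.1 * d$c

# Grouping factors: one group per (a, c) combination
g <- list(d$a, d$c)

# Split the numbers into one vector per group (names like "1.1", "2.1", ...)
pieces <- split(d$n, g)

# Cumulative sum within each group
cums <- lapply(pieces, cumsum)

# unsplit() reverses split(): values go back to their original row positions
d$Sn <- unsplit(cums, g)

# Same result as the ave() one-liner
print(all.equal(d$Sn, ave(d$n, d$a, d$c, FUN = cumsum)))
```

This is roughly what ave() does in one call, so the longer form is mainly useful when the intermediate per-group pieces are needed as well.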
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?