question about the aggregate function with respect to order of levels of grouping elements
5 messages · tom soyer, Gabor Grothendieck, jim holtman
This does look strange. Note that aggregate.zoo in the zoo package would work here:
library(zoo)
aggregate(zoo(rnum, dts), as.yearmon, sum)
   Jan 2001    Feb 2001    Mar 2001    Apr 2001    May 2001    Jun 2001
 4.43610085  0.49842227  7.52139932  1.47917343 10.64459923 -1.22530586
   Jul 2001    Aug 2001    Sep 2001    Oct 2001    Nov 2001    Dec 2001
 8.19563685  1.57626974  1.28842871  2.50540074  0.71156951  0.54118342
   Jan 2002    Feb 2002    Mar 2002    Apr 2002    May 2002    Jun 2002
-0.41292840 -2.41301496  3.23783551  0.63914807 -1.46357402  2.91651492
   Jul 2002    Aug 2002    Sep 2002    Oct 2002    Nov 2002    Dec 2002
 2.17263290 -2.30981022 -9.60701788  1.16504368 -3.07038254  1.38281927
   Jan 2003    Feb 2003    Mar 2003    Apr 2003    May 2003    Jun 2003
 4.48761479  2.42455090 -0.03743888  1.11223001 -4.07988016 -1.15116293
   Jul 2003    Aug 2003    Sep 2003    Oct 2003    Nov 2003    Dec 2003
-7.15292576 -2.34231702 -0.48132751 11.74252191  2.51063034 -4.35801058
On Dec 16, 2007 9:23 AM, tom soyer <tom.soyer at gmail.com> wrote:
Hi,
I am using aggregate() to add up groups of data according to year and month.
It seems that the function aggregate() automatically sorts the levels of
factors of the grouping elements, even if the order of the levels of factors
is supplied. I am wondering if this is a bug, or if I missed something
important. Below is an example that shows what I mean. Does anyone know if
this is just the way the aggregate function works, or are there ways
to force aggregate() to keep the order of levels of factors supplied by the
grouping elements? Thanks!
library(chron)
dts=seq.dates("1/1/01","12/31/03")
rnum=rnorm(1:length(dts))
df=data.frame(date=dts,obs=rnum)
agg=aggregate(df[,2],list(year=years(df[,1]),month=months(df[,1])),sum)
levels(agg$month) # aggregate() automatically generates levels sorted by alphabet.
[1] "Apr" "Aug" "Dec" "Feb" "Jan" "Jul" "Jun" "Mar" "May" "Nov" "Oct" "Sep"
fmonth=factor(months(df[,1]))
levels(fmonth) # factor() automatically generates the correct order of levels.
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
agg2=aggregate(df[,2],list(year=years(df[,1]),month=fmonth),sum)
levels(agg2$month) # even if a factor with levels in the correct order is supplied, aggregate() sorts the levels alphabetically regardless.
[1] "Apr" "Aug" "Dec" "Feb" "Jan" "Jul" "Jun" "Mar" "May" "Nov" "Oct" "Sep"
--
Tom
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
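The question above can also be explored without chron. The following is a minimal base-R sketch, not the poster's code: the variable names `year` and `month` and the use of `as.POSIXlt()` in place of chron's `years()` and `months()` are assumptions made for illustration. It passes a factor whose levels are given explicitly in calendar order, inspects the levels after aggregation, and then re-imposes the order on the result in case the running R version returns them sorted alphabetically.

```r
# Base-R sketch (no chron): daily data over three years, summed by year and month.
set.seed(1)  # arbitrary seed, only to make the sketch repeatable
dts  <- seq(as.Date("2001-01-01"), as.Date("2003-12-31"), by = "day")
rnum <- rnorm(length(dts))

lt   <- as.POSIXlt(dts)
year <- lt$year + 1900
# Build the month factor with its levels given explicitly in calendar order;
# month.abb is the built-in vector "Jan", "Feb", ..., "Dec".
month <- factor(month.abb[lt$mon + 1], levels = month.abb)

agg <- aggregate(rnum, list(year = year, month = month), sum)

# Inspect the level order of the grouping column after aggregation.
levels(agg$month)

# If the levels come back sorted alphabetically (as reported in this thread),
# they can be re-imposed explicitly on the result and the rows re-sorted.
agg$month <- factor(as.character(agg$month), levels = month.abb)
agg <- agg[order(agg$year, agg$month), ]
```

Re-levelling the result does not depend on how aggregate() treats the grouping factor, so it should behave the same way across R versions.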
In fact, even ordinary aggregate works ok with zoo's as.yearmon:
aggregate(rnum, list(dts = as.yearmon(dts)), sum)
        dts           x
1  Jan 2001  4.43610085
2  Feb 2001  0.49842227
3  Mar 2001  7.52139932
4  Apr 2001  1.47917343
5  May 2001 10.64459923
6  Jun 2001 -1.22530586
7  Jul 2001  8.19563685
8  Aug 2001  1.57626974
9  Sep 2001  1.28842871
10 Oct 2001  2.50540074
11 Nov 2001  0.71156951
12 Dec 2001  0.54118342
13 Jan 2002 -0.41292840
14 Feb 2002 -2.41301496
15 Mar 2002  3.23783551
16 Apr 2002  0.63914807
17 May 2002 -1.46357402
18 Jun 2002  2.91651492
19 Jul 2002  2.17263290
20 Aug 2002 -2.30981022
21 Sep 2002 -9.60701788
22 Oct 2002  1.16504368
23 Nov 2002 -3.07038254
24 Dec 2002  1.38281927
25 Jan 2003  4.48761479
26 Feb 2003  2.42455090
27 Mar 2003 -0.03743888
28 Apr 2003  1.11223001
29 May 2003 -4.07988016
30 Jun 2003 -1.15116293
31 Jul 2003 -7.15292576
32 Aug 2003 -2.34231702
33 Sep 2003 -0.48132751
34 Oct 2003 11.74252191
35 Nov 2003  2.51063034
36 Dec 2003 -4.35801058
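The as.yearmon approach works because a single year-month key sorts chronologically. A rough base-R analogue, assuming zoo is not available, is to group on a "YYYY-MM" string, which also sorts chronologically as plain text. This is a sketch of the same idea rather than what zoo does internally, and the name `ym` is made up for illustration.

```r
# Base-R sketch: group on a "YYYY-MM" key instead of zoo's as.yearmon().
set.seed(1)  # arbitrary seed, only to make the sketch repeatable
dts  <- seq(as.Date("2001-01-01"), as.Date("2003-12-31"), by = "day")
rnum <- rnorm(length(dts))

# Numeric year-month strings such as "2001-01" do not depend on the locale,
# and their alphabetical order coincides with chronological order.
ym <- format(dts, "%Y-%m")

agg <- aggregate(rnum, list(dts = ym), sum)
head(agg)
```

The trade-off is that the key is only a label: unlike a yearmon object it carries no date arithmetic, so it suits tabulation more than further time-series work.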
On Dec 16, 2007 9:50 AM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
This does look strange. Note that aggregate.zoo in the zoo package would work here:
library(zoo)
aggregate(zoo(rnum, dts), as.yearmon, sum)
What version of R are you using? Here is the output I got with 2.6.1:
library(chron)
dts=seq.dates("1/1/01","12/31/03")
rnum=rnorm(1:length(dts))
df=data.frame(date=dts,obs=rnum)
agg=aggregate(df[,2],list(year=years(df[,1]),month=months(df[,1])),sum)
levels(agg$month) # aggregate() automatically generates levels sorted by alphabet.
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
fmonth=factor(months(df[,1]))
levels(fmonth) # factor() automatically generates the correct order of levels.
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
agg2=aggregate(df[,2],list(year=years(df[,1]),month=fmonth),sum)
levels(agg2$month) # even if a factor with levels in the correct order is supplied, aggregate() sorts the levels alphabetically regardless.
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
Order seems to be correct.
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
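Since the two sets of output in this thread differ, it helps to state the R and package versions alongside any reproduction. A small sketch of how that information might be collected, using only base R calls:

```r
# Report the running R version and the attached packages,
# so results from different installations can be compared.
R.version.string   # one-line description of the R version
sessionInfo()      # R version, platform, locale, and loaded packages
```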