Message: 63
Date: Wed, 26 Jan 2005 04:28:51 +0000 (UTC)
From: Gabor Grothendieck <ggrothendieck at myway.com>
Subject: Re: [R] chron: parsing dates into a data frame using a forloop
To: r-help at stat.math.ethz.ch
Message-ID: <loom.20050126T052153-333 at post.gmane.org>
Content-Type: text/plain; charset=us-ascii
Benjamin M. Osborne <Benjamin.Osborne <at> uvm.edu> writes:
:
: I have one data frame with a column of dates and I want to fill another data
: frame with one column of dates, one of years, one of months, one of a unique
: combination of year and month, and one of days, but R seems to have some
: problems with this. My initial data frame looks like this (ignore the NAs in
: the other fields):
:
: > mans[1:10,]
: date loc snow.new prcp tmin snow.dep tmax
: 1 11/01/54 2 NA NA NA NA NA
: 2 11/02/54 2 NA NA NA NA NA
: 3 11/03/54 2 NA NA NA NA NA
: 4 11/04/54 2 NA NA NA NA NA
: 5 11/05/54 2 NA NA NA NA NA
: 6 11/06/54 2 NA NA NA NA NA
: 7 11/07/54 2 NA NA NA NA NA
: 8 11/08/54 2 NA NA NA NA NA
: 9 11/09/54 2 NA NA NA NA NA
: 10 11/10/54 2 NA NA NA NA NA
: >
:
: The code and resultant data frame look like this:
:
: > for(i in 1:10){
: + mans.met$date[i]<-mans$date[i]
: + mans.met$year[i]<-years(mans.met$date[i])
: + mans.met$month[i]<-months(mans.met$date[i])
: + mans.met$yearmo[i]<-cut(mans.met$date[i], "months")
: + mans.met$day[i]<-days(mans.met$date[i])
: + }
: > mans.met[1:10,]
: date year month yearmo day snow.new snow.dep prcp tmin tmax tmean
: 1 11/01/54 1 11 1 1 NA NA NA NA NA NA
: 2 11/02/54 1 11 1 2 NA NA NA NA NA NA
: 3 11/03/54 1 11 1 3 NA NA NA NA NA NA
: 4 11/04/54 1 11 1 4 NA NA NA NA NA NA
: 5 11/05/54 1 11 1 5 NA NA NA NA NA NA
: 6 11/06/54 1 11 1 6 NA NA NA NA NA NA
: 7 11/07/54 1 11 1 7 NA NA NA NA NA NA
: 8 11/08/54 1 11 1 8 NA NA NA NA NA NA
: 9 11/09/54 1 11 1 9 NA NA NA NA NA NA
: 10 11/10/54 1 11 1 10 NA NA NA NA NA NA
: >
:
: The problem seems to be with assigning within the forloop, or making the
: assignment into a data frame, since:
:
: > years(mans.met$date[5])
: [1] 1954
: Levels: 1954
: > test<-years(mans.met$date[5])
: > test
: [1] 1954
: Levels: 1954
: >
: > months(mans.met$date[5])
: [1] Nov
: 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
: > test<-months(mans.met$date[5])
: > test
: [1] Nov
: 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
: >
: > cut(mans.met$date[3], "months")
: [1] Nov 54
: Levels: Nov 54
: > test<-cut(mans.met$date[3], "months")
: > test
: [1] Nov 54
: Levels: Nov 54
: >
: > days(mans.met$date[4])
: [1] 4
: 31 Levels: 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10 < 11 < 12 < 13 < ... < 31
: > test<-days(mans.met$date[4])
: > test
: [1] 4
: 31 Levels: 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10 < 11 < 12 < 13 < ... < 31
: >
:
: Any suggestions will be appreciated.
: -Ben Osborne
I guess you set up mans.met as numeric columns and when you
assign your factors to numeric variables you get
the underlying codes. Note that if f is a factor then as.numeric(f)
gives the codes underlying the factor whereas as.character(f) gives
the labels.
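A minimal sketch of the codes-versus-labels point, using a made-up factor
rather than the actual mans.met columns (the names f, year and year_ok are
only for illustration):

```r
# A factor stores integer codes plus a vector of labels (its levels).
f <- factor(c("1954", "1954", "1955"))

codes  <- as.numeric(f)      # the underlying codes
labels <- as.character(f)    # the labels

# Assigning factor elements into a numeric vector keeps only the codes,
# which is presumably how the 1s ended up in the year column.
year <- numeric(3)
for (i in seq_along(f)) year[i] <- f[i]

# To get the numeric year back, go through the labels first.
year_ok <- as.numeric(as.character(f))
```

Printing year and year_ok side by side shows the difference between the
stored codes and the values the labels stand for.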
It would be better not to use a loop at all. I don't know whether you
want factors or not, but at any rate here is something you could
try. It creates data frame df2 without a loop.
df2 <- data.frame(date = mans$date, yearmo = as.character(cut(mans$date, "m")))
df2 <- cbind(df2, month.day.year(mans$date))
Finally, do you really want this redundant representation? I would tend to
go with just storing the dates and computing any of the other quantities
on-the-fly as needed.
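As a rough sketch of computing the pieces on the fly: the example below
assumes base R Date objects and four-digit years instead of chron dates
(an assumption made so that it runs without extra packages), and the
vector names are hypothetical.

```r
# Store only the dates (made-up values in month/day/year order).
dates <- as.Date(c("11/01/1954", "11/02/1954", "12/01/1954"),
                 format = "%m/%d/%Y")

# Derive the other quantities only when they are needed.
year   <- as.numeric(format(dates, "%Y"))
month  <- as.numeric(format(dates, "%m"))
day    <- as.numeric(format(dates, "%d"))
yearmo <- format(dates, "%Y-%m")   # one label per year/month combination
```

Since each quantity is a vectorized one-liner, nothing is lost by not
storing it in the data frame.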
##########
The reason for the redundancy is that I will want to summarize these 50 years of
daily time series data by month, so that records that share each unique year
and month in the mans.met$yearmo column will be summed or averaged, etc. into a
new row in another data frame (mans.monthly, having
nrow=length(unique(mans.met$yearmo))). The way I would do this is again using
a forloop, but the loop won't recognize :
for (i in 1:(length(unique(mans.met$yearmo[i])))){
What I really need to know is why I can call any ith of
unique(mans.met$yearmo[i])
by itself, but not in a loop.
Or, perhaps there is an even easier way to extract the year and month from the
date column on the fly to compute these summaries?
Thanks,
Ben Osborne
Botany Department
University of Vermont
109 Carrigan Drive
Burlington, VT 05405
benjamin.osborne at uvm.edu
phone: 802-656-0297
fax: 802-656-0440
Benjamin M. Osborne <Benjamin.Osborne <at> uvm.edu> writes:
:
: The reason for the redundancy is that I will want to summarize these 50 years of
: daily time series data by month, so that records that share each unique year
: and month in the mans.met$yearmo column will be summed or averaged, etc. into a
: new row in another data frame(mans.monthly, having
: nrow=length(unique(mans.met$yearmo))). The way I would do this is again using
: a forloop, but the loop won't recognize :
: for (i in 1:(length(unique(mans.met$yearmo[i])))){
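This seems circular. You are defining i in terms of i: the loop bound
uses i before the loop has set it. Drop the [i] from the bound, i.e.
use 1:length(unique(mans.met$yearmo)).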
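If a loop is wanted anyway, one way to write it is to compute the unique
values once, before the loop, and index into that. This is only a sketch
with made-up data; yearmo, tmax, ym and monthly are hypothetical names.

```r
yearmo <- c("Nov 54", "Nov 54", "Dec 54", "Dec 54", "Jan 55")  # made-up
tmax   <- c(10, 12, 2, 4, -3)                                  # made-up

ym <- unique(yearmo)             # computed once, without reference to i
monthly <- numeric(length(ym))   # one slot per year/month combination

for (i in seq_along(ym)) {
  # average the records that belong to the i-th year/month
  monthly[i] <- mean(tmax[yearmo == ym[i]], na.rm = TRUE)
}
names(monthly) <- ym
```

seq_along(ym) is used instead of 1:length(ym) so that the loop is skipped
cleanly if ym happens to be empty.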
:
: What I really need to know is why I can call any ith of
: unique(mans.met$yearmo[i])
: by itself, but not in a loop.
:
: Or, perhaps there is an even easier way to extract the year and month from the
: date column on the fly to compute these summaries?
Look at ?aggregate, ?by and ?tapply. e.g.
aggregate(mans[,-1], list(cut(mans$date, "m")), mean)
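Here is a small self-contained sketch of the same idea. It assumes base R
Date objects and a made-up stand-in for mans with only two measurement
columns, so the values are illustrative only. Since the real data contain
NAs, na.rm = TRUE is passed through to mean; without it any month with a
missing value would come out as NA.

```r
# Made-up stand-in for the real data frame.
mans <- data.frame(
  date = as.Date(c("1954-11-01", "1954-11-02", "1954-12-01", "1954-12-02")),
  tmax = c(10, NA, 2, 4),
  prcp = c(0.5, 0.1, 0, 0.3)
)

# Year/month label computed on the fly from the dates.
ym <- format(mans$date, "%Y-%m")

# One variable at a time with tapply.
tmax_monthly <- tapply(mans$tmax, ym, mean, na.rm = TRUE)

# Several columns at once with aggregate; extra arguments go to FUN.
mans_monthly <- aggregate(mans[, c("tmax", "prcp")],
                          list(yearmo = ym), mean, na.rm = TRUE)
```

mans_monthly has one row per year/month combination, which is the shape
described for mans.monthly above.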