Looking for the first observation within the month
6 messages · Albert Pang, jim holtman, Gabor Grothendieck
Use the zoo package to represent data like this.
Here time(z) is a vector of the dates and as.yearmon(time(z))
is the year/month of each date. With FUN=head1, ave replaces each date
with the first date in its month, and aggregate then groups the values
that share that date, choosing the first one in each group.
Lines <- "Date Observation
2007-05-23 20
2007-05-22 30
2007-05-21 10
2007-04-10 50
2007-04-09 40
2007-04-07 30
2007-03-05 10
"
library(zoo)
# z <- read.zoo("myfile.dat", header = TRUE)
z <- read.zoo(textConnection(Lines), header = TRUE)
head1 <- function(x, n = 1) head(x, n)
aggregate(z, ave(time(z), as.yearmon(time(z)), FUN = head1), head1)
For more on zoo try:
library(zoo)
vignette("zoo")
and also read the Help Desk article in R News 4/1 about dates.
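For comparison, the same grouping idea can be sketched in base R without zoo. This is an illustration only, not part of the reply above: the data frame df and the names ym, first_in_month and first_obs are assumptions, and min stands in for head1 because a plain data frame is not kept in date order the way a zoo series is.

```r
# Sample data from the thread, held in a plain data frame (an assumption;
# the reply above reads it into a zoo object instead).
df <- data.frame(
  Date = as.Date(c("2007-05-23", "2007-05-22", "2007-05-21",
                   "2007-04-10", "2007-04-09", "2007-04-07",
                   "2007-03-05")),
  Observation = c(20, 30, 10, 50, 40, 30, 10)
)

# Year/month label for each row, e.g. "2007-05".
ym <- format(df$Date, "%Y-%m")

# For every row, the earliest date (as a day count) within that row's month.
first_in_month <- ave(as.numeric(df$Date), ym, FUN = min)

# Keep the rows whose date is the earliest in its month, oldest first.
first_obs <- df[as.numeric(df$Date) == first_in_month, ]
first_obs <- first_obs[order(first_obs$Date), ]
first_obs
```

Working on the day counts rather than on the Date objects keeps the sketch independent of how ave handles classed vectors.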
On 5/27/07, Albert Pang <albert.pang at mac.com> wrote:
Hi all,
I have a simple data frame: the first column is a list of dates (in "%Y-%m-%d" format) and the second column an observation on that particular date. There might not be observations every day. Let's just say there are no observations on Saturdays and Sundays. Now I want to select the first observation of every month into a list. Is there an easy way to do that?
Date       Observation
----       -----------
2007-05-23 20
2007-05-22 30
2007-05-21 10
2007-04-10 50
2007-04-09 40
2007-04-07 30
2007-03-05 10
The result I need is the data frame
2007-05-21 10
2007-04-07 30
2007-03-05 10
or I am equally happy with just the vector c(10, 30, 10).
I am new to R and, after going through the manuals and the documentation I can gather, I have come up with a convoluted way of doing it.
1) I first get the Date into a vector. (I am artificially reproducing this vector below and calling it A.)
> A <- c(as.Date("2007-05-23"), as.Date("2007-05-22"), as.Date("2007-05-21"),
+        as.Date("2007-04-10"), as.Date("2007-04-09"), as.Date("2007-04-07"),
+        as.Date("2007-03-05"))
> A
[1] "2007-05-23" "2007-05-22" "2007-05-21" "2007-04-10" "2007-04-09"
[6] "2007-04-07" "2007-03-05"
2) Use cut with breaks falling on the months.
> B <- cut(A, breaks = "month")
> B
[1] 2007-05-01 2007-05-01 2007-05-01 2007-04-01 2007-04-01 2007-04-01
[7] 2007-03-01
Levels: 2007-03-01 2007-04-01 2007-05-01
3) Then split to get a list of vectors grouped by the boundary of the date.
> C <- split(A, B)
> C
$`2007-03-01`
[1] "2007-03-05"
$`2007-04-01`
[1] "2007-04-10" "2007-04-09" "2007-04-07"
$`2007-05-01`
[1] "2007-05-23" "2007-05-22" "2007-05-21"
4) In a for loop I go through the elements of the list (the elements are vectors of dates); for each vector I find the minimum and concatenate it to a final vector D.
> D <- numeric()
> for (i in 1:length(C)) { D <- c(D, min(C[[i]])) }
> class(D) <- "Date"
> D
[1] "2007-03-05" "2007-04-07" "2007-05-21"
Next, with D, I go back and find the positions of the elements of D within A, and then use the result as an index vector into the vector of observations (which is not shown here).
I feel sure I am doing it the stupid way (or the procedural way). Is there a more declarative way of doing it? Any pointers will be greatly appreciated!
Thanks a lot in advance,
Albert Pang
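The last step described in the question (locating the elements of D within A and using the result to index the observations) can be sketched with match. The vector obs is an assumption: the post does not show it, so it is filled in here from the Observation column of the table in the question, and D is entered directly from the printed result rather than rebuilt with the loop.

```r
# Dates and observations in the order given in the question.
A <- as.Date(c("2007-05-23", "2007-05-22", "2007-05-21",
               "2007-04-10", "2007-04-09", "2007-04-07",
               "2007-03-05"))
obs <- c(20, 30, 10, 50, 40, 30, 10)  # assumed: taken from the table

# First date of each month, as printed for D in the question.
D <- as.Date(c("2007-03-05", "2007-04-07", "2007-05-21"))

# match() returns, for each element of D, its position in A.
pos <- match(D, A)

# Use those positions as an index into the observations.
first_obs <- obs[pos]
first_obs
```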
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
I have only just been able to dissect Jim's solution and realize I was actually not very far away from the answer. One last step was to use "lapply".
Jim, thanks again for the help.
Gabor, thanks for the suggestion. Let me have a read on what the zoo package is about. Thanks a lot for the pointer!
Albert
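Jim's reply is not preserved in this archive, so the following is only a guess at the kind of lapply step Albert refers to, not Jim's actual code. It keeps the cut and split steps from the question but splits row positions instead of dates, which makes the later lookup of D within A unnecessary. The names obs, idx_by_month and first_idx are hypothetical.

```r
# Dates and observations in the order given in the question.
A <- as.Date(c("2007-05-23", "2007-05-22", "2007-05-21",
               "2007-04-10", "2007-04-09", "2007-04-07",
               "2007-03-05"))
obs <- c(20, 30, 10, 50, 40, 30, 10)  # assumed: taken from the table

# Month bucket for each date, as in step 2 of the question.
B <- cut(A, breaks = "month")

# Split the row positions (not the dates) by month.
idx_by_month <- split(seq_along(A), B)

# For each month, pick the position holding the earliest date.
first_idx <- unlist(lapply(idx_by_month,
                           function(i) i[which.min(as.numeric(A[i]))]))

# Index the observations directly.
first_obs <- obs[first_idx]
first_obs
```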
Here is one additional solution, also using zoo.
Using z from the prior solution, as.yearmon(time(z)) is, as before, the year/month of each date, and tapply(time(z), as.yearmon(time(z)), head, 1) gets the first date within each month; however, tapply converts it to numeric, so we use as.Date to convert it back again. Then we use window to select those dates.
window(z, as.Date(tapply(time(z), as.yearmon(time(z)), head, 1)))
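The round trip through numeric described above can be sketched in base R without zoo. This is an illustration under assumptions, not the reply's own code: the vectors A and obs come from the question, the names ym, first_num and first_dates are hypothetical, the dates are converted to day counts explicitly, and origin is passed to as.Date because older versions of base R require it for numeric input.

```r
# Dates and observations in the order given in the question.
A <- as.Date(c("2007-05-23", "2007-05-22", "2007-05-21",
               "2007-04-10", "2007-04-09", "2007-04-07",
               "2007-03-05"))
obs <- c(20, 30, 10, 50, 40, 30, 10)  # assumed: taken from the table

# Year/month label for each date, e.g. "2007-05".
ym <- format(A, "%Y-%m")

# A Date is stored as a count of days since 1970-01-01;
# take the smallest count within each month.
first_num <- tapply(as.numeric(A), ym, min)

# Convert the day counts back to Date objects.
first_dates <- as.Date(as.vector(first_num), origin = "1970-01-01")

# Select the observations recorded on those dates.
first_obs <- obs[A %in% first_dates]
first_obs
```

Here min replaces head because A is not sorted; in the zoo version the series is already in date order, so head picks the earliest date.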
One additional simplification. If we use simplify = FALSE then tapply won't simplify its answer to numeric, and we can avoid using as.Date in the last solution:
window(z, tapply(time(z), as.yearmon(time(z)), head, 1, simplify = FALSE))
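The effect of simplify = FALSE can also be seen in base R. The sketch below is an assumption-laden illustration rather than the reply's code: it uses the plain vectors A and obs from the question instead of a zoo object, the names ym, first_list and first_dates are hypothetical, and min replaces head because A is not sorted.

```r
# Dates and observations in the order given in the question.
A <- as.Date(c("2007-05-23", "2007-05-22", "2007-05-21",
               "2007-04-10", "2007-04-09", "2007-04-07",
               "2007-03-05"))
obs <- c(20, 30, 10, 50, 40, 30, 10)  # assumed: taken from the table

# Year/month label for each date, e.g. "2007-05".
ym <- format(A, "%Y-%m")

# With simplify = FALSE, tapply returns a list holding one Date per month.
first_list <- tapply(A, ym, min, simplify = FALSE)

# Combining Date objects with c() gives a Date vector, so no as.Date call
# is needed.
first_dates <- do.call(c, first_list)

# Select the observations recorded on those dates.
first_obs <- obs[A %in% first_dates]
first_obs
```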