Looking for the first observation within the month

Use the zoo package to represent data like this.

Here time(z) is the vector of dates and as.yearmon(time(z)) is the
year/month of each date.  A zoo series is kept in date order, so "first"
means earliest.  With FUN = head1, ave replaces each date by the first
date in its month; aggregate then groups the values by those dates and
keeps the first value in each group.

Lines <- "Date                    Observation

2007-05-23              20
2007-05-22              30
2007-05-21              10

2007-04-10              50
2007-04-09              40
2007-04-07              30

2007-03-05              10
"

library(zoo)

# z <- read.zoo("myfile.dat", header = TRUE)
z <- read.zoo(textConnection(Lines), header = TRUE)

head1 <- function(x, n = 1) head(x, n)
aggregate(z, ave(time(z), as.yearmon(time(z)), FUN = head1), head1)

For more on zoo try:

library(zoo)
vignette("zoo")

and also read the Help Desk article in R News 4/1 about dates.
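
For comparison, the same grouping idea can be sketched without zoo.  This is only an illustration in base R, not part of the solution above; the data frame name DF and the "%Y-%m" key are assumptions made for the example.

```r
# Hypothetical data frame holding the sample data from the question.
DF <- data.frame(
  Date = as.Date(c("2007-05-23", "2007-05-22", "2007-05-21", "2007-04-10",
                   "2007-04-09", "2007-04-07", "2007-03-05")),
  Observation = c(20, 30, 10, 50, 40, 30, 10)
)

# Year-month key for each row, playing the role of as.yearmon(time(z)).
key <- format(DF$Date, "%Y-%m")

# For every row, the earliest date in its month, as a day count.
first_day <- ave(as.numeric(DF$Date), key, FUN = min)

# Keep the rows whose date is the earliest in their month.
first_obs <- DF[as.numeric(DF$Date) == first_day, ]
first_obs
```

Unlike the zoo version, the rows keep the order they had in DF, so the result is not sorted by date unless DF was.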

On 5/27/07, Albert Pang <albert.pang at mac.com> wrote:
Hi all, I have a simple data frame: the first column is a list of dates
(in "%Y-%m-%d" format) and the second an observation on that particular
date.  There might not be observations every day.  Let's just say
there are no observations on Saturdays and Sundays.  Now I want to
select the first observation of every month into a list.  Is there an
easy way to do that?

Date                    Observation
----                    -----------
2007-05-23              20
2007-05-22              30
2007-05-21              10

2007-04-10              50
2007-04-09              40
2007-04-07              30

2007-03-05              10

The result I need is the data frame

2007-05-21              10
2007-04-07              30
2007-03-05              10

or I am equally happy with just the vector c(10, 30, 10)

I am new to R and after going through the manuals and the
documentation I can gather, I have come up with a convoluted way of
doing it

1)  I first get the Date into a vector.  (I am artificially
reproducing this vector below and call it A)

 > A <- as.Date(c("2007-05-23", "2007-05-22", "2007-05-21", "2007-04-10",
 +               "2007-04-09", "2007-04-07", "2007-03-05"))
 > A
[1] "2007-05-23" "2007-05-22" "2007-05-21" "2007-04-10" "2007-04-09"
[6] "2007-04-07" "2007-03-05"

2)  use cut with breaks falling on the months

 > B<-cut(A, breaks="month")
 > B
[1] 2007-05-01 2007-05-01 2007-05-01 2007-04-01 2007-04-01 2007-04-01
[7] 2007-03-01
Levels: 2007-03-01 2007-04-01 2007-05-01

3)  then split to get a list of vectors grouped by the month
boundary

 > C<-split(A, B)
 > C
$`2007-03-01`
[1] "2007-03-05"

$`2007-04-01`
[1] "2007-04-10" "2007-04-09" "2007-04-07"

$`2007-05-01`
[1] "2007-05-23" "2007-05-22" "2007-05-21"

4)  in a for loop I go through the elements of the list (the
elements are vectors of dates); for each vector I find the minimum
and concatenate it to a final vector D

 > D<-numeric()
 > for ( i in 1:length(C)){ D <- c( D, min(C[[i]]))}
 > class(D)<-"Date"
 > D
[1] "2007-03-05" "2007-04-07" "2007-05-21"

Next, with D, I go back and find the positions of the elements of D
within A, and then use the result as an index vector into the vector
of observations (which is not shown here).  I feel sure I am doing it
the stupid way (or the procedural way).
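
The last step described above, looking up the positions of D within A and using them as an index, can be sketched with match.  The observation vector obs is an assumption here, typed in from the sample table because the post does not show it.

```r
A <- as.Date(c("2007-05-23", "2007-05-22", "2007-05-21", "2007-04-10",
               "2007-04-09", "2007-04-07", "2007-03-05"))
# Assumed observation vector, in the same order as A.
obs <- c(20, 30, 10, 50, 40, 30, 10)

# Steps 2 to 4 in one call: the earliest date in each month.
# sapply returns plain day counts, so the Date class is restored afterwards.
D <- sapply(split(A, cut(A, breaks = "month")), min)
D <- as.Date(as.numeric(D), origin = "1970-01-01")

# Positions of those dates within A, then the matching observations.
idx <- match(D, A)
first_obs <- obs[idx]
first_obs
```

The result follows the order of the months (March, April, May), not the order of the rows in the original table.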

Is there a more declarative way of doing it?  Any pointers will be
greatly appreciated!

Thanks a lot in advance,

Albert Pang

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

I have only just been able to dissect Jim's solution, and I realize I
was actually not very far away from the answer.  The one last step was
to use "lapply".  Jim, thanks again for the help.

Gabor, thanks for the suggestion.  Let me have a read on what the zoo  
package is about.  Thanks a lot for the pointer!

Albert
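
Jim's solution is not reproduced in this thread.  As a rough sketch of how lapply can finish the split-based approach (an illustration only, not Jim's code; DF is a hypothetical data frame built from the sample data):

```r
# Hypothetical data frame holding the sample data from the question.
DF <- data.frame(
  Date = as.Date(c("2007-05-23", "2007-05-22", "2007-05-21", "2007-04-10",
                   "2007-04-09", "2007-04-07", "2007-03-05")),
  Observation = c(20, 30, 10, 50, 40, 30, 10)
)

# Split the rows by calendar month, then take the earliest row of each piece.
pieces <- split(DF, cut(DF$Date, breaks = "month"))
firsts <- lapply(pieces, function(p) p[which.min(as.numeric(p$Date)), ])

# Bind the one-row data frames back together, one row per month.
result <- do.call(rbind, firsts)
result
```

Working on the rows of the data frame rather than on the date vector alone avoids the separate step of matching dates back to observations.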

Here is one additional solution, also using zoo.  Using z from
the prior solution, as.yearmon(time(z)) is, as before, the year/month
of each date, and tapply(time(z), as.yearmon(time(z)), head, 1)
gets the first date within each month.  However, tapply converts the
dates to numeric, so we use as.Date to convert them back again.  Then
we use window to select those dates.

window(z, as.Date(tapply(time(z), as.yearmon(time(z)), head, 1)))
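
The class-dropping behaviour of tapply mentioned above can be seen without zoo.  A small sketch in base R, where the vector d and the "%Y-%m" key are assumptions standing in for time(z) and as.yearmon(time(z)):

```r
# Sorted dates, standing in for time(z).
d <- sort(as.Date(c("2007-05-23", "2007-05-22", "2007-05-21", "2007-04-10",
                    "2007-04-09", "2007-04-07", "2007-03-05")))
key <- format(d, "%Y-%m")

# First date of each month; the simplified result comes back as day counts
# (days since 1970-01-01) rather than as dates.
first_num <- tapply(d, key, head, 1)

# Convert back to dates; the origin is spelled out so the call does not
# rely on a default.
first_date <- as.Date(as.numeric(first_num), origin = "1970-01-01")
first_date
```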

One additional simplification.  If we use simplify = FALSE then
tapply won't simplify its answer to numeric and we can
avoid using as.Date in the last solution:

 window(z, tapply(time(z), as.yearmon(time(z)), head, 1, simplify = FALSE))
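
What simplify = FALSE changes can be checked in base R.  In this sketch the vector d and the "%Y-%m" key are again assumptions standing in for time(z) and as.yearmon(time(z)):

```r
# Sorted dates, standing in for time(z).
d <- sort(as.Date(c("2007-05-23", "2007-05-22", "2007-05-21", "2007-04-10",
                    "2007-04-09", "2007-04-07", "2007-03-05")))
key <- format(d, "%Y-%m")

# With simplify = FALSE the result is a list with one element per month,
# and each element keeps its Date class.
first_list <- tapply(d, key, head, 1, simplify = FALSE)

# Collect the elements into a plain list and combine them with c(),
# which gives a single Date vector.
first_date <- do.call(c, lapply(seq_along(first_list),
                                function(i) first_list[[i]]))
first_date
```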