
sample (randomly select) to get a number of successive days

Thank you so much Marc,
that is exactly what I need. That will save me weeks of work and additionally I learned a lot.
:-)
Have a great day!
Dagmar



Hi,

Given that your original data frame example is:

myframe <- data.frame(Timestamp = c("24.09.2012 09:00:00", "24.09.2012 10:00:00", "25.09.2012 09:00:00",
                                    "25.09.2012 09:00:00", "24.09.2012 09:00:00", "24.09.2012 10:00:00"),
                      Event = c(50, 60, 30, 40, 42, 54))

str(myframe)
'data.frame':	6 obs. of  2 variables:
  $ Timestamp: Factor w/ 3 levels "24.09.2012 09:00:00",..: 1 2 3 3 1 2
  $ Event    : num  50 60 30 40 42 54


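Your Timestamp variable is a factor, not a datetime variable (in R >= 4.0.0, where stringsAsFactors defaults to FALSE, it would be a character vector instead, but the same applies). So you first need to coerce it to a datetime class in order to be able to define a range of dates.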

Thus:

## See ?as.POSIXlt and the See Also links therein for more information on how R handles dates/times

myframe$Timestamp <- as.POSIXct(myframe$Timestamp, format = "%d.%m.%Y %H:%M:%S")

str(myframe)
'data.frame':	6 obs. of  2 variables:
  $ Timestamp: POSIXct, format: "2012-09-24 09:00:00" ...
  $ Event    : num  50 60 30 40 42 54


To keep it simple: since the range selection appears to depend only on the day and not on the time, let's use the date part of the datetime as the basis for defining your interval. For clarity, create a new column in the data frame that holds just the date:

myframe$day <- as.Date(myframe$Timestamp)

str(myframe)
'data.frame':	6 obs. of  3 variables:
  $ Timestamp: POSIXct, format: "2012-09-24 09:00:00" ...
  $ Event    : num  50 60 30 40 42 54
  $ day      : Date, format: "2012-09-24" ...

myframe
            Timestamp Event        day
1 2012-09-24 09:00:00    50 2012-09-24
2 2012-09-24 10:00:00    60 2012-09-24
3 2012-09-25 09:00:00    30 2012-09-25
4 2012-09-25 09:00:00    40 2012-09-25
5 2012-09-24 09:00:00    42 2012-09-24
6 2012-09-24 10:00:00    54 2012-09-24


With that in place, let's presume that the randomly selected starting date is 2012-09-24. You can then use ?seq.Date to define the range:

set.seed(1)
start <- sample(myframe$day, 1)

start
[1] "2012-09-24"

str(start)
Date[1:1], format: "2012-09-24"


So, create the range of 25 dates:

seq(start, length.out = 25, by = "day")
 [1] "2012-09-24" "2012-09-25" "2012-09-26" "2012-09-27" "2012-09-28"
 [6] "2012-09-29" "2012-09-30" "2012-10-01" "2012-10-02" "2012-10-03"
[11] "2012-10-04" "2012-10-05" "2012-10-06" "2012-10-07" "2012-10-08"
[16] "2012-10-09" "2012-10-10" "2012-10-11" "2012-10-12" "2012-10-13"
[21] "2012-10-14" "2012-10-15" "2012-10-16" "2012-10-17" "2012-10-18"


Now, use the result of the above to subset your data frame. See ?subset and ?"%in%":

myframe.rand <- subset(myframe, day %in% seq(start, length.out = 25, by = "day"))


In your example, all rows will be returned, but from your larger dataset, you will only get the rows that have dates within the range defined.
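
To see the filtering on something larger than the six-row example, here is a minimal sketch using made-up data. The data frame 'big' and its values are hypothetical and only stand in for your full dataset; the starting date is fixed rather than sampled so that the result is easy to check:

```r
## Hypothetical data: one observation per day for 60 consecutive days
big <- data.frame(day = seq(as.Date("2012-09-01"), by = "day", length.out = 60),
                  Event = seq_len(60))

## Fixed starting date, for illustration only
start <- as.Date("2012-09-24")

## Keep only the rows whose day falls inside the 25-day window
big.sub <- subset(big, day %in% seq(start, length.out = 25, by = "day"))

nrow(big.sub)       # 25 rows, one per day in the window
range(big.sub$day)  # 2012-09-24 to 2012-10-18
```

Rows before 2012-09-24 and after 2012-10-18 are dropped, which is the behavior you should expect on your real data.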

Given the above, I will leave it to you to restrict the candidate starting dates in your full dataset: the starting date must lie at least 24 days before your 'max' date, so that 25 consecutive days can still be selected.
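
One possible way to do that, as a rough sketch only: the function name 'sample.window' and the data frame 'dat' below are made up for illustration and are not part of base R or of your data. It assumes a 'day' column of class Date, as created above:

```r
## Hypothetical helper: sample a starting date such that n consecutive
## calendar days still fit before the last date in the data
sample.window <- function(df, n = 25) {
  ## Latest starting date that still leaves room for n days
  last.start <- max(df$day) - (n - 1)
  candidates <- unique(df$day[df$day <= last.start])
  if (length(candidates) == 0)
    stop("date range in the data is shorter than n days")
  ## Pick one candidate by index, then subset as before
  start <- candidates[sample.int(length(candidates), 1)]
  subset(df, day %in% seq(start, length.out = n, by = "day"))
}

## Made-up data: two observations per day for 60 consecutive days
dat <- data.frame(day = rep(seq(as.Date("2012-09-01"), by = "day",
                                length.out = 60), each = 2),
                  Event = seq_len(120))

set.seed(1)
res <- sample.window(dat, n = 25)
length(unique(res$day))  # 25, since every day in 'dat' has data
```

Note that the window always spans 25 calendar days; if some of those days have no observations in your data, the result will simply contain fewer distinct days.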

Regards,

Marc Schwartz