
change frequency of wind data correctly

13 messages · Jim Lemon, Stefano Sofia, Mathew Guilfoyle +4 more

#
Dear list users,
I have wind data at a 10-minute frequency (three years of data). For simplicity, let me use only the maximum wind speed.
I need to reduce the frequency to 30 minutes, with values at 00 (the mean of the data at 40, 50 and 00 minutes) and at 30 (the mean of the data at 10, 20 and 30 minutes) of each hour.

The simple code reported here runs, but the column "interval" groups the data forward, not backward:

init_day <- as.POSIXct("2018-02-01-00-00", format="%Y-%m-%d-%H-%M", tz="Etc/GMT-1")
fin_day <- as.POSIXct("2018-02-01-02-00", format="%Y-%m-%d-%H-%M", tz="Etc/GMT-1")
mydf <- data.frame(data_POSIX=seq(init_day, fin_day, by="10 mins"))
mydf$vmax <- round(rnorm(13, 35, 10))
mydf$interval <- cut(mydf$data_POSIX, breaks="30 min")
means <- aggregate(vmax ~ interval, mydf, mean)

           data_POSIX vmax            interval
1 2018-02-01 00:00:00   27 2018-02-01 00:00:00
2 2018-02-01 00:10:00   41 2018-02-01 00:00:00
3 2018-02-01 00:20:00   46 2018-02-01 00:00:00
4 2018-02-01 00:30:00   39 2018-02-01 00:30:00
5 2018-02-01 00:40:00   34 2018-02-01 00:30:00
6 2018-02-01 00:50:00   32 2018-02-01 00:30:00
...

What I need instead is:

           data_POSIX vmax            interval
1 2018-02-01 00:00:00   27 2018-02-01 00:00:00
2 2018-02-01 00:10:00   41 2018-02-01 00:30:00
3 2018-02-01 00:20:00   46 2018-02-01 00:30:00
4 2018-02-01 00:30:00   39 2018-02-01 00:30:00
5 2018-02-01 00:40:00   34 2018-02-01 01:00:00
6 2018-02-01 00:50:00   32 2018-02-01 01:00:00
...


Is there a way to modify this code to group the data correctly? (I would prefer to use only base R.)

Thank you for your help
Stefano



         (oo)
--oOO--( )--OOo----------------
Stefano Sofia PhD
Civil Protection - Marche Region
Meteo Section
Snow Section
Via del Colle Ameno 5
60126 Torrette di Ancona, Ancona
Uff: 071 806 7743
E-mail: stefano.sofia at regione.marche.it
---Oo---------oO----------------

#
Hi Stefano,
I read in your date-time as two separate fields for convenience. You
can split your single field at the space to get the same result.

ssdf<-read.table(text="date_POSIX time_POSIX vmax
 2018-02-01 00:00:00 27
 2018-02-01 00:10:00 41
 2018-02-01 00:20:00 46
 2018-02-01 00:30:00 39
 2018-02-01 00:40:00 34
 2018-02-01 00:50:00 32",
 header=TRUE,stringsAsFactors=FALSE)
# get the time of day as seconds from the time field
ssdf$seconds<-as.numeric(strptime(ssdf$time_POSIX,"%H:%M:%S"))
# subtract whatever current date strptime guesses for the date
ssdf$seconds<-ssdf$seconds-min(ssdf$seconds)
# create an AM/PM variable
ssdf$ampm<-ifelse(ssdf$seconds > 0 & ssdf$seconds <= 1800,"am","pm")
means<-aggregate(vmax~ampm,ssdf,mean)
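Jim's remark about splitting the single date-time field can be sketched in base R by formatting the POSIXct column twice. This assumes the original `mydf` layout (a POSIXct column `data_POSIX`) and uses made-up `vmax` values:

```r
# Rebuild a small version of the original data frame (vmax values are made up)
init_day <- as.POSIXct("2018-02-01 00:00", tz = "Etc/GMT-1")
mydf <- data.frame(data_POSIX = seq(init_day, by = "10 mins", length.out = 6))
mydf$vmax <- c(27, 41, 46, 39, 34, 32)

# Split the single date-time field into separate date and time strings;
# format() uses the time zone stored in the POSIXct column
ssdf <- data.frame(
  date_POSIX = format(mydf$data_POSIX, "%Y-%m-%d"),
  time_POSIX = format(mydf$data_POSIX, "%H:%M:%S"),
  vmax       = mydf$vmax,
  stringsAsFactors = FALSE
)
ssdf
```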

Jim

On Thu, Dec 3, 2020 at 4:55 AM Stefano Sofia
<stefano.sofia at regione.marche.it> wrote:
#
Hi again,
Didn't realize that the example didn't even span a full day.

ssdf<-read.table(text="date_POSIX time_POSIX vmax
 2018-02-01 00:00:00 27
 2018-02-01 00:10:00 41
 2018-02-01 00:20:00 46
 2018-02-01 00:30:00 39
 2018-02-01 00:40:00 34
 2018-02-01 00:50:00 32
 2018-02-01 01:00:00 37
 2018-02-01 01:10:00 31
 2018-02-01 01:20:00 26
 2018-02-01 01:30:00 29
 2018-02-01 01:40:00 24
 2018-02-01 01:50:00 35",
 header=TRUE,stringsAsFactors=FALSE)
# extract the hour
ssdf$hour<-
 as.numeric(unlist(lapply(strsplit(ssdf$time_POSIX,":"),"[",1)))
# extract the minutes from the time field
ssdf$mins<-
 as.numeric(unlist(lapply(strsplit(ssdf$time_POSIX,":"),"[",2)))
# create an AM/PM variable
ssdf$ampm<-ifelse(ssdf$mins > 0 & ssdf$mins <= 30,"am","pm")
# drop first row
ssdf<-ssdf[-1,]
means<-aggregate(vmax~hour+ampm,ssdf,mean)

This does a full day. To do more, add the date_POSIX field to the
aggregate command. If you have the date and time in one field you'll
have to split that. That will distinguish the AM/PM means in each day
as well as hour.
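A minimal sketch of adding `date_POSIX` to the formula, using two made-up days of data. It uses `substr()` instead of `strsplit()` to pull out the hour and minutes (a swap for brevity). Only the 10/20/30 readings are included, so it illustrates how the date keeps days apart, not how the 40/50/00 groups are handled:

```r
# Made-up data: the 10/20/30 readings of hour 0 on two different days
ssdf <- data.frame(
  date_POSIX = rep(c("2018-02-01", "2018-02-02"), each = 3),
  time_POSIX = rep(c("00:10:00", "00:20:00", "00:30:00"), times = 2),
  vmax       = c(41, 46, 39, 20, 25, 30),
  stringsAsFactors = FALSE
)
ssdf$hour <- as.numeric(substr(ssdf$time_POSIX, 1, 2))
ssdf$mins <- as.numeric(substr(ssdf$time_POSIX, 4, 5))
ssdf$ampm <- ifelse(ssdf$mins > 0 & ssdf$mins <= 30, "am", "pm")

# Without date_POSIX both days would be pooled into one "hour 0, am" group
means <- aggregate(vmax ~ date_POSIX + hour + ampm, ssdf, mean)
means
```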

Jim
On Thu, Dec 3, 2020 at 2:10 PM Jim Lemon <drjimlemon at gmail.com> wrote:
#
Thank you Jim for your solution.
I understood everything. As you say, splitting the POSIXct field is the key.

I apologise for not having used dput. I have never used it, but I will get acquainted with it soon.
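For reference, a minimal sketch of how `dput()` is typically used to share data on a list like this one; the data frame is a made-up three-row version of `mydf`:

```r
# Made-up three-row version of the data frame from the question
mydf <- data.frame(
  data_POSIX = seq(as.POSIXct("2018-02-01 00:00", tz = "Etc/GMT-1"),
                   by = "10 mins", length.out = 3),
  vmax = c(27, 41, 46)
)

# dput() writes R code that recreates the object; that text goes in the post
txt <- capture.output(dput(mydf))
cat(txt, sep = "\n")

# Whoever reads the post can run that code to get the same data frame back
mydf_copy <- eval(parse(text = txt))
```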

Thank you
Stefano

3 days later
#
Hi Jim.
I studied and implemented your solution in detail. The idea is great, but after a careful review I came to the conclusion that, unfortunately, it does not work correctly. For the "am" side (10, 20, 30 minutes) it works well because the hour is the same, but for the "pm" side (40, 50, 00) it does not, because the hour of the 40 and 50 minute readings differs from the hour of the 00 reading (which belongs to the following hour). Am I wrong?
I tried to fix it while keeping the simple structure of the algorithm, but without success.
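A quick check with three made-up time strings supports this: with the hour/AM-PM keys from Jim's second version, the 00:40 and 00:50 readings and the following 01:00 reading end up in two different groups (`substr()` stands in for `strsplit()` here):

```r
# Made-up times for one "00" group: 00:40, 00:50 and the following 01:00
tms  <- c("00:40:00", "00:50:00", "01:00:00")
hour <- as.numeric(substr(tms, 1, 2))
mins <- as.numeric(substr(tms, 4, 5))
ampm <- ifelse(mins > 0 & mins <= 30, "am", "pm")

# Grouping keys as aggregate(vmax ~ hour + ampm, ...) would see them
key <- paste(hour, ampm)
key    # "0 pm" "0 pm" "1 pm": the 01:00 reading is split off
```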

Any hint for that?
Thank you for your attention and your help

Stefano


#
Hi Stefano

I think either of these does what you need...


1: This gets the interval column as you want it, but utilises the lubridate package:

library(lubridate)
mydf$interval = ceiling_date(mydf$data_POSIX, unit="30 minutes")


2: The alternative in base R is a bit more long-winded: convert the date-time to numeric (seconds), divide by 1800 (the number of seconds in 30 minutes), take the ceiling, and convert back.

mydf$interval = as.POSIXct(ceiling(as.numeric(mydf$data_POSIX)/1800)*1800, origin="1970-01-01", tz="Etc/GMT-1")
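A fuller sketch of option 2, reusing the set-up from the original question but with `vmax <- 1:13` instead of random values (an assumption made so that the group means are easy to check by eye):

```r
# Same set-up as the original question, but with fixed, made-up vmax values
init_day <- as.POSIXct("2018-02-01-00-00", format = "%Y-%m-%d-%H-%M", tz = "Etc/GMT-1")
fin_day  <- as.POSIXct("2018-02-01-02-00", format = "%Y-%m-%d-%H-%M", tz = "Etc/GMT-1")
mydf <- data.frame(data_POSIX = seq(init_day, fin_day, by = "10 mins"))
mydf$vmax <- 1:13

# Round each time stamp UP to the next multiple of 30 minutes (1800 s);
# time stamps already on :00 or :30 stay where they are
mydf$interval <- as.POSIXct(ceiling(as.numeric(mydf$data_POSIX) / 1800) * 1800,
                            origin = "1970-01-01", tz = "Etc/GMT-1")

# Mean per 30-minute interval, labelled by the interval's end time
means <- aggregate(vmax ~ interval, mydf, mean)
means
```

Rounding epoch seconds lands on local :00 and :30 only because the UTC offset of `Etc/GMT-1` is a multiple of 30 minutes; for a time zone whose offset is not, the labels would be shifted.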


Cheers

  
  
#
I usually roll my own:

-----------
Sys.setenv( TZ = "GMT" )
ssdf$Dtm <- with( ssdf
                 , as.POSIXct( paste( date_POSIX, time_POSIX ) )
                 )

ceiling_dtmN <- function( dtm, mins ) {
   tm_base <- as.POSIXct( trunc( min( dtm ), units = "days" ) )
   x <- as.numeric( dtm - tm_base, units = "mins" )
   xceil <- ceiling( x %/% mins ) * mins
   tm_base + as.difftime( xceil, units = "mins" )
}

ssdf$Dtm30 <- ceiling_dtmN( ssdf$Dtm, mins = 30 )
ssdf
-----------
On Sun, 6 Dec 2020, Stefano Sofia wrote:

            
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
#
Sigh. Don't use integer division AND ceiling.

ceiling_dtmN <- function( dtm, mins ) {
   tm_base <- as.POSIXct( trunc( min( dtm ), units = "days" ) )
   x <- as.numeric( dtm - tm_base, units = "mins" )
   xceil <- ceiling( x / mins ) * mins
   tm_base + as.difftime( xceil, units = "mins" )
}
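To see why the first version misbehaves, compare the two expressions on made-up minute offsets 0, 10, ..., 60: `%/%` already rounds down, so `ceiling()` has nothing left to round. The call at the end uses the corrected function with an explicit `tz = "GMT"` on the input instead of `Sys.setenv()`, which is an assumption of this sketch:

```r
x <- seq(0, 60, by = 10)   # minutes after midnight: 00:00, 00:10, ..., 01:00
mins <- 30

floor_like <- ceiling(x %/% mins) * mins   # 0 0 0 30 30 30 60 (rounds down)
rounded_up <- ceiling(x / mins) * mins     # 0 30 30 30 60 60 60 (rounds up)

# Corrected function from the message above
ceiling_dtmN <- function(dtm, mins) {
  tm_base <- as.POSIXct(trunc(min(dtm), units = "days"))
  x <- as.numeric(dtm - tm_base, units = "mins")
  xceil <- ceiling(x / mins) * mins
  tm_base + as.difftime(xceil, units = "mins")
}

dtm <- as.POSIXct("2018-02-01 00:00", tz = "GMT") + 600 * (0:6)
format(ceiling_dtmN(dtm, 30), "%H:%M")
```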
On Sun, 6 Dec 2020, Jeff Newmiller wrote:

            
#
Hi,

Perhaps this might work for you. It leverages findInterval() and a
simple look-up table of times to do the grouping. It returns NA for
the mean when a group has fewer than three observations.

Cheers,
Ben

n <- 144
x <- data.frame(
  datetime = seq(from = as.POSIXct("2018-02-01 00:00:00", tz = "UTC"),
                 by = "10 min",
                 length = n),
  vmax = sample(10:50, n, replace = TRUE)
)

lut <- seq(from = x$datetime[1],
           to = x$datetime[n],
           by = "30 min") + 1     # add one second so that 00 groups with 40, 50
                                  # and the other grouping is 10, 20, 30

x$interval <- findInterval(x$datetime, lut)
x

y <- aggregate(vmax ~ interval, data = x,
               FUN = function(x){
                 if (length(x) < 3){
                   r <- NA
                 } else {
                   r <- mean(x)
                 }
                 r
               })
y
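Ben's `interval` column is an integer index. Because the look-up table starts one second after the first time stamp and steps by 30 minutes, index k corresponds to the group ending at `datetime[1] + k * 30 min`, so a time label can be recovered as in this sketch (a smaller made-up data set, and plain `mean` instead of the NA rule):

```r
# Small version of the example data (vmax made up)
n <- 12
x <- data.frame(
  datetime = seq(from = as.POSIXct("2018-02-01 00:00:00", tz = "UTC"),
                 by = "10 min", length.out = n),
  vmax = 1:n
)
lut <- seq(from = x$datetime[1], to = x$datetime[n], by = "30 min") + 1
x$interval <- findInterval(x$datetime, lut)

# Turn the group index back into the end time of each 30-minute interval
x$interval_end <- x$datetime[1] + x$interval * 1800
y <- aggregate(vmax ~ interval_end, data = x, FUN = mean)
y
```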

On Sun, Dec 6, 2020 at 1:59 PM Stefano Sofia
<stefano.sofia at regione.marche.it> wrote:

  
    
#
To be honest, I would do this one of two ways.

(1) Use ?decimate from library(signal),
    decimating by a factor of three.

(2) Reshape the variable into an (n/3) x 3 matrix (e.g. with
    matrix(x, ncol = 3, byrow = TRUE)), then use rowMeans or apply.
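A minimal sketch of option (2) with made-up readings. It assumes the series starts at :10 and has no missing or extra records; otherwise the rows silently stop lining up with the 30-minute blocks:

```r
# Made-up readings at 00:10, 00:20, ..., 01:00 (six records, no gaps)
vmax <- c(41, 46, 39, 34, 32, 37)

# Each row holds one 30-minute block: (10, 20, 30), then (40, 50, 00)
blocks <- matrix(vmax, ncol = 3, byrow = TRUE)
rowMeans(blocks)
```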

On Thu, 3 Dec 2020 at 06:55, Stefano Sofia <stefano.sofia at regione.marche.it>
wrote:

  
  
#
Beware of missing or extra records with these approaches. It may also be tricky to align the times to the hour properly.
On December 6, 2020 9:12:01 PM PST, Richard O'Keefe <raoknz at gmail.com> wrote:

  
    
#
This afternoon I will work on that.
I am thinking of sticking with Jim's algorithm and changing the hours of the 40 and 50 minute readings. If I add one hour to them, everything should work correctly, because the wind data at 40, 50 and 00 would then have exactly the same hour.
This would not be elegant, but it would be efficient.
Missing data could also be handled easily by replacing the "mean" function with a function that accepts NA.
But I will study all the solutions that have been kindly proposed.
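A sketch of the shift-by-one-hour idea in base R, with made-up data that crosses midnight and contains one NA. Shifting the full POSIXct time stamp (rather than only the hour number) also carries 23:40 and 23:50 into the next day; `na.action = na.pass` keeps the NA row so that `mean(..., na.rm = TRUE)` can skip it. The names `wind`, `date`, `hour` and `half` are made up for the example:

```r
# Made-up 10-minute data crossing midnight, with one missing value
tm <- seq(as.POSIXct("2018-02-01 23:10", tz = "Etc/GMT-1"),
          by = "10 mins", length.out = 6)
wind <- data.frame(data_POSIX = tm, vmax = c(20, 22, 24, 30, NA, 36))

mins <- as.numeric(format(wind$data_POSIX, "%M"))
# Move the :40 and :50 readings forward one hour so they share date and hour
# with the following :00 reading (23:40 and 23:50 roll into the next day)
key_time  <- wind$data_POSIX + ifelse(mins %in% c(40, 50), 3600, 0)
wind$date <- format(key_time, "%Y-%m-%d")
wind$hour <- as.numeric(format(key_time, "%H"))
wind$half <- ifelse(mins %in% c(10, 20, 30), "30", "00")  # half-hour mark

# na.pass keeps the NA row; mean(..., na.rm = TRUE) then ignores it
means <- aggregate(vmax ~ date + hour + half, data = wind, FUN = mean,
                   na.rm = TRUE, na.action = na.pass)
means
```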

Thank you all of you
Stefano


#
Instead of using breaks="30 mins", construct the breaks explicitly with
seq() so that you can control the start point.  E.g.,
tz="Etc/GMT-1")
tz="Etc/GMT-1")
result more easily
by=as.difftime(30,units="mins"))
paste() so we can check results
             interval   vmax
1 2018-01-31 23:40:00      1
2 2018-02-01 00:10:00  2,3,4
3 2018-02-01 00:40:00  5,6,7
4 2018-02-01 01:10:00 8,9,10
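A sketch of the explicit-breaks idea with `vmax <- 1:10` (not the code that produced the table above). It swaps in `right = TRUE` with breaks on :00 and :30, so that each group is labelled by its end time (00:00, 00:30, ...) as asked in the question, rather than by a start time such as 23:40:

```r
init_day <- as.POSIXct("2018-02-01-00-00", format = "%Y-%m-%d-%H-%M", tz = "Etc/GMT-1")
fin_day  <- as.POSIXct("2018-02-01-01-30", format = "%Y-%m-%d-%H-%M", tz = "Etc/GMT-1")
mydf <- data.frame(data_POSIX = seq(init_day, fin_day, by = "10 mins"))
mydf$vmax <- seq_len(nrow(mydf))  # 1, 2, ..., 10 make the grouping easy to check

# Breaks on :00 and :30, starting half an hour before the first reading
brks <- seq(init_day - 1800, fin_day + 1800, by = as.difftime(30, units = "mins"))

# right = TRUE gives intervals (start, end], so :40, :50 and :00 fall together
idx <- cut(mydf$data_POSIX, breaks = brks, right = TRUE, labels = FALSE)
mydf$interval <- brks[idx + 1]  # label each group by its end time

# paste() instead of mean() so the members of each group are visible
res <- aggregate(vmax ~ interval, mydf, paste, collapse = ",")
res
```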

On Wed, Dec 2, 2020 at 9:55 AM Stefano Sofia <
stefano.sofia at regione.marche.it> wrote: