
Trimming time series to only include complete years

3 messages · Morway, Eric, Jeff Newmiller

#
I'm bulk processing streamflow data from an online database and want to
trim each time series so that daily data belonging to incomplete "water
years" (defined as running from October 1st to the following September
30th) are removed from the beginning and end of the series.

For a small reproducible example, the time series below starts on
2010-01-01 and ends on 2011-11-05.  The data between 2010-01-01 and
2010-09-30, and also between 2011-10-01 and 2011-11-05, therefore do not
cover their respective water years completely.  With the real data, the
initial date of collection is arbitrary; it could be 1901 or 1938, etc.
Because I'm cycling through potentially thousands of records, I need help
designing a function that is efficient.

dat <- data.frame(Date=seq(as.Date("2010-01-01"),as.Date("2011-11-05"),by="day"))
dat$Q <- rnorm(nrow(dat))

dat$wyr <- as.numeric(format(dat$Date,"%Y"))
is.nxt <- as.numeric(format(dat$Date,"%m")) %in% 1:9
dat$wyr[!is.nxt] <- dat$wyr[!is.nxt] + 1


function(dat) {
   # ...
   # returns a subset of dat such that
   #   dat$Date > xxxx-09-30 & dat$Date < yyyy-10-01
   # ...
}

where every water year between xxxx and yyyy is "complete" (no missing
days).  In the example above, the returned dat would extend from
2010-10-01 to 2011-09-30.

Any offered guidance is very much appreciated.
1 day later
#
# read about POSIXlt at ?DateTimeClasses
# note that the "mon" element is 0-11
isPartialWaterYear <- function( d ) {
   dtl <- as.POSIXlt( dat$Date )
   wy1 <- cumsum( ( 9 == dtl$mon ) & ( 1 == dtl$mday ) )
   ( 0 == wy1  # first partial year
   | (  8 != dtl$mon[ nrow( dat ) ] # end partial year
     & 30 != dtl$mday[ nrow( dat ) ]
     ) & wy1[ nrow( dat ) ] == wy1
   )
}

dat2 <- dat[ !isPartialWaterYear( dat$Date ), ]

The above assumes that, as you said, the data are continuous at one-day 
intervals, such that the only partial years will occur at the beginning 
and end. The "diff" function could be used to identify irregular data 
within the data interval if needed.
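
As a minimal sketch of that "diff" check (the short example series and the names dates, step and gapAfter are made up for illustration, not taken from the real data), one could look for places where consecutive dates are more than one day apart:

```r
# Hypothetical example: ten daily dates with two days removed to create a gap
dates <- seq(as.Date("2010-01-01"), as.Date("2010-01-10"), by = "day")
dates <- dates[-c(5, 6)]   # drop 2010-01-05 and 2010-01-06

# Spacing between consecutive observations, in days
step <- diff(as.numeric(dates))

# Positions where the next observation is not exactly one day later
gapAfter <- which(step != 1)

dates[gapAfter]       # last date before each gap
dates[gapAfter + 1]   # first date after each gap
```

If gapAfter comes back empty, the series is regular at one-day intervals and the assumption above holds.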
On Fri, 27 May 2016, Morway, Eric wrote:

            
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
2 days later
#
Sorry, I put too many bugs (opportunities for excellence!) in my first
pass at this to leave it alone :-(

isPartialWaterYear2 <- function( d ) {
   dtl <- as.POSIXlt( d )
   wy1 <- cumsum( ( 9 == dtl$mon ) & ( 1 == dtl$mday ) )
   # any 0 in wy1 corresponds to first partial water year
   result <- 0 == wy1
   # if last day is not Sep 30, mark last water year as partial
   if ( 8 != dtl$mon[ length( d ) ]
      | 30 != dtl$mday[ length( d ) ] ) {
         result[ wy1[ length( d ) ] == wy1 ] <- TRUE
   }
   result
}

dat2 <- dat[ !isPartialWaterYear2( dat$Date ), ]
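
One way to cross-check the result on the example data is to count the days present in each water year and compare against the number of days that water year should contain. This is only a sketch of a different technique, not the function above: the names wyrs, expected, observed, complete and dat3 are made up here, and it assumes each date appears at most once in the series.

```r
# Example data from the original question (Date column only)
dat <- data.frame(Date = seq(as.Date("2010-01-01"), as.Date("2011-11-05"), by = "day"))

# Water year: the calendar year, plus one for October through December
yr  <- as.numeric(format(dat$Date, "%Y"))
mon <- as.numeric(format(dat$Date, "%m"))
wyr <- ifelse(mon >= 10, yr + 1, yr)

# Days a complete water year must contain: Oct 1 of (wyr - 1) up to, but not
# including, Oct 1 of wyr (365 or 366 depending on leap years)
wyrs     <- sort(unique(wyr))
expected <- as.numeric(as.Date(paste0(wyrs, "-10-01")) -
                       as.Date(paste0(wyrs - 1, "-10-01")))

# Days actually present in each water year (assumes no duplicated dates)
observed <- as.numeric(table(wyr)[as.character(wyrs)])

# Keep only the water years with every day present
complete <- wyrs[observed == expected]
dat3 <- dat[wyr %in% complete, , drop = FALSE]
range(dat3$Date)
```

For the example series this keeps only water year 2011, i.e. 2010-10-01 through 2011-09-30. Unlike the cumsum approach, it would also drop a water year with days missing in the middle of the record.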
On Sat, 28 May 2016, Jeff Newmiller wrote:

            