Processing key_column, begin_date, end_date in R

Here is another way. I have not tested it for large-scale efficiency, but converting dta to a data.table might improve things.

library(dplyr)
dta <- read.csv( text=
"key_column,begin_date,end_date
123456,2013-01-01,2014-01-01
123456,2013-07-01,2014-07-01
789102,2012-03-01,2014-03-01
789102,2015-02-01,2016-02-01
789102,2015-02-06,2016-02-06
789102,2015-02-28,2015-03-31
789102,2015-04-30,2015-05-31
", as.is=TRUE)
( dta
%>% mutate( begin_date = as.Date( begin_date ),
end_date = as.Date( end_date ) )
%>% arrange( key_column, begin_date )
) -> dta

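# Assign a group number to each row within one key. A new group starts
# whenever the running maximum end date (cend) of all earlier rows falls
# before the current row's begin_date, i.e. there is a gap between intervals.
mkgp <- function( begin_date, cend ) {
  ix <- c( TRUE, cend[ -length( begin_date ) ] < begin_date[ -1 ] )
  cumsum( ix )
}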

result <- ( dta
          %>% group_by( key_column )
          %>% mutate( cend = as.Date( cummax( as.numeric( end_date ) )
                                    , origin="1970-01-01" )
                      , gp = mkgp( begin_date, cend )
                      )
          %>% ungroup
          %>% group_by( key_column, gp )
          %>% summarise(  begin_date = begin_date[ 1 ]
                        , end_date = cend[ length( cend ) ]
                        )
          %>% ungroup
          %>% select( -gp )
          %>% as.data.frame
          )
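
For the data.table idea mentioned at the top, a rough sketch might look like the following. It is untested for speed and is only one possible translation of the dplyr steps above; the names dt and result_dt are made up for this illustration, and the block repeats the sample data so that it runs on its own.

```r
library(data.table)

# Same sample data as above, repeated so this block is self-contained.
dt <- read.csv(text = "key_column,begin_date,end_date
123456,2013-01-01,2014-01-01
123456,2013-07-01,2014-07-01
789102,2012-03-01,2014-03-01
789102,2015-02-01,2016-02-01
789102,2015-02-06,2016-02-06
789102,2015-02-28,2015-03-31
789102,2015-04-30,2015-05-31
", as.is = TRUE)

setDT(dt)  # convert the data.frame to a data.table by reference
dt[, `:=`(begin_date = as.Date(begin_date),
          end_date   = as.Date(end_date))]
setorder(dt, key_column, begin_date)

# Running maximum of end_date within each key (numeric, since cummax()
# does not accept Date objects directly).
dt[, cend := cummax(as.numeric(end_date)), by = key_column]

# Start a new group wherever the running maximum end date falls before
# the next begin_date, as mkgp() does in the dplyr version.
dt[, gp := cumsum(c(TRUE, cend[-.N] < as.numeric(begin_date[-1]))),
   by = key_column]

# Collapse each group to its first begin_date and last running maximum.
result_dt <- dt[, .(begin_date = begin_date[1],
                    end_date   = as.Date(cend[.N], origin = "1970-01-01")),
                by = .(key_column, gp)]
result_dt[, gp := NULL]
result_dt
```

On the sample data this should collapse the seven rows into three intervals: one for key 123456 and two for key 789102. Whether it is actually faster than the dplyr version on a large table would need to be measured.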
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
On February 25, 2015 1:18:58 PM PST, Matt Gross <grossm at gmail.com> wrote: