rle with data.table - is it possible?
Here is what I get when I try to use your algorithm:
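myf <- function( s ) {
  # s is a logical vector; rows that are FALSE keep segment number 0
  seg <- rep( 0, length( s ) )
  rs <- rle( s )
  # lengths of the TRUE runs only
  span <- rs$lengths[ rs$values ]
  # number the TRUE runs 1, 2, ... and repeat each number over its run
  seg[ s ] <- rep( seq_along( span ), times = span )
  seg
}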
library(data.table)
DT <- data.table( x )  # x is the data frame read with read.table() in my earlier message
DT[ , dadseg := myf( Dad %in% c( "AA", "RR" ) ), by=Group ]
DT[ , mumseg := myf( Mum %in% c( "AA", "RR" ) ), by=Group ]
DT[ , childseg := myf( Child %in% c( "AA", "RR" ) ), by=Group ]
DT
    Dad Mum Child Group dadseg mumseg childseg
 1:  AA  RR    RA     A      1      1        0
 2:  AA  RR    RR     A      1      1        1
 3:  AA  AA    AA     B      1      1        1
 4:  AA  AA    AA     B      1      1        1
 5:  RA  AA    RR     B      0      1        1
 6:  RR  AA    RR     B      2      1        1
 7:  AA  AA    AA     B      2      1        1
 8:  AA  AA    RA     C      1      1        0
 9:  AA  AA    RA     C      1      1        0
10:  AA  RR    RA     C      1      1        0
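To make the behaviour of myf easier to check against the output above, here is a small walk-through of the same steps on a single logical vector. The input vector s is made up for illustration and is not taken from the thread's data.

```r
# Hypothetical input: TRUE marks rows that belong to a stretch of interest.
s <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)

rs <- rle(s)
# rs$lengths is 2 1 1 2 3 and rs$values is TRUE FALSE TRUE FALSE TRUE

# Keep only the lengths of the TRUE runs.
span <- rs$lengths[rs$values]   # 2 1 3

# Start with 0 everywhere, then write run number k over the k-th TRUE run.
seg <- rep(0, length(s))
seg[s] <- rep(seq_along(span), times = span)
seg   # 1 1 0 2 0 0 3 3 3
```

With by=Group, data.table calls the function once per group, so the numbering restarts at 1 within each group.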
On Fri, 2 Jan 2015, Jeff Newmiller wrote:
The problem is that I cannot see how your use of rle and/or seq_along
could possibly lead to the sample result you are giving us. That is why
I asked for a new example.
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
On January 2, 2015 5:11:09 PM PST, Beejai <kate.ignatius at gmail.com> wrote:
Obviously this is why I need help...

This is a larger data frame. I'm only posting something small here to keep it simple. There are many Groups, which are larger, and I want to assign a sequence value to consecutive rows where sumchild is not equal to 0. As the data frame I'm working with is much larger, this goes up to 100, maybe even 200, and I have many different groups (20K+). I would like to do this for every group, not for the whole data frame. There is no particular science behind this, only data organizing.

So just say we had data like so:

    Dad Mum Child Group sumdad summum sumchild childseg
 1:  AA  RR    RA     A      2      2        0        0
 2:  AA  RR    RR     A      2      2        1        1
 3:  AA  AA    AA     B      4      5        5        1
 4:  AA  AA    RA     B      4      5        5        0
 5:  RA  AA    RR     B      0      5        5        2
 6:  RR  AA    RR     B      4      5        5        2
 7:  AA  AA    AA     B      4      5        5        2
 8:  AA  AA    AA     C      3      3        0        1
 9:  AA  AA    RA     C      3      3        0        0
10:  AA  RR    RR     C      3      3        0        2
11:  AA  RR    RA     C      2      2        0        0
12:  AA  RR    RR     C      2      2        1        3
13:  AA  AA    AA     C      4      5        5        3
14:  AA  AA    RA     C      4      5        5        0
15:  RA  AA    RR     C      0      5        5        4

On Fri, Jan 2, 2015 at 12:29 PM, David Winsemius [via R] <ml-node+s789695n4701316h51 at n4.nabble.com> wrote:
On Jan 2, 2015, at 12:07 AM, Kate Ignatius wrote:
Ah, crap. Yep you're right. This is not going too well.

Okay - let me try that again:

x$childseg<-0
x<-x$sumchild !=0

That previous line would appear to overwrite the entire dataframe with the value of one vector.

span<-rle(x)$lengths[rle(x)$values==TRUE]
x$childseg[x]<-rep(seq_along(span), times = span)

Does this one have any errors?
Even assuming that the code from Jeff Newmiller is creating those objects I get

x$childseg[x]<-rep(seq_along(span), times = span)
Error in `*tmp*`$childseg : $ operator is invalid for atomic vectors

In the last line you are indexing a vector with a dataframe (or perhaps a data.table).

If we use Newmiller's object and then change some of the instances of "x" in your code to DT we get:

DT$childseg<-0
x<-DT$sumchild !=0 # Try not to overwrite your data-objects
span<-rle(x)$lengths[rle(x)$values==TRUE]
DT$childseg[x]<-rep(seq_along(span), times = span)
DT
    Dad Mum Child Group sumdad summum sumchild childseg
 1:  AA  RR    RA     A      2      2        0        0
 2:  AA  RR    RR     A      2      2        1        1
 3:  AA  AA    AA     B      4      5        5        1
 4:  AA  AA    AA     B      4      5        5        1
 5:  RA  AA    RR     B      0      5        5        1
 6:  RR  AA    RR     B      4      5        5        1
 7:  AA  AA    AA     B      4      5        5        1
 8:  AA  AA    RA     C      3      3        0        0
 9:  AA  AA    RA     C      3      3        0        0
10:  AA  RR    RA     C      3      3        0        0

You persist in posting code where you do not explain what you are trying to do with it. You have already been told that your earlier efforts using `rle` did not make any sense. Post a complete example and then explain what you desire as an object. It's often helpful to provide a scientific background for what the data represents.

-- David.
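The error message quoted above can be reproduced on a tiny example. The sketch below uses a made-up data frame (dat and its values are hypothetical, not the poster's data) to show that the object stops being a data frame once it is overwritten with a logical vector, and that keeping the index under a separate name avoids the problem.

```r
# Hypothetical stand-in for the poster's data frame.
dat <- data.frame(sumchild = c(0, 1, 5, 5, 0))

# Overwriting the data frame with a logical vector, as in the quoted code:
x <- dat
x <- x$sumchild != 0              # x is now a plain logical vector
err <- tryCatch(x$childseg, error = function(e) e)
inherits(err, "error")            # TRUE: `$` cannot be used on an atomic vector

# Keeping the data frame and the logical index under separate names:
keep <- dat$sumchild != 0
span <- rle(keep)$lengths[rle(keep)$values]
dat$childseg <- 0
dat$childseg[keep] <- rep(seq_along(span), times = span)
dat$childseg                      # 0 1 1 1 0
```

This numbers runs across the whole data frame; it does not restart per Group.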
On Fri, Jan 2, 2015 at 2:32 AM, David Winsemius <[hidden email]> wrote:
On Jan 1, 2015, at 5:07 PM, Kate Ignatius <[hidden email]> wrote:

Apologies - mix up of syntax all over the place, a habit of mine. The last line was in there because of code beforehand so it really doesn't need to be there. Here is the proper code I hope:

childseg<-0
x<-sumchild ==0
span<-rle(x)$lengths[rle(x)$values==TRUE]
childseg[x]<-rep(seq_along(span), times = span)

This remains not reproducible. We have no idea what sumchild might be and the code throws an error. My guess is that you are trying to get a result such as would be delivered by:

childseg <- sumchild[ sumchild != 0 ]

?

David.
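For comparison, here is a short sketch of what logical subsetting returns versus labelling runs in place. The vectors sumchild and group are invented for illustration, number_runs is a hypothetical helper name, and the base R function ave stands in for grouping; this is one possible reading of the goal, not code from the thread.

```r
# Hypothetical data: one value per row, plus a grouping variable.
sumchild <- c(0, 1, 5, 0, 5, 5)
group    <- c("A", "A", "B", "B", "B", "B")

# Subsetting keeps only the nonzero values, so the result is shorter
# than the input and no longer lines up with the rows.
nonzero <- sumchild[sumchild != 0]
length(nonzero)   # 4

# Labelling runs keeps one value per row: 0 for zero rows,
# 1, 2, ... for successive nonzero runs.
number_runs <- function(v) {
  flag <- v != 0
  # a run starts where flag is TRUE and the previous element is not
  starts <- flag & !c(FALSE, head(flag, -1))
  ifelse(flag, cumsum(starts), 0)
}

# ave() applies the helper within each group and returns a full-length vector.
childseg <- ave(sumchild, group, FUN = number_runs)
childseg          # 0 1 1 0 2 2
```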
On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller <[hidden email]> wrote:

Thank you for attempting to encode what you want using R syntax, but you are not really succeeding yet (too many errors). Perhaps another hand generated result would help? A new input data frame might or might not be needed to illustrate desired results.

Your second and third lines are syntactically incorrect, and I don't understand what you hope to accomplish by assigning an empty string to a numeric in your last line.
Sent from my phone. Please excuse my brevity.

On January 1, 2015 4:16:52 AM PST, Kate Ignatius <[hidden email]> wrote:

Is it possible to add the following code or similar in data.table:

childseg<-0
x:=sumchild <-0
span<-rle(x)$lengths[rle(x)$values==TRUE
childseg[x]<-rep(seq_along(span), times = span)
childseg[childseg == 0]<-''

I was hoping to do this code by Group for mum, dad and child. The problem I'm having is with the

span<-rle(x)$lengths[rle(x)$values==TRUE

line which I'm not sure can be added to data.table.

[Previous email had incorrect code]

On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller <[hidden email]> wrote:
I do not understand the value of using the rle function in your description, but the code below appears to produce the table you want.

Note that better support for the data.table package might be found at stackexchange as the documentation specifies.

x <- read.table( text=
"Dad Mum Child Group
AA RR RA A
AA RR RR A
AA AA AA B
AA AA AA B
RA AA RR B
RR AA RR B
AA AA AA B
AA AA RA C
AA AA RA C
AA RR RA C
", header=TRUE, stringsAsFactors=FALSE )

library(data.table)
DT <- data.table( x )
DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
DT[ , sumdad := 0L ]
DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
DT[ , cdad := NULL ]
DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
DT[ , summum := 0L ]
DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
DT[ , cmum := NULL ]
DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
DT[ , sumchild := 0L ]
DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
DT[ , cchild := NULL ]
DT
    Dad Mum Child Group sumdad summum sumchild
 1:  AA  RR    RA     A      2      2        0
 2:  AA  RR    RR     A      2      2        1
 3:  AA  AA    AA     B      4      5        5
 4:  AA  AA    AA     B      4      5        5
 5:  RA  AA    RR     B      0      5        5
 6:  RR  AA    RR     B      4      5        5
 7:  AA  AA    AA     B      4      5        5
 8:  AA  AA    RA     C      3      3        0
 9:  AA  AA    RA     C      3      3        0
10:  AA  RR    RA     C      3      3        0

On Tue, 30 Dec 2014, Kate Ignatius wrote:
I'm trying to use both of these (rle and data.table) together and wondering whether that is possible...

To make this simple, my ultimate goal is to determine long stretches of 1s, but I want to do this within groups (hence using data.table, as I use the "set key" option). However, I'm not having much luck making this possible.

For example, for simplicity's sake, I have the following data:

Dad Mum Child Group
AA  RR  RA    A
AA  RR  RR    A
AA  AA  AA    B
AA  AA  AA    B
RA  AA  RR    B
RR  AA  RR    B
AA  AA  AA    B
AA  AA  RA    C
AA  AA  RA    C
AA  RR  RA    C

And the following code which I know works

hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]

However, I wish to do the above code by Group (though this file is millions of rows long and groups will be larger, but I just wanted to simplify the example).

I did something like this but of course I got an error:

LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]

The reason being that I want to eventually have something like this:

Dad Mum Child Group sumdad summum sumchild
AA  RR  RA    A     2      2      0
AA  RR  RR    A     2      2      1
AA  AA  AA    B     4      5      5
AA  AA  AA    B     4      5      5
RA  AA  RR    B     0      5      5
RR  AA  RR    B     4      5      5
AA  AA  AA    B     4      5      5
AA  AA  RA    C     3      3      0
AA  AA  RA    C     3      3      0
AA  RR  RA    C     3      3      0

That is, I would like to have the specific counts next to what I'm consecutively counting per group. So for Group A for dad there are 2 AAs, there are two RRs for mum, but only 1 AA or RR for the child, and that is RR (so the 1 is next to the RR and not the RA).

Can this be done?

K.
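One reason the rle expression is awkward to assign as a column is that it returns one length per run rather than one value per row. The sketch below illustrates this on the example data and then computes a per-row count for the Dad column in base R. It assumes the per-group count reading used in Jeff Newmiller's reply (number of AA/RR rows in the group, 0 on other rows), and it uses ave instead of data.table; treat it as an illustration rather than the method from the thread.

```r
x <- read.table(text = "Dad Mum Child Group
AA RR RA A
AA RR RR A
AA AA AA B
AA AA AA B
RA AA RR B
RR AA RR B
AA AA AA B
AA AA RA C
AA AA RA C
AA RR RA C
", header = TRUE, stringsAsFactors = FALSE)

# 1 where Dad is AA or RR, 0 otherwise
hetdad <- as.numeric(x$Dad %in% c("AA", "RR"))

# rle gives one length per run of 1s over the whole column ...
runs <- rle(hetdad)$lengths[rle(hetdad)$values == 1]
runs           # 4 5
length(runs)   # 2, while x has 10 rows

# ... whereas a column needs one value per row. Count the matching rows
# within each Group and keep 0 on the rows that do not match.
x$sumdad <- ifelse(hetdad == 1, ave(hetdad, x$Group, FUN = sum), 0)
x$sumdad       # 2 2 4 4 0 4 4 3 3 3
```

The same pattern could be repeated for the Mum and Child columns.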
______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible
code.
David Winsemius Alameda, CA, USA