struggling with "split" function - R-help

Sun, Sep 6, 2009 2:02 AM

I am very sorry for such a simple question, but I am struggling with "split".
I have the following data frame:
x <- data.frame(
  A = c(NA, NA, NA, NA, "split", NA, NA, NA, NA, "split", NA, NA, NA, NA, "split", NA, NA, NA, NA),
  B = c("Name1", "text1", "text2", "text3", NA, "Name2", "text1", "text2", "text3", NA,
        "Name3", "text1", "text2", "text3", NA, "Name4", "text1", "text2", "text3"),
  C = c(NA, 1, NA, 3, NA, NA, 4, 5, 6, NA, NA, 7, 8, 9, NA, NA, 3, 3, 3),
  D = c(NA, 1, 1, 2, NA, NA, 5, 6, NA, NA, NA, 9, 8, 7, NA, NA, 2, 2, 2),
  E = c(NA, 3, 2, 1, NA, NA, 6, 5, 4, NA, NA, 7, 7, 8, NA, NA, 1, NA, 1))
print(x)

All I want to do is split x, i.e., create a list of the data frames
that are currently separated by the word "split" in column A. In this
example, there would be 4 data frames, the first of them being:
   A     B  C  D  E
  NA Name1 NA NA NA
  NA text1  1  1  3
  NA text2 NA  1  2
  NA text3  3  2  1

etc.

I tried:
split(x, x$A)
split(x, x$A == 'split')
split(x, !is.na(x$A))

But nothing produces what I need.
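Why the three attempts fail: split() silently drops every row whose grouping value is NA, and in this data almost all of column A is NA. A minimal sketch on a trimmed stand-in for x (only column A matters for the grouping; the `row` column is hypothetical, added just to number the rows):

```r
# Trimmed stand-in for the question's x: only column A matters for grouping
# ('row' is a hypothetical helper column that just numbers the rows)
A <- c(NA, NA, NA, NA, "split", NA, NA, NA, NA, "split",
       NA, NA, NA, NA, "split", NA, NA, NA, NA)
x <- data.frame(A = A, row = seq_along(A), stringsAsFactors = FALSE)

# 1. split(x, x$A): rows whose A is NA are dropped, leaving only the
#    three "split" separator rows, all in one group
s1 <- split(x, x$A)
names(s1)          # "split"
nrow(s1$split)     # 3

# 2. x$A == "split" is NA (not FALSE) where A is NA, so those rows are
#    dropped as well and only the separator rows remain
s2 <- split(x, x$A == "split")
names(s2)          # "TRUE"

# 3. !is.na(x$A) has no NAs, but it only separates data rows from
#    separator rows: all 16 data rows end up in a single group
s3 <- split(x, !is.na(x$A))
sapply(s3, nrow)   # FALSE = 16, TRUE = 3
```

What is missing in all three is a grouping value that changes each time a separator row is passed.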
Thanks a lot for any hint!

Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski at ninah.com

Dimitris Rizopoulos

Sun, Sep 6, 2009 2:43 AM

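One way is the following:

# number each run of consecutive NA / non-NA values in column A
ind <- rle(is.na(x$A))
ind <- rep(seq_along(ind$lengths), ind$lengths)
# TRUE for data rows, FALSE for the "split" separator rows
na.ind <- is.na(x$A)
# drop the separator rows and column A, then split by run number
split(x[na.ind, -1], ind[na.ind])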
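To see how the rle() approach works, here is the same code run on the example data with the intermediate values spelled out. The run-length object is stored in `r` instead of overwriting `ind`, purely for readability, and `stringsAsFactors = FALSE` is an assumption added so the columns stay character in any R version. The final cumsum() grouping is an alternative idiom, not part of the reply above.

```r
# Re-create the example data frame from the question
x <- data.frame(
  A = c(NA, NA, NA, NA, "split", NA, NA, NA, NA, "split", NA, NA, NA, NA, "split", NA, NA, NA, NA),
  B = c("Name1", "text1", "text2", "text3", NA, "Name2", "text1", "text2", "text3", NA,
        "Name3", "text1", "text2", "text3", NA, "Name4", "text1", "text2", "text3"),
  C = c(NA, 1, NA, 3, NA, NA, 4, 5, 6, NA, NA, 7, 8, 9, NA, NA, 3, 3, 3),
  D = c(NA, 1, 1, 2, NA, NA, 5, 6, NA, NA, NA, 9, 8, 7, NA, NA, 2, 2, 2),
  E = c(NA, 3, 2, 1, NA, NA, 6, 5, 4, NA, NA, 7, 7, 8, NA, NA, 1, NA, 1),
  stringsAsFactors = FALSE)

# Step 1: run-length encode the NA pattern of column A
r <- rle(is.na(x$A))
r$lengths  # 4 1 4 1 4 1 4
r$values   # TRUE FALSE TRUE FALSE TRUE FALSE TRUE

# Step 2: label every row with the number of the run it belongs to
ind <- rep(seq_along(r$lengths), r$lengths)
ind        # 1 1 1 1 2 3 3 3 3 4 5 5 5 5 6 7 7 7 7

# Step 3: keep the rows where A is NA, drop column A, split by run number
na.ind <- is.na(x$A)
res <- split(x[na.ind, -1], ind[na.ind])
names(res)          # "1" "3" "5" "7"
sapply(res, nrow)   # 4 rows in each piece

# Alternative grouping (not from the thread): count separator rows seen so far
grp <- cumsum(!na.ind)
res2 <- split(x[na.ind, -1], grp[na.ind])
names(res2)         # "0" "1" "2" "3"
```

Both groupings put the same rows in the same pieces; they differ only in the list names, since the rle() version numbers every run (including the separator runs it then discards).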


I hope it helps.

Best,
Dimitris

Dimitri Liakhovitski wrote:

Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014

Dimitri Liakhovitski

Sun, Sep 6, 2009 5:43 AM

Thanks a lot, Dimitris.
It works perfectly on my example data frame.
I know it's probably hard to diagnose, but when I apply it to the
huge real data frame I have, the last line gives:
Error in `[.default`(x$A, na.ind, -1) : incorrect number of dimensions
I know it's impossible to answer this question without seeing the
data, but still: what do you think might be wrong?

Could it be because my first column contains something other than
"split"? No: I've just run table() on A, and it gives:
split  <NA>
  204  6356
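By default table() leaves NA out of its counts, so a `<NA>` column like the one above needs `useNA = "ifany"` (or `exclude = NULL`). A small sketch on the example's column A, together with a quick count of how many pieces the rle() approach should return (one per run of NA rows):

```r
# Column A from the question's example data
A <- c(NA, NA, NA, NA, "split", NA, NA, NA, NA, "split",
       NA, NA, NA, NA, "split", NA, NA, NA, NA)

# table() omits NA unless asked; useNA = "ifany" adds a <NA> count
counts <- table(A, useNA = "ifany")
print(counts)

# Number of pieces the rle-based split will produce: one per run of NAs
n_pieces <- sum(rle(is.na(A))$values)
n_pieces   # 4 here: 3 separators + 1
```

On the real data, 204 separators would suggest 205 pieces, assuming (this is not stated in the thread) that the data neither starts nor ends with a "split" row and that no two separators are adjacent.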

I also checked the first dimension of x and length(na.ind); they
are the same: 6560.

No idea where the error might lie...
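The call recorded in the error, `[.default`(x$A, na.ind, -1), suggests that the two-index subscript was applied to the column x$A rather than to the data frame x, i.e. x$A[na.ind, -1] instead of x[na.ind, -1]. That is an inference from the message only; the thread just says the mistake was the poster's own. If A is a factor, as data.frame() made character columns by default at the time, the factor method hands the call on to the default method, which would explain why the message names `[.default`. A minimal reproduction, where `v`, `keep` and `df` are hypothetical stand-ins:

```r
# Hypothetical stand-in for x$A: a plain vector has no second dimension
v <- c(NA, NA, "split", NA)
keep <- is.na(v)

# Two subscripts on a vector fail with the error from the post
msg <- tryCatch(v[keep, -1], error = function(e) conditionMessage(e))
print(msg)   # "incorrect number of dimensions"

# A data frame has rows and columns, so [rows, -1] works as intended
df <- data.frame(A = v, B = 1:4, C = letters[1:4], stringsAsFactors = FALSE)
df[keep, -1]  # rows 1, 2 and 4; columns B and C
```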


Thanks a lot!
Dimitri

On Sun, Sep 6, 2009 at 5:43 AM, Dimitris Rizopoulos <d.rizopoulos at erasmusmc.nl> wrote:


Dimitri Liakhovitski

Sun, Sep 6, 2009 6:49 PM

Found a mistake - it was mine!
Thanks a lot for your help!

On Sun, Sep 6, 2009 at 8:43 AM, Dimitri Liakhovitski <ld7631 at gmail.com> wrote: