I am very sorry for such a simple question, but I am struggling with "split".
I have the following data frame:
x <- data.frame(
  A = c(NA,NA,NA,NA,"split",NA,NA,NA,NA,"split",NA,NA,NA,NA,"split",NA,NA,NA,NA),
  B = c("Name1","text1","text2","text3",NA,"Name2","text1","text2","text3",NA,
        "Name3","text1","text2","text3",NA,"Name4","text1","text2","text3"),
  C = c(NA,1,NA,3,NA,NA,4,5,6,NA,NA,7,8,9,NA,NA,3,3,3),
  D = c(NA,1,1,2,NA,NA,5,6,NA,NA,NA,9,8,7,NA,NA,2,2,2),
  E = c(NA,3,2,1,NA,NA,6,5,4,NA,NA,7,7,8,NA,NA,1,NA,1)
)
print(x)
All I want to do is to split x, i.e., to create a list of data frames
that are currently separated by the word "split" in column A. In this
example, it would be 4 data frames, the first of them being:
 A     B  C  D  E
NA Name1 NA NA NA
NA text1  1  1  3
NA text2 NA  1  2
NA text3  3  2  1
etc.
I tried:
split(x, x$A)
split(x,x$A == 'split')
split(x,!is.na(x$A))
But nothing produces what I need.
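Why the three attempts above cannot give four pieces (a sketch for illustration): split() makes one group per distinct value of its grouping vector and silently drops rows whose grouping value is NA. Column A only distinguishes separator rows from data rows, so on its own it cannot tell the four blocks apart. The data frame is re-created with stringsAsFactors = FALSE, an assumption added so the block behaves the same across R versions.

```r
x <- data.frame(
  A = c(NA,NA,NA,NA,"split",NA,NA,NA,NA,"split",NA,NA,NA,NA,"split",NA,NA,NA,NA),
  B = c("Name1","text1","text2","text3",NA,"Name2","text1","text2","text3",NA,
        "Name3","text1","text2","text3",NA,"Name4","text1","text2","text3"),
  C = c(NA,1,NA,3,NA,NA,4,5,6,NA,NA,7,8,9,NA,NA,3,3,3),
  D = c(NA,1,1,2,NA,NA,5,6,NA,NA,NA,9,8,7,NA,NA,2,2,2),
  E = c(NA,3,2,1,NA,NA,6,5,4,NA,NA,7,7,8,NA,NA,1,NA,1),
  stringsAsFactors = FALSE
)

# Grouping by A: the 16 NA rows are dropped, leaving one group ("split")
# that holds only the 3 separator rows.
by_value <- split(x, x$A)

# Grouping by A == "split": NA == "split" is NA, so again only the
# 3 separator rows survive, in a single group named "TRUE".
by_flag <- split(x, x$A == "split")

# Grouping by !is.na(A): two groups, all 16 data rows together ("FALSE")
# and all 3 separator rows together ("TRUE").
by_na <- split(x, !is.na(x$A))

sapply(by_value, nrow)
sapply(by_flag, nrow)
sapply(by_na, nrow)
```

What is missing in every case is a grouping vector that changes value at each separator.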
Thanks a lot for any hint!
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski at ninah.com
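One way is the following:
ind <- rle(is.na(x$A))                           # runs of NA (data) and non-NA ("split") rows
ind <- rep(seq_along(ind$lengths), ind$lengths)  # run number for every row: 1,1,1,1,2,3,...
na.ind <- is.na(x$A)                             # TRUE on data rows, FALSE on "split" rows
split(x[na.ind, -1], ind[na.ind])                # drop separators and column A, split by run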
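For the example data, the intermediate results of the code above can be checked directly. The sketch repeats the same steps, storing the rle() result under a separate name (runs) for readability, and with stringsAsFactors = FALSE added as an assumption. Because the run numbers are used as group labels and the separator runs (2, 4, 6) contain no data rows, the four pieces come out named "1", "3", "5" and "7" rather than "1" to "4".

```r
x <- data.frame(
  A = c(NA,NA,NA,NA,"split",NA,NA,NA,NA,"split",NA,NA,NA,NA,"split",NA,NA,NA,NA),
  B = c("Name1","text1","text2","text3",NA,"Name2","text1","text2","text3",NA,
        "Name3","text1","text2","text3",NA,"Name4","text1","text2","text3"),
  C = c(NA,1,NA,3,NA,NA,4,5,6,NA,NA,7,8,9,NA,NA,3,3,3),
  D = c(NA,1,1,2,NA,NA,5,6,NA,NA,NA,9,8,7,NA,NA,2,2,2),
  E = c(NA,3,2,1,NA,NA,6,5,4,NA,NA,7,7,8,NA,NA,1,NA,1),
  stringsAsFactors = FALSE
)

runs <- rle(is.na(x$A))
runs$lengths   # length of each run: 4 data rows, 1 separator, 4 data rows, ...
runs$values    # TRUE for a run of data rows, FALSE for a separator run

ind <- rep(seq_along(runs$lengths), runs$lengths)  # run number for each row
na.ind <- is.na(x$A)                                # TRUE on data rows
res <- split(x[na.ind, -1], ind[na.ind])            # one data frame per data run
names(res)
res[["1"]]
```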
I hope it helps.
Best,
Dimitris
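A common alternative idiom, not the one used in this thread, numbers the blocks with cumsum() over a logical "is this a separator row?" vector. The helper name split_at_marker() and its arguments are hypothetical, chosen for this sketch; it assumes the marker rows carry no data and should be dropped along with the marker column. Empty blocks (a separator in the first or last row, or two adjacent separators) simply produce no piece, because split() only creates groups for labels that actually occur.

```r
# Hypothetical helper: split df into pieces at rows where df[[col]] equals marker.
split_at_marker <- function(df, col, marker = "split") {
  is_sep <- df[[col]] %in% marker           # TRUE on separator rows; NA counts as FALSE
  block  <- cumsum(is_sep) + 1L             # block number goes up by one after each separator
  keep   <- !is_sep                         # the separator rows themselves are dropped
  cols   <- setdiff(names(df), col)         # the marker column is dropped too
  split(df[keep, cols, drop = FALSE], block[keep])
}

x <- data.frame(
  A = c(NA,NA,NA,NA,"split",NA,NA,NA,NA,"split",NA,NA,NA,NA,"split",NA,NA,NA,NA),
  B = c("Name1","text1","text2","text3",NA,"Name2","text1","text2","text3",NA,
        "Name3","text1","text2","text3",NA,"Name4","text1","text2","text3"),
  C = c(NA,1,NA,3,NA,NA,4,5,6,NA,NA,7,8,9,NA,NA,3,3,3),
  D = c(NA,1,1,2,NA,NA,5,6,NA,NA,NA,9,8,7,NA,NA,2,2,2),
  E = c(NA,3,2,1,NA,NA,6,5,4,NA,NA,7,7,8,NA,NA,1,NA,1),
  stringsAsFactors = FALSE
)

parts <- split_at_marker(x, "A")
length(parts)
parts[["4"]]
```

Unlike the rle() version, the pieces here are labelled consecutively ("1" to "4").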
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Thanks a lot, Dimitris.
It totally works on my example data frame.
I know this is probably hard to diagnose, but when I apply it to my
real (huge) data frame, the last line gives:
Error in `[.default`(x$A, na.ind, -1) : incorrect number of dimensions.
I know it's impossible to answer this question without seeing the
data, but still: what do you think might be wrong?
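The call quoted in the error message, `[.default`(x$A, na.ind, -1), is itself a hint: the two subscripts [na.ind, -1] were applied to the column x$A, which is a plain vector with no second dimension, rather than to the data frame x. A minimal reproduction on a three-row stand-in for the data (an assumption, since the real data were not posted); the exact call shown in the message differs depending on whether A is a factor or a character vector, but the message text is the same:

```r
# Hypothetical three-row stand-in for the real data.
x <- data.frame(A = c(NA, "split", NA),
                B = c("Name1", NA, "Name2"),
                C = c(1, NA, 2),
                stringsAsFactors = FALSE)
na.ind <- is.na(x$A)

ok <- x[na.ind, -1]   # rows, then columns: fine on a data frame

# The same two subscripts on a single column (a vector) fail.
msg <- tryCatch(x$A[na.ind, -1], error = conditionMessage)
msg
```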
Could it be because my first column contains something other than
"split"? No: I've just run table() on A, and it gives:
split  <NA>
  204  6356
I also checked dim(x)[1] and length(na.ind): they are the same, 6560.
I have no idea where the error might lie...
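The checks described above can also be written so that they report how many pieces to expect. In this sketch (example data; the variable names are hypothetical), the number of pieces the rle() approach returns equals the number of runs of NA in column A. It equals the number of separators plus one only when no separator sits in the first or last row and no two separators are adjacent, so comparing the two is a quick way to spot such rows in a large data set.

```r
x <- data.frame(
  A = c(NA,NA,NA,NA,"split",NA,NA,NA,NA,"split",NA,NA,NA,NA,"split",NA,NA,NA,NA),
  B = c("Name1","text1","text2","text3",NA,"Name2","text1","text2","text3",NA,
        "Name3","text1","text2","text3",NA,"Name4","text1","text2","text3"),
  stringsAsFactors = FALSE
)

n_sep    <- sum(x$A %in% "split")          # separator rows
n_data   <- sum(is.na(x$A))                # data rows
n_pieces <- sum(rle(is.na(x$A))$values)    # runs of data rows = pieces split() returns

# Nothing in A besides "split" and NA, and one flag per row.
stopifnot(n_sep + n_data == nrow(x))

c(separators = n_sep, data_rows = n_data, pieces = n_pieces)
```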
Thanks a lot!
Dimitri
(In reply to Dimitris Rizopoulos <d.rizopoulos at erasmusmc.nl>, Sun, Sep 6, 2009 at 5:43 AM.)
Found a mistake - it was mine!
Thanks a lot for your help!
(In reply to Dimitri Liakhovitski <ld7631 at gmail.com>, Sun, Sep 6, 2009 at 8:43 AM.)