Dear All,

I have a problem with my data. The first problem is that my data is really large and R is omitting some columns from it. Is there any way to read the whole data set without omitting anything?

The other problem is that my data has 102k columns, and each column indicates active or inactive molecules. The data looks like this:

Molecule id
129876 1010101110011110011110011100111100110..........
234532 1010101110011110011110011100111100110..........
123678 1010101110011110011110011100111100110..........
. . . . (102k values)

When I read the data into R, R reports my rows as "Inf" because it reads each row as one number. I want the values to be separated, like "1 0 1 0". Is there any way to do this in R?

Many thanks,
Efe

--
View this message in context: http://r.789695.n4.nabble.com/R-Large-Dataset-Problem-tp4554469p4554469.html
Sent from the R help mailing list archive at Nabble.com.
R Large Dataset Problem
8 messages · efulas, Milan Bouchet-Valat, Alekseiy Beloshitskiy +2 more
On Friday 13 April 2012 at 04:32 -0700, efulas wrote:
> Dear All, I have a problem with my data. The first problem is that my data is really large and R is omitting some columns from it. Is there any way to read the whole data set without omitting anything?

How did you import it? Please be precise.

> The other problem is that my data has 102k columns, and each column indicates active or inactive molecules. The data looks like this:
>
> Molecule id
> 129876 1010101110011110011110011100111100110..........
> 234532 1010101110011110011110011100111100110..........
> 123678 1010101110011110011110011100111100110..........
> . . . . (102k values)
>
> When I read the data into R, R reports my rows as "Inf" because it reads each row as one number. I want the values to be separated, like "1 0 1 0". Is there any way to do this in R?

See ?read.fwf. If you still have problems loading your data, feel free to ask again on specific issues.

Regards
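A minimal sketch of the ?read.fwf suggestion, for readers following along. The temporary file, the 6-character id, the single space after it and the 8 flags per row are all assumptions made for illustration; the real file has about 102k flags per row, and read.fwf() may be slow or memory-hungry at that width.

```r
# Hypothetical sample file: 6-character id, one space, then a run of 0/1 flags.
tmp <- tempfile(fileext = ".txt")
writeLines(c("129876 10101011",
             "234532 01101110",
             "123678 11100001"), tmp)

n_bits <- 8  # assumed number of flags per row for this toy example

# A negative width tells read.fwf() to skip that many characters (the space).
b <- read.fwf(tmp, widths = c(6, -1, rep(1, n_bits)), header = FALSE,
              col.names = c("id", paste0("v", seq_len(n_bits))))

dim(b)  # one row per molecule, one column for the id plus one per flag
```

If the real file has no space between the id and the flags, the -1 entry would simply be dropped from widths.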
I would perform data pre-processing before loading it into R.

Best,
-Alex
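The thread does not say which pre-processing was meant. One possible reading is to insert commas between the flags so that the file becomes an ordinary CSV. The sketch below does this in R itself to keep the examples in one language, although an external text tool would serve equally well; the file names and the id-space-flags layout are assumptions.

```r
# Hypothetical raw file in the layout shown in the thread: id, space, flags.
raw_file <- tempfile(fileext = ".txt")
writeLines(c("129876 10101011", "234532 01101110"), raw_file)

lines <- readLines(raw_file)
ids   <- sub(" .*$", "", lines)     # text before the first space
bits  <- sub("^[^ ]* ", "", lines)  # text after the first space

# Put a comma in front of every flag: "101" becomes ",1,0,1".
csv_lines <- paste0(ids, gsub("([01])", ",\\1", bits))

csv_file <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_file)

# The rewritten file can now be read with the usual CSV reader.
d <- read.csv(csv_file, header = FALSE)
```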
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] on behalf of efulas [ef_ulas at hotmail.com]
Sent: 13 April 2012 14:32
To: r-help at r-project.org
Subject: [R] R Large Dataset Problem
Thank you very much for your help, guys. Both messages helped me to get the data into R. However, R is omitting many columns from my data. Am I missing something?

Many thanks
On 13/04/12 14:20, Milan Bouchet-Valat wrote:
> See ?read.fwf. If you still have problems loading your data, feel free to ask again on specific issues.
You could also read them in so that the 1 and 0 are in one field and specify that this column is a character, and then use strsplit() to split them up:
x <- "1010101110011110011110011100111100110"
strsplit(x, split="")

[[1]]
 [1] "1" "0" "1" "0" "1" "0" "1" "1" "1" "0" "0" "1" "1" "1" "1" "0" "0" "1" "1"
[20] "1" "1" "0" "0" "1" "1" "1" "0" "0" "1" "1" "1" "1" "0" "0" "1" "1" "0"

Cheers,
Rainer
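A sketch of how this suggestion might look end to end, assuming a file in which the id and the flags are separated by whitespace. The temporary file, the column names id and bits, and the 8-flag rows are made up for illustration.

```r
# Hypothetical sample file: id, whitespace, then the flags as one field.
tmp <- tempfile(fileext = ".txt")
writeLines(c("129876 10101011",
             "234532 01101110",
             "123678 11100001"), tmp)

# colClasses = "character" keeps the flag field as text instead of a number,
# so leading zeros survive and nothing overflows to Inf.
d <- read.table(tmp, header = FALSE,
                colClasses = c("integer", "character"),
                col.names = c("id", "bits"))

# strsplit() returns one character vector per row; stack them into a matrix.
m <- do.call(rbind, lapply(strsplit(d$bits, split = ""), as.integer))
rownames(m) <- d$id

dim(m)  # one row per molecule, one column per flag
```

An integer matrix is used here rather than a data frame because every flag has the same type; whether that suits the later analysis is a judgment call the thread does not settle.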
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa
Tel : +33 - (0)9 53 10 27 44
Cell: +33 - (0)6 85 62 59 98
Fax : +33 - (0)9 58 10 27 44
Fax (D): +49 - (0)3 21 21 25 22 44
email: Rainer at krugs.de
Skype: RMkrug
On Friday 13 April 2012 at 05:44 -0700, efulas wrote:
> Thank you very much for your help, guys. Both messages helped me to get the data into R. However, R is omitting many columns from my data. Am I missing something?

Please read the posting guide. If you don't provide the code you ran and the resulting objects and messages, we cannot possibly help you.

Regards
I am using the code below:

    options(max.print=5.5E5)
    x=rep(1,1052)
    b=read.fwf(file="efetez.binary", widths=c(6,x),header=FALSE)

and I get the error "C stack usage is too close to the limit".

I want my data to look like this:

molecule id  v1 v2 v3 .........................................
19029        1,1,0,1,0,1,0,.......................................
29837        0,1,1,1,1,0,1........................................
. . .

However, I can't get it like that because there are no commas between the digits in "1000110010", so R reads it as Inf.

Many thanks
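Two points in this message can be checked in isolation. The sketch below is not a diagnosis of the poster's file: the 400-digit string and the empty matrix are stand-ins. It shows why a long run of digits parsed as a single number comes back as Inf, and that the number of columns an object holds is separate from how many of them print() displays. That the "omitted" columns were only a printing limit is one possible reading, not something the thread confirms.

```r
# A long run of digits parsed as one number exceeds the largest double
# (about 1.8e308) and therefore becomes Inf.
long_bits <- strrep("1", 400)   # hypothetical row of 400 flags
as_number <- as.numeric(long_bits)
is.infinite(as_number)

# Kept as text, the same field loses nothing.
nchar(long_bits)

# Printing is capped by getOption("max.print"); the object itself is not.
# Checking ncol() or dim() shows what was actually read.
m <- matrix(0L, nrow = 2, ncol = 200000)
ncol(m)
getOption("max.print")
```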