R Large Dataset Problem

8 messages · efulas, Milan Bouchet-Valat, Alekseiy Beloshitskiy +2 more

#
Dear All,

I have a problem with my data. The first problem is that my data is really
large and R is omitting some columns when I read it. Is there any way to read
the whole data set without omitting anything? The other problem is that my
data has 102k columns, and each column indicates active or inactive
molecules. The data looks like this:

Molecule id

129876            1010101110011110011110011100111100110..........
234532            1010101110011110011110011100111100110..........
123678            1010101110011110011110011100111100110..........
.
.
.
.
(102k values)


When I read the data into R, R shows my rows as "Inf" because it reads each
row as one number. I want the digits to be separated, like "1  0   1   0".
Is there any way to do this in R?

Many Thanks,


Efe 

--
View this message in context: http://r.789695.n4.nabble.com/R-Large-Dataset-Problem-tp4554469p4554469.html
Sent from the R help mailing list archive at Nabble.com.
#
On Friday 13 April 2012 at 04:32 -0700, efulas wrote:
How did you import it? Please be precise.
See ?read.fwf. If you still have problems loading your data, feel free
to ask again on specific issues.


Regards
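
A minimal sketch of what read.fwf does with this kind of layout, assuming a
6-character id immediately followed by one character per flag. The two sample
lines, the temporary file and the names `tmp`, `n_bits` and `b` are made up
for illustration; the real file may separate the id from the flags
differently (a negative width in `widths` skips that many characters).

```r
# Hypothetical sample: 6-character id followed directly by 4 flags
tmp <- tempfile()
writeLines(c("1298761010", "2345320111"), tmp)

n_bits <- 4  # assumed number of flag columns in this toy file

# One width of 6 for the id, then a width of 1 for every flag
b <- read.fwf(tmp, widths = c(6, rep(1, n_bits)), header = FALSE,
              col.names = c("id", paste0("v", seq_len(n_bits))))
b
```

Each flag ends up in its own column, so nothing is parsed as one huge number.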
#
I would perform data pre-processing before loading in R.


Best,
-Alex
#
On 13/04/12 14:20, Milan Bouchet-Valat wrote:
You could also read them in so that the 1s and 0s are in one field, specify that this column is
character, and then use strsplit() to split them up:
[[1]]
 [1] "1" "0" "1" "0" "1" "0" "1" "1" "1" "0" "0" "1" "1" "1" "1" "0" "0" "1" "1"
[20] "1" "1" "0" "0" "1" "1" "1" "0" "0" "1" "1" "1" "1" "0" "0" "1" "1" "0"
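
The call that produced the output above is missing from the archive. A
minimal sketch that gives output of the same shape, assuming one line of the
sample from the first post is read with both fields forced to character (the
names `txt`, `d` and `parts` are made up here):

```r
# One sample line from the first post: id, whitespace, then the flags
txt <- "129876            1010101110011110011110011100111100110"

# colClasses = "character" keeps the long digit string from being
# converted to a number (which is what produces Inf)
d <- read.table(text = txt, header = FALSE,
                colClasses = c("character", "character"),
                col.names = c("id", "bits"))

# Splitting on the empty string yields one element per character
parts <- strsplit(d$bits, "")
parts
```

as.integer(parts[[1]]) would then turn the pieces into numbers.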



Cheers,

Rainer
-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys.
(Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :       +33 - (0)9 53 10 27 44
Cell:       +33 - (0)6 85 62 59 98
Fax :       +33 - (0)9 58 10 27 44

Fax (D):    +49 - (0)3 21 21 25 22 44

email:      Rainer at krugs.de

Skype:      RMkrug
#
On Friday 13 April 2012 at 05:44 -0700, efulas wrote:
Please read the posting guide. If you don't provide the code you ran and
the resulting objects and messages, we cannot possibly help you.


Regards
#
I am using the code below:


options(max.print=5.5E5)
x=rep(1,1052)
b=read.fwf(file="efetez.binary", widths=c(6,x),header=FALSE)

and I get the error "C stack usage is too close to the limit". I want my
data to look like this:

molecule id    v1  v2   v3 .........................................

19029              1,1,0,1,0,1,0,.......................................
29837              0,1,1,1,1,0,1........................................
.
.
.

However, I can't get it like the above because there are no commas between
the digits in "1000110010", so R reads it as Inf.
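
One possible way to get the layout above without passing a very long widths
vector to read.fwf is sketched here. It assumes every line is an id, some
whitespace, then a flag string of the same length on each line. The sample
lines, the temporary file and all variable names are made up; the real file
"efetez.binary" may be laid out differently.

```r
# Hypothetical stand-in for the real file
tmp <- tempfile()
writeLines(c("129876            1010101110",
             "234532            0110101011"), tmp)

# Read every line as plain text, then split id from flags on whitespace
raw    <- readLines(tmp)
fields <- strsplit(trimws(raw), "[[:space:]]+")
ids    <- vapply(fields, `[`, character(1), 1)
bits   <- vapply(fields, `[`, character(1), 2)

# All flag strings must have the same length to form a matrix
stopifnot(length(unique(nchar(bits))) == 1)

# One integer column per flag, one row per molecule
m <- do.call(rbind, lapply(strsplit(bits, ""), as.integer))
rownames(m) <- ids
colnames(m) <- paste0("v", seq_len(ncol(m)))
m
```

An integer matrix is used here rather than a data frame because every column
holds the same kind of value; as.data.frame(m) converts it if needed.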


Many Thanks
