Hi,
I have a big data file (over 30,000 records) that looks
like this:
100, 20, 46, 70
103, 0, 22, 45
117, -1, 34, 65
120, 15, 0, 25
113, 0, -1, 32
142, -1, -1, 55
.....
I want to read only those records having positive values in all four
columns. That is, I don't want to read records 3, 5, and 6 into R.
However, when I type:
read.csv("data.csv", sep=",") -> rawdata
it reads the whole thing into R, including those records I don't want.
Could anyone tell me how I can read only those records I want?
Thanks,
Yu-Ling Wu
__________________________________________________
Do You Yahoo!?
Yahoo! Photos - Share your holiday photos online!
http://photos.yahoo.com/
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
read data into R with some constraints
6 messages · Yu-Ling Wu, Brian Ripley, Pierre Kleiber +2 more
On Thu, 11 Jan 2001, Yu-Ling Wu wrote:
> when I type:
> read.csv("data.csv", sep=",") -> rawdata
read.csv already uses sep="," by default, so that argument is redundant. What you do need is header=FALSE: the file has no header row, and without it the first record is taken as column names.
> it reads the whole thing into R including those records I don't want.
> Could anyone tell me how I can read only those records I want?
You can't! Until you have read a record, you cannot tell whether all of its
entries are positive.
Is this really a problem? You only have around 120,000 numbers (30,000
records of 4 values), and I just did it very easily.
rawdata <- read.csv("data.csv", header=FALSE)
Perhaps better is to use a matrix and scan():
rawdata <- matrix(scan("data.csv", sep=","), ncol=4, byrow=TRUE)
# count the non-positive entries in each row and keep the rows that have none
# (<= 0 also rejects rows containing 0; use < 0 to reject only negative values)
keep <- (rawdata <= 0) %*% rep(1,4) == 0
rawdata[keep, ]
This takes a few seconds and a few Mb.
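
A small sketch of the matrix approach above, run on the six sample records from the question rather than on data.csv. It swaps rowSums() in for the %*% rep(1,4) row count, which counts the offending entries per row in the same way. The names sample_lines, strictly_positive and non_negative are made up for illustration. It also shows that the two readings of "positive" give different results: the strict test (<= 0) drops the rows containing 0, while testing only for negatives (< 0) keeps records 1, 2 and 4, which matches the list of unwanted records (3, 5 and 6) in the question.

```r
# The six sample records from the question, used in place of data.csv.
sample_lines <- c("100, 20, 46, 70",
                  "103, 0, 22, 45",
                  "117, -1, 34, 65",
                  "120, 15, 0, 25",
                  "113, 0, -1, 32",
                  "142, -1, -1, 55")

# scan() reads all numbers into one vector; matrix() restores the 4 columns.
rawdata <- matrix(scan(text = sample_lines, sep = ",", quiet = TRUE),
                  ncol = 4, byrow = TRUE)

# Strict reading of "positive": reject any row with an entry <= 0.
strictly_positive <- rawdata[rowSums(rawdata <= 0) == 0, , drop = FALSE]

# Looser reading: reject only rows that contain a negative entry.
non_negative <- rawdata[rowSums(rawdata < 0) == 0, , drop = FALSE]

print(strictly_positive)  # record 1 only
print(non_negative)       # records 1, 2 and 4
```

The drop = FALSE argument keeps the result a matrix even when a single row survives the filter.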
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
You can filter the data after reading as follows:
rawdata <- read.csv("data.csv", sep=",",header=FALSE)
rawdata <- rawdata[apply(rawdata,1,function(x)all(x>=0)),]
Cheers, Pierre
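
A minimal sketch of the data-frame filter above, again on the six sample records instead of data.csv. The second test, rowSums(df < 0) == 0, is not from the thread; it is a vectorised alternative that should select the same rows without calling a function once per row. The names sample_lines, keep_apply, keep_vec and kept are illustrative only. Note that x >= 0 keeps rows containing 0, so records 2 and 4 survive, in line with the question's list of unwanted records.

```r
# The six sample records from the question, used in place of data.csv.
sample_lines <- c("100, 20, 46, 70",
                  "103, 0, 22, 45",
                  "117, -1, 34, 65",
                  "120, 15, 0, 25",
                  "113, 0, -1, 32",
                  "142, -1, -1, 55")

# header = FALSE so that the first record is not taken as column names.
rawdata <- read.csv(text = sample_lines, header = FALSE, strip.white = TRUE)

# Row-by-row test, as in the message above.
keep_apply <- apply(rawdata, 1, function(x) all(x >= 0))

# Vectorised alternative: count the negative entries per row.
keep_vec <- rowSums(rawdata < 0) == 0

kept <- rawdata[keep_apply, ]
print(kept)  # rows 1, 2 and 4 of the original file
```

On a file of 30,000 records either form should be fast enough; the vectorised test mainly matters for much larger inputs.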
-----------------------------------------------------------------
Pierre Kleiber Email: pkleiber at honlab.nmfs.hawaii.edu
Fishery Biologist Tel: 808 983-5399/737-7544
NOAA FISHERIES - Honolulu Laboratory Fax: 808 983-2902
2570 Dole St., Honolulu, HI 96822-2396
-----------------------------------------------------------------
"God could have told Moses about galaxies and mitochondria and all. But behold... It was good enough for government work."
-----------------------------------------------------------------
On Thu, 11 Jan 2001, Yves Gauvreau wrote:
> Hi Brian,
> I'm not sure about this, but could this kind of selective record reading be done (at least under Windoze) using RODBC, since there is a driver for ASCII file sources?
Yes, it could.
> Assuming the answer is yes, I would then ask if RODBC could also be used to do the same for a file residing on a Linux (Ext2fs) drive, or any other file system for that matter?
I don't think so. Had the request been 100x larger, I would have suggested RODBC.
> (I think not, because RODBC is just an interface that calls upon the existing drivers on the system.)
> Regards, Yves Gauvreau B.E.F.P. Universite du Quebec a Montreal cyg at sympatico.ca
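
For a file far larger than this one, a base-R alternative to ODBC (not proposed in the thread) is to read the file in blocks and filter each block before keeping it, so that the unwanted records are never all held in memory at once. The sketch below assumes a comma-separated file with no header and a fixed number of numeric columns. The function read_filtered and its arguments n_cols and chunk_rows are hypothetical names, and the test for negative values follows the non-negative reading of the question.

```r
# Read a headerless numeric CSV in blocks, keeping only the rows
# that contain no negative values.
read_filtered <- function(path, n_cols = 4, chunk_rows = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  kept <- list()
  repeat {
    # On an open connection, scan() continues from the current position.
    values <- scan(con, sep = ",", nlines = chunk_rows, quiet = TRUE)
    if (length(values) == 0) break  # end of file
    block <- matrix(values, ncol = n_cols, byrow = TRUE)
    good <- rowSums(block < 0) == 0
    kept[[length(kept) + 1]] <- block[good, , drop = FALSE]
  }
  if (length(kept) == 0) return(matrix(numeric(0), ncol = n_cols))
  do.call(rbind, kept)
}

# Usage, assuming data.csv is laid out as in the question:
# rawdata <- read_filtered("data.csv")
```

Every record is still read once, as noted earlier in the thread; only the memory needed for the rejected rows is saved.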
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Hi Brian,
I'm not sure about this, but could this kind of selective record reading be done (at least under Windoze) using RODBC, since there is a driver for ASCII file sources?
Assuming the answer is yes, I would then ask if RODBC could also be used to do the same for a file residing on a Linux (Ext2fs) drive, or any other file system for that matter? (I think not, because RODBC is just an interface that calls upon the existing drivers on the system.)
Regards,
Yves Gauvreau
B.E.F.P.
Universite du Quebec a Montreal
cyg at sympatico.ca
> Assuming the answer is Yes. I would then ask if RODBC could also be used to do the same for a file residing on a Linux (Ext2fs) drive or any other file system for that matter?
Not an arbitrary filesystem, but if it's on a Linux box, and it's really a big file, you might want to:
1) export the file to the Windows box via Samba.
2) apply the Windows ODBC text driver.
3) use RODBC to read the file.
Far too involved for a file this size, unless this is part of a dynamic data / regularly updating problem. In which case, I'd suggest using a real database (Oracle, PgSQL, etc) and an ODBC driver.
I say again: the above steps are really too convoluted for everyday use, particularly if you don't have Samba set up already.
Cheers,
Jason
Indigo Industrial Controls Ltd. 64-21-343-545 jasont at indigoindustrial.co.nz