read data into R with some constraints - R-help

Thu, Jan 11, 2001 10:19 AM #

Hi,

I have a big data file (over 30,000 records) that looks
like this:

100, 20, 46, 70
103,  0, 22, 45
117, -1, 34, 65
120, 15,  0, 25
113,  0,  -1, 32
142, -1, -1, 55
.....

I want to read only those records having positive values in all of
the four columns. That is, I don't want to read records # 3, 5, and 6
into R. However, when I type:

read.csv("data.csv", sep=",")  -> rawdata

it reads the whole thing into R, including those records I don't want.
Could anyone tell me how I can read only those records I want?

Thanks,
Yu-Ling Wu




-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Brian Ripley

Thu, Jan 11, 2001 10:45 AM #

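On Thu, 11 Jan 2001, Yu-Ling Wu wrote:

> read.csv("data.csv", sep=",")  -> rawdata

Um, read.csv already uses sep=",", and you need header=FALSE.

> Could anyone tell me how I can read only those records I want?

You can't!  Until you have read the record, you cannot tell if all the
entries are positive.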
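
To illustrate the point above: every record has to be read before it can be tested, but the whole file need not be held in memory at once. The following is only a sketch under assumptions. The helper name read_positive_rows and its ncol and chunk_lines parameters are hypothetical, and the temporary file stands in for data.csv. It reads a block of lines with scan(), filters that block, and keeps only the surviving rows.

```r
# Hypothetical helper (not from the thread): read a comma-separated numeric
# file in chunks and keep only rows in which every value is strictly positive.
read_positive_rows <- function(path, ncol = 4, chunk_lines = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  kept <- list()
  repeat {
    # scan() on an open connection continues from the current position
    vals <- scan(con, sep = ",", nlines = chunk_lines, quiet = TRUE)
    if (length(vals) == 0) break          # end of file reached
    chunk <- matrix(vals, ncol = ncol, byrow = TRUE)
    # drop every row that has at least one non-positive entry
    kept[[length(kept) + 1]] <- chunk[rowSums(chunk <= 0) == 0, , drop = FALSE]
  }
  do.call(rbind, kept)
}

# Usage on the six sample records from the question, written to a temp file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("100, 20, 46, 70", "103,  0, 22, 45", "117, -1, 34, 65",
             "120, 15,  0, 25", "113,  0,  -1, 32", "142, -1, -1, 55"), tmp)
result <- read_positive_rows(tmp, chunk_lines = 2)
```

For a file of 30,000 records this is unnecessary, as the rest of the reply makes clear; chunking only starts to pay off when the file is too large to read in one go.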

Is this really a problem?  You only have around 120k numbers, and I just
did it very easily.

rawdata <- read.csv("data.csv", header=FALSE)

Perhaps better is to use a matrix and scan():

rawdata <- matrix(scan("data.csv", sep=","), , 4, byrow=TRUE)
keep <- (rawdata <= 0) %*% rep(1,4) == 0
rawdata[keep, ]

Takes a few seconds and a few Mb.
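
The matrix product in the snippet above may look cryptic, so here is a small sketch of what it computes, using the six sample records from the question. The variable names are illustrative only. Multiplying the logical matrix by a vector of ones counts the non-positive entries in each row, which is the same thing rowSums() returns.

```r
# The six sample records from the original question, as a 6 x 4 matrix.
sample_rows <- matrix(c(100, 20, 46, 70,
                        103,  0, 22, 45,
                        117, -1, 34, 65,
                        120, 15,  0, 25,
                        113,  0, -1, 32,
                        142, -1, -1, 55), ncol = 4, byrow = TRUE)

# (sample_rows <= 0) is a logical matrix; multiplying it by a vector of
# ones sums each row, giving the number of non-positive entries per record.
bad_counts <- drop((sample_rows <= 0) %*% rep(1, 4))

# rowSums() gives the same per-row counts directly.
stopifnot(all(bad_counts == rowSums(sample_rows <= 0)))

# Keep only the records with no non-positive entry.
keep <- bad_counts == 0
positive_rows <- sample_rows[keep, , drop = FALSE]
```

On this sample only the first record survives, because records 2 and 4 contain zeros and a zero is not positive. Testing `sample_rows < 0` instead would keep records 1, 2 and 4, which matches the list of unwanted records (3, 5 and 6) given in the question.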

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Pierre Kleiber

Thu, Jan 11, 2001 11:18 AM #

You can filter the data after reading as follows:
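
The code that accompanied this reply is not preserved here. As a rough sketch of one way to filter after reading, and not necessarily what was originally posted, the data can be read into a data frame and subset with a logical index. The names csv_text, rawdata_df, all_positive and filtered_df are illustrative, and the inline text stands in for data.csv.

```r
# Sample records from the original question stand in for "data.csv".
csv_text <- c("100, 20, 46, 70",
              "103,  0, 22, 45",
              "117, -1, 34, 65",
              "120, 15,  0, 25",
              "113,  0,  -1, 32",
              "142, -1, -1, 55")

# header = FALSE because the file has no header row; columns become V1..V4.
rawdata_df <- read.csv(text = csv_text, header = FALSE)

# TRUE for rows in which every column is strictly positive.
all_positive <- Reduce(`&`, lapply(rawdata_df, function(column) column > 0))

# Keep only those rows.
filtered_df <- rawdata_df[all_positive, ]
```

With a real file, `read.csv("data.csv", header = FALSE)` would replace the `text =` call; the filtering lines stay the same.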

Cheers, Pierre


-----------------------------------------------------------------
Pierre Kleiber             Email: pkleiber at honlab.nmfs.hawaii.edu
Fishery Biologist                     Tel: 808 983-5399/737-7544
NOAA FISHERIES - Honolulu Laboratory         Fax: 808 983-2902
2570 Dole St., Honolulu, HI 96822-2396 
-----------------------------------------------------------------
 "God could have told Moses about galaxies and mitochondria and
  all.  But behold... It was good enough for government work."
-----------------------------------------------------------------

Brian Ripley

Thu, Jan 11, 2001 1:13 PM #

On Thu, 11 Jan 2001, Yves Gauvreau wrote:

Yes, it could.

Don't think so.

Had the request been 100x larger, I would have suggested that.

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Yves Gauvreau

Thu, Jan 11, 2001 1:16 PM #

Hi Brian

I'm not sure about this, but could this kind of selective record reading be
done (at least under Windoze) using RODBC, since there is a driver for ASCII
file sources?

Assuming the answer is yes, I would then ask whether RODBC could also be used
to do the same for a file residing on a Linux (Ext2fs) drive, or any other
file system for that matter.

(I think not, because RODBC is just an interface that calls upon the existing
drivers on the system.)


Regards

Yves Gauvreau
B.E.F.P. Universite du Quebec a Montreal
cyg at sympatico.ca


Jason Turner

Thu, Jan 11, 2001 3:57 PM #

Not an arbitrary filesystem, but if it's on a Linux box, and
it's really a big file, you might want to:

1) export the file to the Windows box via Samba.
2) apply the Windows ODBC text driver.
3) use RODBC to read the file.

Far too involved for a file this size, unless this is part
of a dynamic data / updating regularly problem.  In which case,
I'd suggest using a real database (Oracle, PgSQL, etc) and an
ODBC driver.

I say again: the above steps are really too convoluted for
everyday use, particularly if you don't have Samba set up 
already.  

Cheers,

Jason

Indigo Industrial Controls Ltd.
64-21-343-545
jasont at indigoindustrial.co.nz