read data into R with some constraints

6 messages · Yu-Ling Wu, Brian Ripley, Pierre Kleiber +2 more

#
Hi,

I have a big data file (over 30,000 records) that looks like this:

100, 20, 46, 70
103,  0, 22, 45
117, -1, 34, 65
120, 15,  0, 25
113,  0,  -1, 32
142, -1, -1, 55
.....

I want to read only those records having positive values in all four
columns. That is, I don't want to read records 3, 5, and 6 into R.
However, when I type:

read.csv("data.csv", sep=",")  -> rawdata

it reads the whole thing into R, including those records I don't want.
Could anyone tell me how I can read only those records I want?

Thanks,
Yu-Ling Wu




-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Thu, 11 Jan 2001, Yu-Ling Wu wrote:

            
Um, read.csv already uses sep="," by default, and you need header=FALSE.

You can't read only those records!  Until you have read a record, you
cannot tell if all its entries are positive.

Is this really a problem?  You only have around 120k numbers, and I just
did it very easily.

rawdata <- read.csv("data.csv", header=FALSE)

Perhaps better is to use a matrix and scan():

rawdata <- matrix(scan("data.csv", sep=","), , 4, byrow=TRUE)
keep <- (rawdata <= 0) %*% rep(1,4) == 0
rawdata[keep, ]

Takes a few seconds and a few Mb.
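
The matrix product in the snippet above counts, for each row, how many entries are <= 0, and keeps the rows where that count is zero. A minimal sketch of the same idea using rowSums(), run on the six sample records from the question (the inline `sample_text` and the names `strict` and `nonneg` are made up for illustration). Note that `<= 0` also drops records 2 and 4, which contain zeros; if only the records with negative values (3, 5 and 6) should go, as the list in the question suggests, compare with `< 0` instead.

```r
# Six sample records from the question, inlined so the example is self-contained.
sample_text <- "100, 20, 46, 70
103,  0, 22, 45
117, -1, 34, 65
120, 15,  0, 25
113,  0,  -1, 32
142, -1, -1, 55"

# scan() returns a numeric vector; reshape it into a 4-column matrix, row by row.
rawdata <- matrix(scan(text = sample_text, sep = ",", quiet = TRUE),
                  ncol = 4, byrow = TRUE)

# Strictly positive in every column: count entries <= 0 per row, keep rows with none.
strict <- rawdata[rowSums(rawdata <= 0) == 0, , drop = FALSE]

# Non-negative in every column: only negative entries disqualify a row.
nonneg <- rawdata[rowSums(rawdata < 0) == 0, , drop = FALSE]
```

The `drop = FALSE` keeps the result a matrix even when a single row survives the filter.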
#
You can filter the data after reading as follows:
Cheers, Pierre
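
The code from this message did not survive in the archive. As a rough sketch of filtering after reading (not necessarily what was posted), assuming the file has no header row so that read.csv() names the columns V1 to V4; a temporary file stands in for data.csv, and `all_positive` and `gooddata` are illustrative names:

```r
# Write the sample records to a temporary file standing in for data.csv.
path <- tempfile(fileext = ".csv")
writeLines(c("100, 20, 46, 70",
             "103,  0, 22, 45",
             "117, -1, 34, 65",
             "120, 15,  0, 25",
             "113,  0,  -1, 32",
             "142, -1, -1, 55"), path)

# Read everything first; header = FALSE because the file has no column names.
rawdata <- read.csv(path, header = FALSE)

# TRUE for rows in which every column is strictly positive.
all_positive <- apply(rawdata > 0, 1, all)

# Keep only those rows.
gooddata <- rawdata[all_positive, ]
```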
#
On Thu, 11 Jan 2001, Yves Gauvreau wrote:

            
Yes, it could.
Don't think so.

Had the request been 100x larger, I would have suggested that.
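
For a file far larger than this one, one option that stays in base R (not the RODBC route asked about here, and not something proposed in the thread) is to read the file in blocks and discard unwanted rows as each block arrives, so the unfiltered data never sits in memory all at once. A minimal sketch, assuming a headerless file with exactly four numeric fields per line; `read_positive_rows` and `chunk_rows` are hypothetical names, not part of any package:

```r
# Read a headerless CSV in blocks of lines, keeping only rows whose
# entries are all strictly positive.
read_positive_rows <- function(path, ncols = 4, chunk_rows = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  kept <- list()
  repeat {
    # readLines() on an open connection continues where the last call stopped.
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0) break
    # Parse this block of lines into a numeric matrix, one record per row.
    block <- matrix(scan(text = lines, sep = ",", quiet = TRUE),
                    ncol = ncols, byrow = TRUE)
    # Keep rows with no entry <= 0.
    kept[[length(kept) + 1]] <- block[rowSums(block <= 0) == 0, , drop = FALSE]
  }
  if (length(kept) == 0) return(matrix(numeric(0), ncol = ncols))
  do.call(rbind, kept)
}
```

It would be called as `read_positive_rows("data.csv")`. For the 30,000-record file in the question this is unnecessary; reading everything and filtering afterwards is simpler.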

  
    
#
Hi Brian

I'm not sure about this, but could this kind of selective record reading be
done (at least under Windoze) using RODBC, since there is a driver for ASCII
file sources?

Assuming the answer is yes, I would then ask whether RODBC could also be used
to do the same for a file residing on a Linux (Ext2fs) drive, or any other
file system for that matter.

(I think not, because RODBC is just an interface that calls upon the existing
drivers on the system.)


Regards

Yves Gauvreau
B.E.F.P. Universite du Quebec a Montreal
cyg at sympatico.ca
#
Not an arbitrary filesystem, but if it's on a Linux box, and
it's really a big file, you might want to:

1) export the file to the Windows box via Samba.
2) apply the Windows ODBC text driver.
3) use RODBC to read the file.

Far too involved for a file this size, unless this is part
of a dynamic data / updating regularly problem.  In which case,
I'd suggest using a real database (Oracle, PgSQL, etc) and an
ODBC driver.

I say again: the above steps are really too convoluted for
everyday use, particularly if you don't have Samba set up 
already.  

Cheers,

Jason