Selective column loads with scan()

4 messages · Samir Mishra, Uwe Ligges, Brian Ripley +1 more

#
Help!

I have a very large data file with about 100 columns. Instead of loading 
all of the columns using read.table() (my PC won't be able to handle the
resulting data frame), I'd like to read in one column of the datafile at 
a time using scan() in conjunction with the 'what = ' option. I've tried 
loading just the third column using -
along with a couple of other options but I can't get the results I want 
(if I avoid an error). For example -
[[1]]
NULL

[[2]]
NULL

[[3]]
[1] ""

[[4]]
NULL

is not what I'm looking for.

Is there a better way to selectively read in individual columns from a tab
separated data file?

I've done this before, and that makes this exercise even more frustrating.

Thanks in advance. 

Samir Mishra
===========================================
Please send replies to :
sqmishra at acm.org
===========================================






-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Sun, 7 Jan 2001, Samir Mishra wrote:

            
Try 

  scan(file.name, what = list(, , ""), flush = TRUE)


Uwe Ligges

#
On Sun, 7 Jan 2001, Uwe Ligges wrote:

            
Unfortunately that gives you the first column.  With S you need the NULLs
in there (and then it is a known trick), but they are not accepted in R.

BTW,

[[1]]:
[1] ""

and

[[1]]:
NULL

[[2]]:
NULL

[[3]]:
[1] ""

are different.

I don't know a good way to do this in R, but then I would not try it.
Either use `cut' to extract the column(s) needed from the file, or
use a database and a connection such as RODBC to do the extraction in the
database.
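
The `cut' route can be sketched from within R by reading from a pipe, so that only the extracted column ever reaches R. This is a minimal sketch, assuming a Unix-like system with `cut' on the PATH and a tab-separated file; the temporary file and its contents are made up purely for illustration.

```r
# Tiny tab-separated file, for illustration only.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("a\t1\tx", "b\t2\ty", "c\t3\tz"), tmp)

# `cut -f3` prints only the third tab-delimited field of each line,
# so R never has to hold the other columns in memory.
con <- pipe(paste("cut -f3", shQuote(tmp)), open = "r")
third <- readLines(con)
close(con)

print(third)
```

The column arrives as character; convert it afterwards (for example with as.numeric()) if it holds numbers.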
#
Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
[ really: scan(file.name, what = list(NULL, NULL, ""), flush = TRUE) ... ]
Doesn't look like rocket science to make the NULL entries work in R,
but probably a tad late for 1.2.1 (next Monday)

Basically, I think it would amount to (in src/main/scan.c): 

 * The allocation code in scanFrame should stick in a NULL for ans[i]
   rather than error out if what[i] is NULL. Also watch for this when
   the n == blksize && colsread == 0 condition is handled.

 * fillBuffer should treat NULL as STRSXP

 * extractItem should ignore NULL types (does already, it seems)

On the other hand, it might make better sense to have a mask= argument
controlling which columns are kept/discarded. Or select=.
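
For readers on a later R: the scan() documentation there describes NULL components of a `what' list as fields to be skipped, so the S trick discussed above does work in those versions. The sketch below assumes such a version; scan_select() is a hypothetical helper written only to illustrate the select= idea and is not part of R, and the sample file is invented.

```r
# Hypothetical helper (not part of R): read only the columns in `select`.
scan_select <- function(file, select, sep = "\t", skip = 0) {
  # NULL entries mark fields to skip; "" marks fields read as character.
  what <- vector("list", max(select))
  what[select] <- list("")
  # flush = TRUE discards whatever follows the last requested field.
  cols <- scan(file, what = what, sep = sep, skip = skip,
               flush = TRUE, quiet = TRUE)
  cols[select]
}

# Small tab-separated file, for illustration only.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("a\t1\tx\tfoo", "b\t2\ty\tbar", "c\t3\tz\tbaz"), tmp)

third <- scan_select(tmp, select = 3)[[1]]
first_and_third <- scan_select(tmp, select = c(1, 3))
print(third)
```

Skipped fields are still parsed line by line, but nothing is stored for them, which is what keeps the memory use down for a 100-column file.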