Selective column loads with scan() - R-help

Sun, Jan 7, 2001 10:02 AM

Help!

I have a very large data file with about 100 columns. Instead of loading 
all of the columns using read.table() (my PC won't be able to handle the
resulting data frame), I'd like to read in one column of the data file at 
a time using scan() in conjunction with the 'what = ' option. I've tried 
loading just the third column using -

along with a couple of other options, but I can't get the results I want 
(even when I manage to avoid an error). For example -

[[1]]
NULL

[[2]]
NULL

[[3]]
[1] ""

[[4]]
NULL

is not what I'm looking for. 

Is there a better way to selectively read in individual columns from a
tab-separated data file?

I've done this before, and that makes this exercise even more frustrating.

Thanks in advance. 

Samir Mishra
===========================================
Please send replies to :
sqmishra at acm.org
===========================================

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Uwe Ligges

Sun, Jan 7, 2001 10:54 AM

On Sun, 7 Jan 2001, Samir Mishra wrote:

Try 

  scan(file.name, what = list(, , ""), flush = TRUE)


Uwe Ligges

Brian Ripley

Sun, Jan 7, 2001 11:42 AM

On Sun, 7 Jan 2001, Uwe Ligges wrote:

Unfortunately that gives you the first column.  With S you need the NULLs
in there (and then it is a known trick), but they are not accepted in R.
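(Editor's note.) Later versions of R do accept NULL components in a `what` list: the corresponding field is skipped, but a NULL placeholder still appears in the result. A minimal sketch of the S trick under that assumption; the temporary file and its four tab-separated columns are made up for illustration:

```r
# Hypothetical example data: 4 tab-separated columns, no header line
path <- tempfile(fileext = ".txt")
writeLines(c("a\t1\tx\t10",
             "b\t2\ty\t20",
             "c\t3\tz\t30"), path)

# NULL entries skip fields 1 and 2; "" reads field 3 as character.
# flush = TRUE discards the rest of each line after field 3.
res <- scan(path, what = list(NULL, NULL, ""), sep = "\t",
            flush = TRUE, quiet = TRUE)

col3 <- res[[3]]  # the wanted column; res[[1]] and res[[2]] are NULL
print(col3)
```

For a numeric column, use a numeric template such as `0` in place of `""`.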

BTW,

[[1]]:
[1] ""

[[1]]:
NULL

[[2]]:
NULL

[[3]]:
[1] ""

are different.

I don't know a good way to do this in R, but then I would not try it.
Either use `cut' to extract the column(s) needed from the file, or
use a database and a connection such as RODBC to do the extraction in the
database.
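(Editor's note.) The `cut' route can be combined with R's `pipe()` connection so that only the extracted column ever reaches `scan()`. A minimal sketch, assuming a Unix-like system with `cut` on the PATH; the temporary file is hypothetical example data:

```r
# Hypothetical example data: 4 tab-separated columns, no header line
path <- tempfile(fileext = ".txt")
writeLines(c("a\t1\tx\t10",
             "b\t2\ty\t20",
             "c\t3\tz\t30"), path)

# cut splits on tabs by default; -f3 keeps only the third field
cmd <- paste("cut -f3", shQuote(path))

# scan() opens the pipe connection, reads it, and closes it again
col3 <- scan(pipe(cmd), what = "", quiet = TRUE)
print(col3)
```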

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Peter Dalgaard

Sun, Jan 7, 2001 3:00 PM

Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:

[ really: scan(file.name, what = list(NULL, NULL, ""), flush = TRUE) ]

...

Doesn't look like rocket science to make the NULL entries work in R,
but it is probably a tad late for 1.2.1 (next Monday).

Basically, I think it would amount to (in src/main/scan.c): 

 * The allocation code in scanFrame should stick in a NULL for ans[i]
   rather than error out if what[i] is NULL. Also watch for this when
   the n == blksize && colsread == 0 condition is handled.

 * fillBuffer should treat NULL as STRSXP

 * extractItem should ignore NULL types (does already, it seems)

On the other hand, it might make more sense to have a mask= argument
controlling which columns are kept or discarded. Or select=.
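(Editor's note.) A select=-style interface can be emulated on top of a `scan()` that accepts NULL entries in `what`, by generating the NULL-padded list automatically. A minimal sketch; `scan_select()` is a hypothetical helper, not part of R, and it assumes every kept column shares one type template:

```r
# Hypothetical helper emulating the proposed select= argument.
# Builds a `what` list with NULL (skip) for every unwanted field up to
# max(select), and the type template `type` for each kept field.
scan_select <- function(file, select, type = "", sep = "\t") {
  n <- max(select)
  what <- rep(list(NULL), n)   # NULL = skip this field
  what[select] <- list(type)   # template for the kept fields
  # flush = TRUE ignores any fields beyond max(select) on each line
  res <- scan(file, what = what, sep = sep, flush = TRUE, quiet = TRUE)
  out <- res[select]           # drop the NULL placeholders
  names(out) <- paste0("V", select)
  out
}

# Usage on hypothetical example data: keep numeric columns 2 and 4
path <- tempfile(fileext = ".txt")
writeLines(c("a\t1\tx\t10",
             "b\t2\ty\t20",
             "c\t3\tz\t30"), path)
cols <- scan_select(path, select = c(2, 4), type = 0)
str(cols)
```

For data frames, `read.table()` offers similar behaviour through its `colClasses` argument, where a class of "NULL" skips the corresponding column.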

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907