No subject

6 messages · Michał Bojanowski, Douglas Bates, Roger Bivand +3 more

#
Hello to all

Recently I came across a problem. I have to analyze a large survey
data set: about 600 columns and 10000 rows (a tab-delimited file
with names in the header). I was able to import the data into an
object, but there is no more memory left.

Is there a way to import the data column by column? I have to analyze
the whole data set, but only two variables at a time.

Thanks in advance

Michal Bojanowski


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
Michał Bojanowski <bojanr at wp.pl> writes:
You will probably need to do the data manipulation externally.
Two possible solutions are to use a scripting language like Python or
Perl or to store the data in a relational database like PostgreSQL or
MySQL.  For data of this size I would recommend the relational
database approach.

R has packages to connect to PostgreSQL or to MySQL.

If you want to use Python instead, the code is fairly easy to write.
To extract the first two fields (for which the index expression really
is written 0:2, not 0:1 or 1:2 as one might expect, because the end
index is exclusive), you could use

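#!/usr/bin/env python3

import fileinput

# Read the file(s) named on the command line (or stdin), line by line
for line in fileinput.input():
    # Drop the trailing newline, then split the line on tabs
    flds = line.rstrip("\n").split("\t")
    # The slice 0:2 keeps fields 0 and 1 (the end index is exclusive)
    print("\t".join(flds[0:2]))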



#
On 11 Jul 2001, Douglas Bates wrote:

            
Or using awk/gawk, if you prefer, to choose the fields:
+ cols.I.want[1], ", $", cols.I.want[2], "}' tryout.txt", sep="")),
+ header=T)
and pipe() to read on the fly, maybe? Generalising to an arbitrary number
of chosen columns would also be possible.
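
As a rough illustration of that generalisation, here is a minimal Python sketch that streams a tab-delimited file and keeps only the columns named in the header. The function name read_columns and its arguments are made up for this example; it assumes the first line holds the column names, as in the original question.

```python
import csv

def read_columns(path, wanted):
    """Stream a tab-delimited file and keep only the named columns.

    `wanted` is a list of header names; the result maps each name
    to a list of that column's values (as strings).
    """
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader)
        # Position of each wanted name in the header row
        idx = [header.index(name) for name in wanted]
        columns = {name: [] for name in wanted}
        # Only one row is held in memory at a time
        for row in reader:
            for name, i in zip(wanted, idx):
                columns[name].append(row[i])
    return columns
```

Only the chosen columns are accumulated, so memory use grows with the number of columns requested rather than with the width of the file.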

Roger
#
Douglas Bates <bates at stat.wisc.edu> writes:
We didn't see what OS this came from so it might well be Windows....

There, you have some possibilities of setting up an ODBC connection to
a text file (via Control Panel - slightly cryptic, but I managed to
get it to play at some point). You should be able to access the
table as a database using the RODBC package and that will allow you to
do the selection of cases/variables.
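
To make the "select variables from a database table" idea concrete, here is a hedged Python sketch. It swaps in SQLite (via the standard sqlite3 module) for ODBC, PostgreSQL or MySQL purely so that the example is self-contained; the function names load_tsv and select_pair and the table name "survey" are hypothetical.

```python
import csv
import sqlite3

def quote_ident(name):
    """Quote an SQL identifier so unusual header names still work."""
    return '"{}"'.format(name.replace('"', '""'))

def load_tsv(conn, path, table="survey"):
    """Load a tab-delimited file with a header row into a SQLite table."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader)
        cols = ", ".join(quote_ident(h) for h in header)
        marks = ", ".join("?" for _ in header)
        conn.execute("CREATE TABLE {} ({})".format(quote_ident(table), cols))
        # Rows are inserted as they are read, not collected in a list first
        conn.executemany(
            "INSERT INTO {} VALUES ({})".format(quote_ident(table), marks),
            reader,
        )
    conn.commit()

def select_pair(conn, x, y, table="survey"):
    """Return only the two requested variables as a list of (x, y) rows."""
    sql = "SELECT {}, {} FROM {}".format(
        quote_ident(x), quote_ident(y), quote_ident(table)
    )
    return conn.execute(sql).fetchall()
```

The file is loaded once, and each later analysis pulls just the two variables it needs. Values come back as text in this sketch, so they would still need converting to numbers.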
#
On 11 Jul 2001, Douglas Bates wrote:

            
If you are on a Unix box and you have a tab-delimited file, 'cut' will
easily cut out fields from the file. To automate it, use a shell program
to produce all the pairs you want. That is a 1980s solution, but it should
work just fine.
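
One way to produce all the pairs is to generate the 'cut' command lines programmatically. The sketch below does this in Python rather than in a shell program; the function name cut_commands is hypothetical, and tryout.txt is only a placeholder file name.

```python
from itertools import combinations
import shlex

def cut_commands(n_columns, path):
    """Yield one `cut` command per unordered pair of 1-based column numbers."""
    for i, j in combinations(range(1, n_columns + 1), 2):
        # cut -fI,J selects tab-delimited fields I and J
        yield "cut -f{},{} {}".format(i, j, shlex.quote(path))

if __name__ == "__main__":
    # Print the commands for a small 4-column file
    for cmd in cut_commands(4, "tryout.txt"):
        print(cmd)
```

With 600 columns this yields 600 * 599 / 2 = 179,700 commands, so in practice one would probably restrict the list to the pairs actually of interest.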

David Scott

_________________________________________________________________
David Scott     Department of Statistics
                Tamaki Campus
                The University of Auckland, PB 92019
                Auckland        NEW ZEALAND
Phone: +64 9 373 7599 ext 6830     Fax: +64 9 373 7000
Email:  d.scott at Auckland.ac.nz

President, New Zealand Statistical Association

4 days later
#
Something like this:

         a <- scan("file.dat",
                   skip = 1,
                   what = list(0,0,0,0,0),
                   flush = TRUE)[c(2,5)]

         a <- cbind(unlist(a[1]), unlist(a[2]))

might do the trick (this does columns 2 and 5 ... change the index
'[c(2,5)]' to get other columns). The options to scan() are: 'skip = 1'
drops the first line of the file; 'what' is a list specifying variable
types (I specify 5 numeric columns ... you need to specify up until your
last variable); 'flush' speeds the whole thing up and saves memory by
not reading more of the line than specified in 'what'.

The cbind() just converts the list returned by scan() to a matrix. You
could make a data.frame using:

         a <- as.data.frame(cbind(unlist(a[1]), unlist(a[2])))

I hope that helps.


--
Mark Myatt

