large survey data
On 11 Jul 2001, Douglas Bates wrote:
Michał Bojanowski <bojanr at wp.pl> writes:
Recently I came across a problem. I have to analyze a large survey data set, about 600 columns and 10000 rows (a tab-delimited file with names in the header). I was able to import the data into an object, but then there is no memory left. Is there a way to import the data column by column? I have to analyze the whole data set, but only two variables at a time.
You will probably need to do the data manipulation externally.
Two possible solutions are to use a scripting language like python or
perl or to store the data in a relational database like PostgreSQL or
MySQL. For data of this size I would recommend the relational
database approach.
R has packages to connect to PostgreSQL or to MySQL.
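As a rough sketch of the database route: Python's built-in sqlite3 module is used here purely as a stand-in for PostgreSQL or MySQL (an assumption, so that the example needs no server). The table name "survey" and the helpers q, load_tsv and fetch_pair are hypothetical names chosen for illustration.

```python
import csv
import sqlite3


def q(name):
    """Quote an SQL identifier such as a column or table name."""
    return '"' + name.replace('"', '""') + '"'


def load_tsv(conn, path, table="survey"):
    """Load a tab-delimited file with a header row into a new table."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader)  # first line holds the variable names
        cols = ", ".join(q(name) for name in header)
        marks = ", ".join("?" for _ in header)
        conn.execute("CREATE TABLE %s (%s)" % (q(table), cols))
        # Rows are streamed from the file, not held in memory all at once
        conn.executemany(
            "INSERT INTO %s VALUES (%s)" % (q(table), marks), reader
        )
    conn.commit()


def fetch_pair(conn, a, b, table="survey"):
    """Return only the two named columns, as a list of (a, b) tuples."""
    sql = "SELECT %s, %s FROM %s" % (q(a), q(b), q(table))
    return conn.execute(sql).fetchall()
```

After one call to load_tsv(conn, path), each analysis would only need fetch_pair(conn, name1, name2) for the pair of variables in question. The values come back as text in this sketch, since no column types are declared.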
If you want to use python instead, the code is fairly easy to write.
To extract the first two fields (for which the index expression really
is written 0:2, not 0:1 or 1:2 as one might expect, because python
counts from 0 and a slice excludes its end index), you could use
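    #!/usr/bin/env python3
    import fileinput

    # Read each line from the files named on the command line (or stdin)
    for line in fileinput.input():
        # Drop the trailing newline, then split the line on tabs
        flds = line.rstrip("\n").split("\t")
        # flds[0:2] is the first two fields: the slice end is exclusive
        print("\t".join(flds[0:2]))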
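Since the file has variable names in its header, the same idea can be extended to pick columns by name, not by position. The following is only a sketch under that assumption; select_columns is a hypothetical helper, not part of any library.

```python
def select_columns(lines, wanted):
    """Yield tab-joined lines holding only the columns named in `wanted`.

    `lines` is any iterable of text lines whose first line is the header.
    """
    lines = iter(lines)
    header = next(lines).rstrip("\n").split("\t")
    # Translate each requested name into its 0-based position
    idx = [header.index(name) for name in wanted]
    yield "\t".join(wanted)
    for line in lines:
        flds = line.rstrip("\n").split("\t")
        yield "\t".join(flds[i] for i in idx)
```

Passing an open file (or fileinput.input()) as lines and printing each yielded string would give a two-column file small enough to read into R. A name that is not in the header raises ValueError from header.index.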
Or using awk/gawk, if you prefer, to choose the fields:
    xx <- matrix(runif(5000), 100, 50)
    colnames(xx) <- paste("Var", 1:ncol(xx), sep="")
    write.table(as.data.frame(xx), "tryout.txt", row.names=FALSE, sep="\t")
    cols.I.want <- c(5, 47)
    # Let awk pick the two fields; pipe() feeds its output to read.table()
    xx.I.want <- read.table(pipe(paste("awk -F\"\t\" 'BEGIN{OFS=\"\t\"}{print $",
        cols.I.want[1], ", $", cols.I.want[2], "}' tryout.txt", sep="")),
        header=TRUE)
    # Differences should be (near) zero if the right column came back
    summary(xx.I.want[,1] - xx[,cols.I.want[1]])
and pipe() to read on the fly, maybe? Generalising to an arbitrary number of chosen columns would also be possible.

Roger
Roger Bivand
Economic Geography Section, Department of Economics,
Norwegian School of Economics and Business Administration,
Breiviksveien 40, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no
and: Department of Geography and Regional Development, University of Gdansk,
al. Mar. J. Pilsudskiego 46, PL-81 378 Gdynia, Poland.
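The generalisation to an arbitrary number of columns mentioned above could look roughly like this in python, which is used here only to keep to one scripting language. The helper names awk_program and awk_select are hypothetical, and the sketch assumes an awk on the PATH and Python 3.7 or later.

```python
import subprocess


def awk_program(cols):
    """Build an awk program that prints the given 1-based fields, tab-separated."""
    fields = ", ".join("$%d" % c for c in cols)
    # The two characters backslash and t are left for awk to read as a tab
    return 'BEGIN{OFS="\\t"}{print %s}' % fields


def awk_select(path, cols):
    """Run awk on a tab-delimited file and return the chosen columns as text."""
    cmd = ["awk", "-F", "\t", awk_program(cols), path]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For the two columns used above, awk_program([5, 47]) builds the same awk program as the paste() call, and a longer list of column numbers only lengthens the print statement.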