Reading a specific column of a csv file in a loop

Gabor Grothendieck · 2011-11-08T10:47:26Z

2011/11/8 Sergio René Araujo Enciso <araujo.enciso at gmail.com>:
> Dear all:
>
> I have two large files with 2000 columns. For each file I am
> performing a loop to extract the "i"th element of each file and create
> a data frame with both "i"th elements in order to perform further
> analysis. I am not extracting all the "i"th elements but only certain
> which I am indicating on a vector called "d".
>
> See an example of my code below
>
> ### generate an example for the CSV files, the origin

It's a bit messy if there are row names, so let's generate M1.csv like this:

write.csv(M1, file = "M1.csv", row.names = FALSE)
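To see why row names get in the way, here is a small sketch. The 8-column M1 with columns named a to h and the temporary file names are made up for illustration; they are not the original poster's data. With the default row.names = TRUE, write.csv adds an unnamed leading column, which read.csv reads back as "X", so every column position shifts by one.

```r
# Hypothetical stand-in for M1: 5 rows, 8 columns named a..h
set.seed(1)
M1 <- as.data.frame(matrix(rnorm(40), nrow = 5))
names(M1) <- letters[1:8]

# Temporary files so nothing in the working directory is touched
f_with    <- tempfile(fileext = ".csv")
f_without <- tempfile(fileext = ".csv")

write.csv(M1, file = f_with)                        # default: row names written
write.csv(M1, file = f_without, row.names = FALSE)  # as suggested above

with_rn    <- read.csv(f_with, nrows = 1)
without_rn <- read.csv(f_without, nrows = 1)

names(with_rn)     # starts with the extra row-name column
names(without_rn)  # just the data columns
```

With the row names dropped, position i in the file is column i of M1, so an index vector such as d can be used unchanged.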

Then we can do this:

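# Read only the first row, just to count the columns
nc <- ncol(read.csv("M1.csv", nrows = 1))
# "NULL" skips a column; NA lets read.csv choose the class for the columns in d
colClasses <- replace(rep("NULL", nc), d, NA)
M1.subset <- read.csv("M1.csv", colClasses = colClasses)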
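For the two-file case in the question, one possible arrangement is to read the wanted columns from each file once and then pair them up, rather than re-reading the files inside the loop. The following is only a sketch under assumptions: the small M1 and M2, the helper name read_cols, the temporary files, and d <- c(1, 4, 7, 8) are all hypothetical. It also assumes d is sorted in increasing order, because the kept columns come back in file order.

```r
# Hypothetical stand-ins for the two files: 5 rows, 8 columns named a..h
set.seed(1)
M1 <- as.data.frame(matrix(rnorm(40), nrow = 5)); names(M1) <- letters[1:8]
M2 <- as.data.frame(matrix(rnorm(40), nrow = 5)); names(M2) <- letters[1:8]
d <- c(1, 4, 7, 8)  # assumed column positions of interest (sorted)

f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(M1, file = f1, row.names = FALSE)
write.csv(M2, file = f2, row.names = FALSE)

# Hypothetical helper wrapping the colClasses approach shown above
read_cols <- function(file, cols) {
  nc <- ncol(read.csv(file, nrows = 1))
  colClasses <- replace(rep("NULL", nc), cols, NA)
  read.csv(file, colClasses = colClasses)
}

M1.subset <- read_cols(f1, d)
M2.subset <- read_cols(f2, d)

# One two-column data frame per selected position; the k-th kept column
# corresponds to d[k] because d is sorted
pairs <- lapply(seq_along(d), function(k) {
  data.frame(x = M1.subset[[k]], y = M2.subset[[k]])
})
names(pairs) <- d
```

Each element of pairs, for example pairs[["4"]], then holds the matching columns from both files and can be passed to whatever analysis follows.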

Or, using the same M1.csv that we just generated, try this, which uses
sqldf with the H2 backend:

library(sqldf)
library(RH2)

M0 <- read.csv("M1.csv", nrows = 1)[0L, ]
M1.subset.h2 <- sqldf(c("insert into M0 (select * from csvread('M1.csv'))",
        "select a, d, g, h from M0"))

This is referred to as Alternative 3 in FAQ#10 Example 6a on the sqldf
home page:
http://sqldf.googlecode.com
Alternative 1 and Alternative 2 listed there could also be tried.

(Note that although sqldf has a read.csv.sql command, we did not use it
here, since that command only works with the SQLite backend and the
RSQLite driver has a maximum of 999 columns.)

Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
