How to get the number of columns and column names of multiple data frames

4 messages · Frank Schäffer, Steve Lianoglou, Don MacQueen

#
Hi,
I've read several files with measurements into R data frames (works 
flawlessly). Each data frame is named after the location of measurement and 
contains hundreds of rows and about 50 columns, like this:

dataframe1.
date measurement_1  .... measurement_n
1
2
3
..
..
..
n

For further processing I need to check whether ncol and colnames are 
the same for all data frames. 
I also need to add a new column to each data frame that contains the name of 
the data frame, so that this column can be treated as a factor in later 
processing (after merging some selected data frames into one).

I tried out 

for (i in 1:length(ls())) {
	print(ncol(ls()[i]))
}

but this does not work because ls() returns the object names as character 
strings, so ncol() is applied to a string rather than to the data frame and 
returns NULL.
Reading the output of ls() into a list does not work either.

How can I accomplish this task??

Best regards and thanks

Frank
#
Hi,
On Aug 9, 2009, at 11:29 AM, Frank Schäffer wrote:

            
Just as an aside, it's somehow considered more R-idiomatic to store  
all of these tables in a list (of tables) and access them as  
mydata[[1]], mydata[[2]], ..., mydata[[n]]. Assuming the datafiles are  
'filename.1.txt', 'filename.2.txt', etc. You might do this like so:

mydata <- lapply(paste('filename', 1:n, 'txt', sep='.'), read.table,  
header=TRUE, sep=...)

To test that all colnames are the same, you could do something like:

names1 <- colnames(mydata[[1]])
all(sapply(2:n, function(i) length(intersect(names1,  
colnames(mydata[[i]]))) == length(names1)))

If you still want to keep the data frames as separate variables, see: ?get

for example:

for (varName in paste('dataframe', 1:n, sep='')) {
   cat(colnames(get(varName)))
}
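
A minimal sketch of the same idea using mget(), which fetches several objects by name at once and returns them as a named list. The data frames site_a, site_b and site_c are made up for illustration; the real object names would go in their place. The comparison here uses identical(), which is stricter than the intersect() test above because it also requires the same column order.

```r
# Hypothetical example data; the real data frames would come from read.table()
site_a <- data.frame(date = 1:3, m1 = c(0.1, 0.2, 0.3), m2 = c(1, 2, 3))
site_b <- data.frame(date = 1:2, m1 = c(0.5, 0.6), m2 = c(4, 5))
site_c <- data.frame(date = 1:2, m1 = c(0.7, 0.8), m3 = c(6, 7))  # m3 instead of m2

# mget() looks up each name and returns a named list of the objects
frames <- mget(c("site_a", "site_b", "site_c"))

# Number of columns per data frame, as a named vector
n_cols <- sapply(frames, ncol)

# TRUE for each data frame whose column names match those of the first one,
# in both content and order
ref_names <- colnames(frames[[1]])
same_names <- sapply(frames, function(df) identical(colnames(df), ref_names))

print(n_cols)
print(same_names)
```

Any data frame flagged FALSE in same_names would need its columns renamed or reordered before it can be combined with the others.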

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
#
## You can use get()

for ( i in 1:n) {
   nm <- paste('dataframe',i,sep='')
   cat( ncol( get(nm)), 'columns in',nm,'\n')
}


## or
nms <- ls(pattern='dataframe')
for (nm in nms) cat( ncol(get(nm)) , 'columns in',nm,'\n')

(Assuming I have balanced parentheses, that is -- 
my email software doesn't check that like Emacs 
does!)

Storing the dataframes as elements of a list, as 
Steve Lianoglou suggested, lets you avoid using 
the get() function.
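
As a rough sketch of the list approach, covering the second part of the original question (a column holding the name of each data frame): if the list elements are named, those names can be copied into a new column before the data frames are combined. The element names 'north' and 'south' and the column name 'location' are placeholders, not anything from the original data.

```r
# Hypothetical named list of data frames, one per measurement location
mydata <- list(
  north = data.frame(date = 1:2, m1 = c(0.1, 0.2)),
  south = data.frame(date = 1:3, m1 = c(0.3, 0.4, 0.5))
)

# Add a 'location' column holding the list element's name to each data frame
mydata <- Map(function(df, nm) {
  df$location <- nm
  df
}, mydata, names(mydata))

# Stack the data frames (this requires identical column names) and turn the
# new column into a factor
combined <- do.call(rbind, mydata)
combined$location <- factor(combined$location)
```

Only a subset needs to be passed to rbind when just some of the data frames should be merged, for example do.call(rbind, mydata[c("north", "south")]).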

You could also use the count.fields() function to 
check whether the files have the correct number 
of columns even before you read the data in. Or 
make a pass through the files reading in only the 
first line as data, and comparing those as data 
rather than as a names attribute of a dataframe.
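
A sketch of both suggestions, under some assumptions: it writes two small temporary files so that it runs on its own, and the file contents and the whitespace separator are invented for the example.

```r
# Create two small example files (stand-ins for the real measurement files)
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("date m1 m2", "1 0.1 1.0", "2 0.2 2.0"), f1)
writeLines(c("date m1 m3", "1 0.5 4.0"), f2)
files <- c(f1, f2)

# count.fields() returns the number of fields on each line of a file;
# a single unique value means every line has the same number of columns
fields_per_file <- lapply(files, count.fields)
consistent <- sapply(fields_per_file, function(x) length(unique(x)) == 1)

# Read only the header line of each file as data and compare the headers
headers <- lapply(files, function(f) scan(f, what = "", nlines = 1, quiet = TRUE))
same_header <- sapply(headers, identical, headers[[1]])
```

Here consistent flags files whose lines all have the same field count, and same_header flags files whose first line matches that of the first file.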

-Don
At 5:29 PM +0200 8/9/09, Frank Schäffer wrote: