
when to use & pros/cons of dataframe vs. matrix?

4 messages · Anika Masters, Don McKenzie, arun +1 more

#
When "should" I use a dataframe vs. a matrix?  What are the pros and cons?
If I have data of all the same type, am I usually better off using a
matrix and not a dataframe?
What are the advantages if any of using a dataframe vs. a matrix?
(rownames and column names perhaps?)
#
Anika -- these are good questions, and many on the list could
expatiate on them.  These erudite people are also busy, however,
which is why the R-help posting guide suggests that you study an
introductory book on R before asking general questions.
On 27-Jun-13, at 11:26 AM, Anika Masters wrote:

            
Don McKenzie, Research Ecologist
Pacific Wildland Fire Sciences Lab
US Forest Service
phone: 206-732-7824

Affiliate Professor
School of Environmental and Forest Sciences
University of Washington
#
Hi,
set.seed(24)
dat1<-data.frame(X=sample(letters,20,replace=TRUE),Y=sample(1:40,20,replace=TRUE),stringsAsFactors=FALSE)
mat1<-as.matrix(dat1)
sapply(dat1,class)
#          X           Y
#"character"   "integer"

sapply(split(mat1,col(mat1)),class)
#          1           2
#"character" "character"
str(as.data.frame(mat1))
#'data.frame':    20 obs. of  2 variables:
# $ X: Factor w/ 14 levels "b","d","f","g",..: 5 3 11 8 10 14 5 12 13 4 ...
# $ Y: Factor w/ 14 levels "10","13","15",..: 12 5 9 13 14 8 12 6 7 4 ...
# (output from R < 4.0.0; newer versions give chr columns, but Y is still no longer integer)
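
The round trip above loses the integer type of Y. A minimal sketch of one way to recover column types after converting a character matrix back to a data frame, using a small made-up data frame (the names dat, mat and back are illustrative and not from the thread):

```r
# Small illustrative data frame with mixed column types
dat <- data.frame(X = c("a", "b", "c"), Y = c(10L, 20L, 30L),
                  stringsAsFactors = FALSE)

mat <- as.matrix(dat)   # one type per matrix, so everything becomes character

# Convert back, keeping strings as strings, then re-guess each column's type
back <- as.data.frame(mat, stringsAsFactors = FALSE)
back[] <- lapply(back, type.convert, as.is = TRUE)

sapply(back, class)
```

Here type.convert() guesses each column's type from its text, so this only works when the text is cleanly parseable; it is a convenience, not a guarantee that the original types come back.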

If all of your data are of the same type, operations on a matrix will generally be faster than the same operations on a data.frame.
set.seed(245)
mat2<- matrix(sample(1:50,3*1e7,replace=TRUE),ncol=3)

dat2<- as.data.frame(mat2)
system.time(res1<- rowSums(mat2))
#   user  system elapsed
#  0.132   0.016   0.201
system.time(res2<- rowSums(dat2))
#   user  system elapsed
#  0.376   0.056   0.447
identical(res1,res2)
#[1] TRUE
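
Part of the timing difference is likely because rowSums() converts a data frame with as.matrix() before summing, as far as I can tell from its source. A minimal sketch, assuming an all-numeric data frame, of converting once and reusing the matrix (df, m, row_totals and col_means are illustrative names, not from the thread):

```r
set.seed(1)
df <- data.frame(a = runif(5), b = runif(5), c = runif(5))

# Only convert when every column is numeric; otherwise as.matrix()
# would silently coerce everything to character.
all_numeric <- all(vapply(df, is.numeric, logical(1)))

if (all_numeric) {
  m <- as.matrix(df)          # pay the conversion cost once
  row_totals <- rowSums(m)    # later calls work on the matrix directly
  col_means  <- colMeans(m)
}

colnames(m)                   # column names carry over to the matrix
```

Column names survive the conversion, so names alone are not a reason to prefer a data frame.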


A.K.



#
Hello,

Arun's answer shows that matrices are faster. If your data is all of the
same type, then this might be a point for matrices.
Data frames are better for modeling. You can use the formula interface
to the many modeling functions. For instance, the example below is _not_
possible with a matrix, because a single matrix cannot hold the numeric
x and y alongside the character A.


set.seed(1234)
dat <- data.frame(x = rnorm(100), A = sample(letters[1:4], 100, TRUE),
                  y = rnorm(100))

model <- lm(y ~ x + A, data = dat) # not possible with a matrix

# predict.lm needs a data frame for newdata
newdat <- data.frame(x = c(1,3,4), A = rep("a",3))
predict(model, newdata = newdat)


There are many other examples like this one. If you are doing data 
modeling, use data frames.
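
To see what the formula interface does with a mixed-type data frame, here is a small sketch using model.matrix() on made-up values (the data are illustrative only). It shows the numeric design matrix that a formula builds from a numeric column and a categorical one:

```r
# Illustrative data: one numeric predictor, one categorical predictor
dat <- data.frame(
  x = c(0.5, 1.2, -0.3, 2.0, 0.1, -1.1, 0.7, 1.5),
  A = rep(c("a", "b", "c", "d"), times = 2),
  y = c(1.0, 2.1, 0.2, 3.3, 0.9, -0.4, 1.8, 2.7)
)

# The formula expands A into dummy (treatment-contrast) columns
X <- model.matrix(y ~ x + A, data = dat)

colnames(X)    # "(Intercept)" "x" "Ab" "Ac" "Ad"
is.numeric(X)  # TRUE
```

So the modeling functions still work on a matrix internally, but the data frame plus formula does the bookkeeping of turning categories into numeric columns.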

Hope this helps,

Rui Barradas

On 27-06-2013 19:26, Anika Masters wrote: