understanding how R determines numbers and characters when creating a data frame
Alan Smith wrote:
Hello R Users and Developers,

I have a basic question about how R works. Over the past few years I have struggled when I try to generate a new data frame that I believe should contain numeric data in some columns and character data in others, only to find everything converted to character data. Is there a general method to create data frames that hold the data in the desired format: numbers as numeric, characters as factors, etc.? I often have this problem, and in the worst case I have to export the file and read it back in.

I have put together a simple example of the problem. It often happens while using "for" loops. Could someone explain how to avoid this problem by properly creating data frames in for loops that can contain both numeric and character data?

******** Question for example 1
Why does the cbind command convert the numeric data to character data? Why can't the character data be converted to numeric data using the fix command?
See ?cbind for a detailed explanation. In short, when cbind/rbind is used on vectors or matrices it returns a matrix. A matrix necessarily holds a single type of data (see An Introduction to R), so by combining character and numeric data you are implicitly coercing the less general type (numeric) to the more general type (character).
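A minimal sketch of that coercion, using made-up values (the names `m` and `df` are only for illustration and do not appear in the original example):

```r
# cbind() on plain vectors returns a matrix, and a matrix holds one type only,
# so the number is coerced to character.
m <- cbind(species = "setosa", obsnum = 15)
is.matrix(m)   # TRUE
typeof(m)      # "character"

# data.frame() stores each column as its own vector, so each column keeps its type.
df <- data.frame(species = "setosa", obsnum = 15, stringsAsFactors = FALSE)
sapply(df, class)
```

The matrix has nowhere to keep a per-column type, which is why the conversion happens before the result ever reaches rbind().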
### Example 1 #############
data(iris)
obsnum<-NULL
results<-NULL
for(s in unique(as.character(iris$Species))){
temp1<-iris[iris$Species==s,]
obsnum<-length(unique(temp1$Sepal.Length)) # a number
Instead of using cbind here:
out1<-cbind(species=as.character(paste(s)),obsnum) # number converted to character
using data.frame:
out1 <- data.frame(species=as.character(paste(s)),obsnum)
you are telling R to keep obsnum numeric and, in R versions before 4.0.0 (where stringsAsFactors defaults to TRUE), to convert the character column to a factor:
c(class(results$species),mode(results$species))
c(class(results$obsnum),mode(results$obsnum))
You can keep the character column as character using the stringsAsFactors argument of the data.frame() function:
out1 <- data.frame(species=as.character(paste(s)),obsnum, stringsAsFactors=FALSE)
And then:
class(results$species)
The message is: if you want to mix different data types you need a list (and a data.frame is a special type of list in which every component has the same number of elements).
Ciao,
domenico
results<-rbind(out1,results)
}
results
#fix(results) # cannot convert obsnum to numeric using fix
####################################
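Putting the advice above together, Example 1 could be rewritten roughly as follows. This is only a sketch: collecting the per-species pieces in a list and binding them once with do.call(rbind, ...) is one common idiom rather than the only option, and the name `pieces` is introduced here for illustration.

```r
data(iris)
pieces <- list()   # one small data frame per species
for (s in unique(as.character(iris$Species))) {
  temp1  <- iris[iris$Species == s, ]
  obsnum <- length(unique(temp1$Sepal.Length))   # a number
  # data.frame() keeps obsnum numeric; stringsAsFactors = FALSE keeps species character
  pieces[[s]] <- data.frame(species = s, obsnum = obsnum,
                            stringsAsFactors = FALSE)
}
results <- do.call(rbind, pieces)   # bind all rows in one step
rownames(results) <- NULL
str(results)
```

Because every piece is already a data frame, rbind() dispatches to its data frame method and the column types survive the loop.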
******Question for example 2
Why does adding the data.frame command allow the character data to be
converted to numeric data using fix command?
### Example 2 #############
data(iris)
obsnum<-NULL
results<-NULL
for(s in unique(as.character(iris$Species))){
temp1<-iris[iris$Species==s,]
obsnum<-length(unique(temp1$Sepal.Length))
out1<-data.frame(cbind(species=as.character(paste(s)),obsnum)) # number converted to character
results<-rbind(out1,results)
}
results
#fix(results) # can now convert obsnum to numeric using fix
######
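In Example 2 the columns produced by data.frame(cbind(...)) are still not numeric: they come out as factor or character, depending on the R version and the stringsAsFactors setting. A hedged sketch of converting such a column in code, rather than through fix() or a round trip through a file, follows; `df`, `df2` and their values are made up for illustration.

```r
df <- data.frame(cbind(species = c("setosa", "virginica"), obsnum = c(15, 21)))
class(df$obsnum)   # "factor" or "character", not numeric

# Going through as.character() first is safe in both cases; calling as.numeric()
# directly on a factor would return the internal level codes instead.
df$obsnum <- as.numeric(as.character(df$obsnum))

# type.convert() applies the same type guessing that read.table() uses on import.
df2 <- data.frame(cbind(species = c("setosa", "virginica"), obsnum = c(15, 21)),
                  stringsAsFactors = FALSE)
df2 <- type.convert(df2, as.is = TRUE)
sapply(df2, class)
```

Either route fixes the column after the fact; building each row with data.frame() directly avoids the problem in the first place.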
Thank you,
Alan Smith
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.