
list to dataframe conversion-testing for identical

6 messages · David L Carlson, Rui Barradas, David Winsemius +1 more

#
Hi R-help,

I was trying to get identical data frames from a list using two methods.

#Suppose my list is:
listdat1<-list(rnorm(10,20),rep(LETTERS[1:2],5),rep(1:5,2))
#Creating dataframe using cbind

dat1<-data.frame(do.call("cbind",listdat1))
colnames(dat1)<-c("Var1","Var2","Var3")
#Second dataframe conversion

dat2<-data.frame(Var1=listdat1[[1]],Var2=listdat1[[2]],Var3=listdat1[[3]])

#Structure is different in two datasets
> str(dat1)
'data.frame':   10 obs. of  3 variables:
 $ Var1: Factor w/ 10 levels "18.6153321029756",..: 5 2 6 8 7 9 1 4 3 10
 $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
 $ Var3: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5
> str(dat2)
'data.frame':   10 obs. of  3 variables:
 $ Var1: num  20.3 19.2 20.5 20.9 20.5 ...
 $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
 $ Var3: int  1 2 3 4 5 1 2 3 4 5

#Converting structure of dat1 to match dat2 structure
dat1<-within(dat1,{Var1<-as.numeric(as.character(Var1))
    Var3<-as.integer(Var3)})
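
A side note on the conversion above, as a minimal sketch: as.integer() applied to a factor returns the internal level codes, not the labels. It gives the expected values for Var3 here only because the labels "1" to "5" happen to coincide with their codes. The toy factor `f` below is hypothetical and only illustrates the difference.

```r
# Hypothetical factor whose labels do not match their level codes
f <- factor(c("10", "20", "10", "30"))

codes  <- as.integer(f)                # level codes: 1 2 1 3
values <- as.integer(as.character(f))  # the labels as numbers: 10 20 10 30
```

Going through as.character() first is the safer habit whenever the labels themselves are the data.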

head(dat1)
      Var1 Var2 Var3
1 20.27193    A    1
2 19.17586    B    2
3 20.53197    A    3
4 20.93615    B    4
5 20.53498    A    5
6 21.02044    B    1
head(dat2)
      Var1 Var2 Var3
1 20.27193    A    1
2 19.17586    B    2
3 20.53197    A    3
4 20.93615    B    4
5 20.53498    A    5
6 21.02044    B    1


#New structure
> identical(str(dat1),str(dat2))
'data.frame':   10 obs. of  3 variables:
 $ Var1: num  19.9 19 21.2 20.7 20.4 ...
 $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
 $ Var3: int  1 2 3 4 5 1 2 3 4 5
'data.frame':   10 obs. of  3 variables:
 $ Var1: num  19.9 19 21.2 20.7 20.4 ...
 $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
 $ Var3: int  1 2 3 4 5 1 2 3 4 5
[1] TRUE
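
One caveat about the check above: str() prints its summary and returns NULL invisibly, so identical(str(dat1), str(dat2)) compares NULL with NULL and is TRUE for any two objects. A minimal sketch of a comparison that does look at the column types, using two hypothetical data frames `a` and `b`:

```r
# Two hypothetical data frames whose columns have different types
a <- data.frame(x = 1:3)
b <- data.frame(x = c(1.5, 2.5, 3.5))

# str() returns NULL, so this is TRUE even though a and b differ
str_check <- identical(str(a), str(b))

# Comparing the column classes detects the difference
class_check <- identical(sapply(a, class), sapply(b, class))
```

Here `str_check` is TRUE and `class_check` is FALSE, so only the second test says anything about the structure.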



#structure is identical and dataframe looks to be same, but it is not identical.
> identical(dat1,dat2)
[1] FALSE


Is it something to do with the floating point?

Thanks,

A.K.
#
Yes it does have something to do with the representation of floating point
numbers. Using cbind() forces the list to become a matrix and that forces
all of the data to become character strings since one of the list elements
is character:
chr [1:10, 1:3] "21.3709584471467" "19.4353018286039" ...
Then you convert that to a data.frame. The default in data.frame() is to
convert characters to factors so you get
'data.frame':   10 obs. of  3 variables:
 $ X1: Factor w/ 10 levels "19.4353018286039",..: 8 1 5 7 6 2 9 3 10 4
 $ X2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
 $ X3: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5

With dat2 you used data.frame() so the numeric fields were not converted to
strings and then factors. Then you converted the dat1 factors back to
numeric. You would be fine with just
Or you can name the list elements and then convert
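
The commands that followed the last two sentences did not survive in the archive. As a hedged sketch only, and not necessarily what was originally posted, the coercion described above and one way of naming the list elements before converting might look like this. The seed is an assumption added for reproducibility. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so Var2 stays character in current versions.

```r
set.seed(42)  # assumption: seed added only to make the sketch reproducible
listdat1 <- list(rnorm(10, 20), rep(LETTERS[1:2], 5), rep(1:5, 2))

# cbind() builds a matrix, and a matrix holds a single type,
# so the numeric and integer vectors are coerced to character
m <- do.call("cbind", listdat1)
matrix_type <- typeof(m)

# Name the list elements, then convert the list column by column
names(listdat1) <- c("Var1", "Var2", "Var3")
dat1 <- as.data.frame(listdat1)

# Reference: the explicit construction from the original post
dat2 <- data.frame(Var1 = listdat1[[1]], Var2 = listdat1[[2]],
                   Var3 = listdat1[[3]])

same <- identical(dat1, dat2)
```

Because the numbers never pass through a character string, no digits are lost and no conversion back is needed.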
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
#
Hello,

But

 > all.equal(dat1,dat2)
[1] TRUE

So I guess it does have to do with floating-point equality: identical()
requires exact equality, while all.equal() compares numeric values within
a tolerance, which defaults to .Machine$double.eps^0.5 (about 1.5e-8).
(With a tolerance as strict as .Machine$double.eps it could return FALSE
on occasions where we would expect TRUE.)
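
A minimal sketch of the difference, assuming a single hypothetical value `x` in place of the Var1 column: as.character() keeps 15 significant digits, so converting a double to a string and back does not generally return the same bits.

```r
x <- 20 + 1/3                      # a double that needs more than 15 digits
y <- as.numeric(as.character(x))   # round trip through a character string

exact        <- identical(x, y)                         # bit-for-bit comparison
close_enough <- isTRUE(all.equal(x, y))                 # default tolerance
strict       <- isTRUE(all.equal(x, y, tolerance = 0))  # no tolerance at all
```

all.equal() returns a character description rather than FALSE when the values differ, which is why it is wrapped in isTRUE() here.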

Rui Barradas

On 01-07-2012 18:55, arun wrote:
#
On Jul 1, 2012, at 5:09 PM, David L Carlson wrote:

            
Yes, arun. If the coding had proceeded otherwise a more natural and  
expected result might have occurred:

 > dat1<-do.call("data.frame",listdat1)
 > colnames(dat1)<-c("Var1","Var2","Var3")
 > dat1
        Var1 Var2 Var3
1  21.14076    A    1
2  19.53277    B    2
3  19.59725    A    3
4  19.84262    B    4
5  19.93251    A    5
6  20.92242    B    1
7  19.22315    A    2
8  19.13742    B    3
9  18.82441    A    4
10 20.92661    B    5

Whoever taught you to use 'cbind' for construction of data.frames did  
you a great disservice. It would seem much less problematic to have  
simply done this in the first place:

dat1 <- data.frame(Var1=rnorm(10,20),Var2=rep(LETTERS[1:2],5),Var3=rep(1:5,2) )
#
HI All,

Thanks for your replies.

A.K.



#
Hi David & Rui,

It must be the floating point representation.
dat1$Var1<-round(dat1$Var1)
dat2$Var1<-round(dat2$Var1)

identical(dat1,dat2)
[1] TRUE


I knew that "cbind" is not ideal for converting to a data frame. But I used it to understand the differences.

Thanks again,

A.K.


