
use of variable labels

3 messages · Janet Rosenbaum, Spencer Graves, Thomas Lumley

#
The R documentation for some of the foreign package's functions says 
that the set of variable labels becomes attributes in the resulting
data frame.  

Thus, e.g., 5="strongly agree", 4="agree", etc.

I'm happy that the labels are being passed, but unfortunately, when
R summarizes the data, it lists these variables only as categories and
ignores the corresponding numbers.  It seems as though the numbers
attached to the categories don't exist.

Is there a way to make R go back and forth between the categories and 
the corresponding numbers as Stata does, or do I just have to set
convert.factors=FALSE ?

Hope everyone's enjoying the April snow!
Thanks,

Janet
> summary(MC)
       id             country              code         sex     
 Min.   :10100001   Length:1068        Mexico:604   Female:541  
 1st Qu.:10100306   Mode  :character   China :464   Male  :509  
 Median :14000071                                   NA's  : 18  
 Mean   :12305905                                               
 3rd Qu.:14000339                                               
 Max.   :14000628
> mean(MC$id)
[1] 12305905
> mean(MC$sex)
[1] NA
Warning message: 
argument is not numeric or logical: returning NA in: mean.default(MC$sex) 



Stata gives:

. summ

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
          id |    1068    1.23e+07    1934101   1.01e+07   1.40e+07
     country |       0
        code |    1068    .4344569   .4959177          0          1
         sex |    1050    1.484762   .5000059          1          2
#
> tst.df
  a b
1 a 3
2 b 4
3 c 5
> as.numeric(tst.df$a)
[1] 1 2 3
> as.numeric(tst.df$b)
[1] 1 2 3
> as.character(tst.df$b)
[1] "3" "4" "5"
> as.numeric(as.character(tst.df$b))
[1] 3 4 5
Does this answer your question?
Spencer Graves
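
The transcript above turns on the difference between a factor's internal codes and its labels. A minimal sketch of that distinction follows, assuming tst.df was built with both columns as factors (the thread shows only the printed data frame, so the construction here is a guess):

```r
# Assumed construction: the thread does not show how tst.df was created.
# Both columns are made factors explicitly, since recent versions of R
# no longer convert strings to factors by default.
tst.df <- data.frame(a = factor(c("a", "b", "c")),
                     b = factor(c(3, 4, 5)))

# as.numeric() on a factor returns the internal level codes (1, 2, 3),
# not the numbers that appear in the labels.
codes <- as.numeric(tst.df$b)

# Going through as.character() first parses the labels back into numbers.
values <- as.numeric(as.character(tst.df$b))

# An equivalent idiom that converts each level only once and then
# indexes the converted levels by the factor's codes.
values2 <- as.numeric(levels(tst.df$b))[tst.df$b]

print(codes)
print(values)
print(values2)
```

Either conversion only recovers the original numbers when the labels themselves are numeric strings; for labels such as "Male" or "China" there is nothing to parse, and only the level codes are available.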
#
On Tue, 8 Apr 2003, janet rosenbaum wrote:

            
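In this particular case I don't see why you would want the numbers, but
the function as.numeric() will extract the underlying integer codes
(1, 2, ...) from a factor.  These codes need not equal the original
Stata values.

e.g.
  mean(as.numeric(MC$sex), na.rm=TRUE)
or
  mean(as.numeric(MC$code))
should work, but
  mean(MC$sex=="Male", na.rm=TRUE)
or
  mean(MC$code=="China")
should also work and seem clearer to me.  (MC$sex has 18 NA's, so
na.rm=TRUE is needed there; without it the mean is NA.)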


	-thomas
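
To make the comparison with the Stata output concrete, here is a rough sketch that rebuilds the two variables from the counts in the summary() output. The real MC data frame is not available, so the vectors below are hypothetical stand-ins, and the code-to-label mappings (1 = Female, 2 = Male; 0 = Mexico, 1 = China) are inferred from the Stata means rather than stated anywhere in the thread.

```r
# Hypothetical reconstruction from the counts shown by summary(MC).
# Assumed Stata coding: sex 1 = Female, 2 = Male; code 0 = Mexico, 1 = China.
sex_raw  <- c(rep(1, 541), rep(2, 509), rep(NA, 18))
code_raw <- c(rep(0, 604), rep(1, 464))

# Attach the labels while keeping the raw numeric vectors around.
sex  <- factor(sex_raw,  levels = c(1, 2), labels = c("Female", "Male"))
code <- factor(code_raw, levels = c(0, 1), labels = c("Mexico", "China"))

# Factor codes always start at 1, so they match Stata for sex (coded 1/2)...
mean_sex <- mean(as.numeric(sex), na.rm = TRUE)

# ...but are shifted by one for code (coded 0/1 in Stata).
mean_code_levels <- mean(as.numeric(code))

# Comparing against a label gives the proportion directly.
prop_china <- mean(code == "China")
prop_male  <- mean(sex == "Male", na.rm = TRUE)

# Without na.rm = TRUE the missing values propagate.
mean_sex_with_na <- mean(as.numeric(sex))

print(c(mean_sex, mean_code_levels, prop_china, prop_male))
```

Under these assumptions, mean_sex and prop_china agree with the Stata means for sex and code, while mean_code_levels is larger by exactly one. Keeping the raw numeric vector next to the factor, as done here, is one way to move between the numbers and the categories without re-reading the file.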