
Message-ID: <200304082201.h38M1BKS027575@hcs.harvard.edu>
Date: 2003-04-08T22:01:11Z
From: Janet Rosenbaum
Subject: use of variable labels
In-Reply-To:  <Pine.A41.4.44.0304081425460.77668-100000@homer18.u.washington.edu> from "Thomas Lumley" at Apr 08, 2003 02:32:48 PM

> In this particular case I don't see why you would want the numbers, but
> the function as.numeric() will extract the underlying numbers from a
> factor.
 
The mean was just an example.  We have a 4000-line program that expects
numbers.  I was hoping there would be some way of dealing with this
problem at the level of the data.frame.
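One way to handle this at the level of the data.frame is to convert every factor column back to numbers in a single pass. This is a minimal sketch that assumes integer level codes are acceptable for every factor column. The data frame `df` and its columns are hypothetical. Note that as.integer() on a factor returns the level codes (1, 2, ...), not the numbers printed as labels.

```r
# Hypothetical data frame: one factor column and one numeric column
df <- data.frame(
  sex = factor(c("male", "female", "male")),
  age = c(30, 41, 54)
)

# Find the factor columns and replace each one with its integer level codes
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], as.integer)

str(df)
```

Levels are sorted alphabetically here ("female", "male"), so `sex` becomes 2, 1, 2. If a factor's labels are themselves numbers such as "18", the usual idiom is as.numeric(as.character(x)), which keeps the label values rather than the codes.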

I'm guessing I'll just have to throw out the labels, since it's not
practical to cast each variable as a number every time.  I also just
noticed something strange about setting convert.factors=TRUE:

When I do
read.dta("filename.dta")
some of the numeric variables are read as NA:
      age            educyrs
 refuse:   0     refuse:   0
 DK    :   0     DK    :   0
 NA's  :1068     NA's  :1068

When I do
read.dta("filename.dta", convert.factors=FALSE)
the variables are again treated as numbers:

      age           educyrs    
 Min.   :18.00   Min.   : 0.00
 1st Qu.:30.00   1st Qu.: 5.00
 Median :41.00   Median : 9.00
 Mean   :43.18   Mean   : 8.65
 3rd Qu.:54.00   3rd Qu.:12.00
 Max.   :88.00   Max.   :40.00
 NA's   :18.00   NA's   :87.00  

I'm guessing this means that, by default, -only- the labels are used
when convert.factors=TRUE, and even variables without labels have to be
cast as numbers.
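The NA counts in the first summary are what you would see if the conversion builds each factor with only the labeled values as its levels. This is a minimal sketch of that assumption, not the actual read.dta code. The codes -1 for "refuse" and -2 for "DK" are made up, since the real codes in filename.dta are not shown.

```r
# Hypothetical raw ages, plus two made-up codes for "refuse" and "DK"
raw_age <- c(18, 30, 41, -1, 88, -2)

# Only the labeled codes become levels; every unlabeled value turns into NA
age_fac <- factor(raw_age, levels = c(-1, -2), labels = c("refuse", "DK"))

summary(age_fac)
```

Here the four genuine ages all become NA, and only the two labeled codes survive. Reading with convert.factors=FALSE keeps the raw numbers instead.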

Anyhow, thanks so much for the help.  
Thanks,

Janet