
Message-ID: <Pine.A41.4.44.0304081513560.77668-100000@homer18.u.washington.edu>
Date: 2003-04-08T22:19:06Z
From: Thomas Lumley
Subject: use of variable labels
In-Reply-To: <200304082201.h38M1BKS027575@hcs.harvard.edu>

On Tue, 8 Apr 2003, janet rosenbaum wrote:
>
> The mean was just an example.  We have a 4000 line program that expects
> numbers.  I was hoping that there would be some way of dealing with this
> problem on the level of the data.frame.

as.data.frame(lapply(df,as.numeric))

would work if all your variables were either unlabelled or completely
labelled, but it doesn't seem any simpler than using convert.factors=FALSE.
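
As a minimal sketch of what that one-liner does (the data frame below is
made up for illustration and merely stands in for the result of read.dta):
as.numeric on a factor returns the position of each level, which need not
equal the numeric code stored in the Stata file.

```r
# Hypothetical data frame: one plain numeric column and one fully
# labelled variable that has been turned into a factor.
df <- data.frame(
  age = c(18, 41, 88),
  sex = factor(c("male", "female", "male"), levels = c("male", "female"))
)

# Convert every column to numeric. Factors become their level positions
# (1, 2, ...); numeric columns pass through unchanged.
num <- as.data.frame(lapply(df, as.numeric))
str(num)
```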

> I'm guessing I'm just going to have to throw out the labels since it's
> not practical to cast as a number every time and I also just noticed
> something strange about having convert.factors=TRUE:
>
> When I do
> read.dta("filename.dta")
> some of the variables which are numbers are read as NA:
>      age         educyrs
>       refuse:   0   refuse:   0
>       DK    :   0   DK    :   0
>       NA's  :1068   NA's  :1068
>
> When I do
> read.dta("filename.dta", convert.factors=FALSE)
> the variables are again treated like numbers:
>
>       age           educyrs
>  Min.   :18.00   Min.   : 0.00
>  1st Qu.:30.00   1st Qu.: 5.00
>  Median :41.00   Median : 9.00
>  Mean   :43.18   Mean   : 8.65
>  3rd Qu.:54.00   3rd Qu.:12.00
>  Max.   :88.00   Max.   :40.00
>  NA's   :18.00   NA's   :87.00
>
> I'm guessing that this means that by default -only- the labels are used
> when convert.factors=TRUE, and even variables without labels have to be
> cast as numbers.

No, that is not the case.  I suspect you have value labels declared in
Stata for these variables; it's just that the variables don't take on
those values.

read.dta does assume that if any value of a variable has a label then all
values should. It doesn't, for example, handle labels for different types
of missing value on an otherwise numeric variable.
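
The effect can be reproduced in plain R. This is only an illustration with
base factor(), not read.dta's actual code, and the label codes -1 and -2
are invented: when labels exist only for the special codes, every ordinary
value falls outside the declared levels and becomes NA.

```r
# Hypothetical ages; none of them equals a labelled code.
age <- c(18, 41, 88)

# Assumed value labels: only the two "missing" codes carry a label.
f <- factor(age, levels = c(-1, -2), labels = c("refuse", "DK"))

# Values that match no level are turned into NA.
summary(f)
```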

	-thomas