use of variable labels
On Tue, 8 Apr 2003, janet rosenbaum wrote:
The mean was just an example. We have a 4000 line program that expects numbers. I was hoping that there would be some way of dealing with this problem on the level of the data.frame.
as.data.frame(lapply(df, as.numeric)) would work if all your variables were either unlabelled or completely labelled, but it doesn't seem any simpler than using convert.factors=FALSE.
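A minimal sketch of that conversion, assuming a small made-up data frame in place of the real output of read.dta() (the column names and values here are hypothetical). Note that as.numeric() on a factor returns the internal integer codes (1, 2, ...), which need not be the values originally stored in the Stata file.

```r
# Hypothetical stand-in for a data frame returned by read.dta():
# one plain numeric column and one fully labelled (factor) column.
df <- data.frame(
  age = c(18, 41, 88),
  sex = factor(c("male", "female", "male"), levels = c("male", "female"))
)

# Convert every column to numeric; factors become their integer codes.
num_df <- as.data.frame(lapply(df, as.numeric))

str(num_df)
```

After this, every column of num_df is numeric, so code that expects numbers (such as mean()) can run on any of them.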
I'm guessing I'm just going to have to throw out the labels since it's
not practical to cast as a number every time and I also just noticed
something strange about having convert.factors=TRUE:
When I do
read.dta("filename.dta")
some of the variables which are numbers are read as NA:
      age           educyrs
 refuse:   0    refuse:   0
 DK    :   0    DK    :   0
 NA's  :1068    NA's  :1068
When I do
read.dta("filename.dta", convert.factors=FALSE)
the variables are again treated like numbers:
      age            educyrs
 Min.   :18.00   Min.   : 0.00
 1st Qu.:30.00   1st Qu.: 5.00
 Median :41.00   Median : 9.00
 Mean   :43.18   Mean   : 8.65
 3rd Qu.:54.00   3rd Qu.:12.00
 Max.   :88.00   Max.   :40.00
 NA's   :18.00   NA's   :87.00
I'm guessing that this means that by default -only- the labels are used
when convert.factors=TRUE, and even variables without labels have to be
cast as numbers.
No, that is not the case. I suspect you have variable labels declared in Stata for these variables; it's just that the variables don't take on those values. read.dta does assume that if any value of a variable has a label then all values should. It doesn't, e.g., handle labels for different types of missing on an otherwise numeric variable.

-thomas
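The effect described above can be imitated in plain R. This is only a sketch of the behaviour, not the actual code inside read.dta(): it assumes two hypothetical label codes (-1 for "refuse", -2 for "DK") and made-up ages, and uses factor() to show what happens when only some values of a variable carry labels.

```r
# Made-up numeric data; none of the observations uses a labelled code.
age <- c(18, 41, 88)

# Hypothetical labels declared for the variable.
label_values <- c(-1, -2)
label_names  <- c("refuse", "DK")

# Treat the labelled codes as the complete set of levels:
# any value without a label has no level and becomes NA.
age_factor <- factor(age, levels = label_values, labels = label_names)

summary(age_factor)
```

The summary has zero counts for both labels and every observation counted as NA, which is the same pattern as the age and educyrs columns shown earlier.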