non-intuitive behaviour after type conversion
When you attach() a data frame, R places a copy of it on the search path, and there it stays until you detach() it. It is not a link, reference, or pointer to the original. Changing the original (the data frame in your workspace), which is what you did, does not change the attached copy. In essence, you did a type conversion on one copy, but afterwards started looking at the other copy. See also my interjected comments below. -Don
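A minimal sketch of the behaviour described above, assuming a small made-up data frame in place of the real one (only the names birth and t1stvisit come from the original post; the values are invented):

```r
# Hypothetical stand-in for the poster's data frame
birth <- data.frame(t1stvisit = c(1L, 1L, 2L, 3L, NA))

attach(birth)   # puts a copy of the columns on the search path

# Convert the column inside the data frame in the workspace
birth$t1stvisit <- as.factor(birth$t1stvisit)

attached_is_factor <- is.factor(t1stvisit)        # looks at the attached copy
original_is_factor <- is.factor(birth$t1stvisit)  # looks at the data frame itself

print(c(attached = attached_is_factor, original = original_is_factor))

detach(birth)   # remove the stale copy from the search path
```

The attached copy should still report FALSE while the data frame reports TRUE, which is the mismatch in the question.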
At 8:54 AM +0000 11/23/09, Alan Kelly wrote:
Dear list,
I have a data frame (birth) with mixed variables (numeric and alphanumeric). One variable, "t1stvisit", was originally coded as numeric with values 1, 2, and 3. After attaching the data frame, this is what I see when I use
str(t1stvisit)
$ t1stvisit: int 1 1 1 1 1 1 1 1 2 2 ...
This is as expected. I then convert t1stvisit to a factor and, to avoid creating a second copy of this variable independent of the data frame, I use:
birth$t1stvisit = as.factor(birth$t1stvisit)
If I check that the conversion has worked:
is.factor(t1stvisit)
[1] FALSE
Now the only object present in the workspace is the data frame "birth" and, as noted, I have not created any new variables. So why does R still treat t1stvisit as numeric?
is.factor(t1stvisit)
[1] FALSE
Yet when I try the following:
is.factor(birth$t1stvisit)
[1] TRUE
So, there appear to be two versions of "t1stvisit": the original numeric version and the correct factor version, although ls() only shows "birth" as present in the workspace.
Right.
find('t1stvisit')
will list every location on the search path that holds an object of that name; if there are two of them, both are shown.
If you type
t1stvisit
at the prompt, you always get the first one found on the search path.
The one in the attached dataframe comes later. Use the
search()
function to show you, in order, the different locations where objects
can be found.
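As a rough illustration of find() and search(), again assuming a made-up data frame named birth (the values are hypothetical):

```r
# Hypothetical data frame standing in for the real one
birth <- data.frame(t1stvisit = c(1L, 2L, 3L))
attach(birth)

where_found <- find("t1stvisit")  # search-path entries that hold this name
path <- search()                  # the whole search path, in lookup order

print(where_found)
print(path)

detach(birth)
```

In this sketch the name is found in the attached entry called "birth". The column inside the data frame in the workspace is not an object on the search path in its own right, so find() does not list it separately.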
When you did the attach(), did you get a message like:
attach(tmp)
The following object(s) are masked _by_ .GlobalEnv :
x
(yours would have referred to your variables, not the "x" in my example).
That message tells you you have two variables of the same name,
stored in two different locations in the search path.
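A small sketch of how that message comes about, assuming it is run at the top level of a fresh session; tmp and x are the hypothetical names from my example above, and the values are invented:

```r
x <- 99                     # an object in the workspace (.GlobalEnv)
tmp <- data.frame(x = 1:3)  # a data frame with a column of the same name

attach(tmp)                 # R reports that x from tmp is masked by .GlobalEnv

seen_at_prompt <- x                     # the workspace object comes first
from_attached <- get("x", pos = "tmp")  # the attached column, fetched by entry name

print(seen_at_prompt)
print(from_attached)

detach(tmp)
```

Typing x gives the workspace value, while the attached column is still there further along the search path.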
As a general rule, it's just plain confusing to have more than one
object of the same name in more than one location. In your situation,
I would get rid of the one that's not in the dataframe. But even
then, if you change it in the dataframe you'll still need to detach
and re-attach the dataframe, so using attach() is probably not the
best choice in the long run. Maybe the with() function would meet
your needs.
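A minimal sketch of the with() approach, assuming the same kind of made-up data as before (values are hypothetical):

```r
# Hypothetical stand-in for the poster's data frame
birth <- data.frame(t1stvisit = c(1L, 1L, 2L, 3L, NA))
birth$t1stvisit <- as.factor(birth$t1stvisit)

# with() evaluates the expression against the data frame as it is now,
# so there is no attached copy that can go stale
is_fac <- with(birth, is.factor(t1stvisit))
counts <- with(birth, summary(t1stvisit))

print(is_fac)
print(counts)
```

Because nothing is attached, there is nothing to detach and re-attach after changing the data frame.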
If I type:
summary(t1stvisit)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  1.000   1.000   2.000   1.574   2.000   3.000  29.000
I get the numeric version, but if I try
summary(birth$t1stvisit)
   1    2    3 NA's
 180  169   22   29
I get the factor version. Frankly I feel that this behaviour is non-intuitive and potentially problematic. Nor have I seen warnings about this in the various text books on R. Can anyone comment on why this should occur?
Many thanks,
Alan Kelly
Dr. Alan Kelly
Department of Public Health & Primary Care
Trinity College Dublin
-------------------------------------- Don MacQueen Environmental Protection Department Lawrence Livermore National Laboratory Livermore, CA, USA 925-423-1062