
Message-ID: <85F897BE-9AE2-483B-BFB0-3AAF030A69CA@me.com>
Date: 2016-10-05T19:28:31Z
From: Marc Schwartz
Subject: Managing person identifier variable
In-Reply-To: <1682825.eChPh1C0BU@equinox2>

> On Oct 5, 2016, at 2:21 PM, Theodore Lytras <thlytras at gmail.com> wrote:
> 
> On Wednesday, 5 October 2016 6:59:30 PM EEST MACDOUGALL Margaret wrote:
>> I would be most grateful for some advice in relation to the interpretation
>> of a person identifier variable (persID, say),  in R. I would like to
>> represent persons, as an independent variable, by a random effect. However,
>> there are over 200 such persons. Each person is allocated a random
>> numerical code as a unique identifier.  Currently, R is reading the
>> identifier variable as a numeric variable. Is there a quick way of
>> addressing this problem by recoding the variable?  (I do not wish to bin
>> the values into category ranges; rather, I wish to avoid the numerical
>> codes being interpreted literally.)
> 
> Just recode it as a factor, i.e. factor(persID).
> 
> By the way, lme4 does that implicitly if you specify a numeric variable as a 
> random effect in a model formula, i.e. you can just say: y ~ x + (1|persID) 
> instead of: y ~ x + (1|factor(persID))


Just a quick pointer here: if the persID values contain leading zeros that are a material part of the unique IDs, such as:

  01234
  001234

then coercing them to a factor after they have already been coerced to numeric values will collapse both of the above into the same level, 1234:

> factor(as.numeric("01234"))
[1] 1234
Levels: 1234

> factor(as.numeric("001234"))
[1] 1234
Levels: 1234
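
One possible way around this, sketched under the assumption that the IDs are still available as text (for example in the original data file), is to keep them as character strings and convert those directly to a factor. The object names (ids, csv_text, dat) and the column y below are made up for illustration only:

```r
# Hypothetical IDs, held as character strings so the leading zeros survive
ids <- c("01234", "001234", "01234")

# Going through numeric first collapses the two distinct IDs into one level
f_num <- factor(as.numeric(ids))
nlevels(f_num)   # 1

# Converting the character vector directly keeps them distinct
f_chr <- factor(ids)
nlevels(f_chr)   # 2

# When importing, the ID column can be forced to character up front.
# csv_text stands in for a real file; persID and y are assumed column names.
csv_text <- "persID,y\n01234,1.2\n001234,3.4\n"
dat <- read.csv(text = csv_text, colClasses = c(persID = "character"))
dat$persID <- factor(dat$persID)
nlevels(dat$persID)   # 2
```

If the IDs have already been read in as numeric, the leading zeros are gone and cannot be recovered from the numeric values alone, so the fix would have to be applied at import time.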


Food for thought...

Regards,

Marc Schwartz

