Message-ID: <87ab7z4pcs.fsf@it.com>
Date: 2009-03-05T21:44:03Z
From: news at aleblanc.cotse.net
Subject: dummy variable encoding

Richard.Cotton at hsl.gov.uk writes:

> The best encoding depends upon which language you would like to manipulate 
> the variable in.  In R, genders are most naturally represented as factors.
> That means that in an external data source (like a spreadsheet of data),
> you should ideally have the gender recorded as human-understandable text 
> ("male" and "female", or "M" and "F").  Once the data is read into R, by 
> default R will convert the string to factors (keeping the human readable 
> labels).  This way you avoid having to remember that 1 means male (or 
> whatever).
>
> If you were manipulating the data in a different language that didn't have
> factors, then it might be more appropriate to use an integer.  Which
> integers you use doesn't matter; whatever you choose, you need a look-up
> table to know what each number refers to.
>
Yes, that's what I thought. However, somebody told me that it is better
to use 1/2 rather than 0/1 for a two-level factor such as gender, and I
have no idea why. I told them it didn't matter, but I have since seen
quite a few examples that use 1/2 (admittedly in SPSS).
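
To check the "it doesn't matter" claim on the R side, here is a minimal sketch. The data values and the variable names (gender_text, g01, g12) are made up for illustration. It assumes the look-up table is supplied through the labels argument of factor(). It says nothing about what SPSS does internally.

```r
# Hypothetical example data: gender stored as human-readable text.
gender_text <- c("male", "female", "female", "male", "female")

# Convert to a factor; R keeps the labels and stores integer codes internally.
gender <- factor(gender_text, levels = c("female", "male"))

levels(gender)      # the human-readable labels
as.integer(gender)  # internal codes start at 1 (female = 1, male = 2)

# The same data arriving as 0/1 or as 1/2 integers. The levels/labels
# arguments act as the look-up table mentioned above.
g01 <- factor(c(1, 0, 0, 1, 0), levels = c(0, 1), labels = c("female", "male"))
g12 <- factor(c(2, 1, 1, 2, 1), levels = c(1, 2), labels = c("female", "male"))

# Once labelled, both codings give the same factor as the text version.
identical(g01, g12)
identical(g01, gender)

# With the default treatment contrasts, the design matrix holds a 0/1
# dummy column ("gendermale") regardless of how the raw data was coded.
model.matrix(~ gender)
```

So once the variable is a factor, the original integers no longer play any role in R; they only matter if the column is left as a plain number and used that way in a model.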

-- 
aleblanc