dummy variable encoding

Richard Cotton · 2009-03-06T10:05:27Z

> > The best encoding depends upon which language you would like to manipulate
> > the variable in. In R, genders are most naturally represented as factors.
> > That means that in an external data source (like a spreadsheet of data),
> > you should ideally have the gender recorded as human-understandable text
> > ("male" and "female", or "M" and "F"). Once the data is read into R, by
> > default R will convert the string to factors (keeping the human readable
> > labels). This wa

The only benefit that I can see of using 1/2 instead of 0/1 is fairly 
minor.

If you have cases where there are missing values, and you are working in a 
language that doesn't support NA values for integers (or factors; I'm 
thinking of something like C), then you could encode your genders as

0: not recorded
1: female
2: male

Then you can include logic like

if(gender)
{
   /* do something */
}
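
As a minimal sketch of that idea in C (the enum and function names are hypothetical, chosen only for illustration), reserving 0 for "not recorded" lets the test rely on C treating zero as false:

```c
#include <stddef.h>

/* Hypothetical encoding: 0 is reserved for missing values. */
enum gender { GENDER_NOT_RECORDED = 0, GENDER_FEMALE = 1, GENDER_MALE = 2 };

/* Count the entries whose gender was actually recorded. */
size_t count_recorded(const enum gender *genders, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (genders[i]) {   /* zero (not recorded) is false in C */
            count++;
        }
    }
    return count;
}
```

Calling count_recorded on an array holding female, not recorded, male, male would count three recorded entries.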

The alternative encoding, using 0/1, would be something like

-1: not recorded
0: female
1: male

This makes the code slightly less pretty, because the check for a recorded 
value has to be written out explicitly.

if(gender != -1)
{
   /* do something */
}
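
A comparable sketch for the 0/1 encoding, again with hypothetical names, might look like the following. Here a bare if(gender) would silently skip females, since female is encoded as 0, so the comparison against -1 is needed:

```c
#include <stddef.h>

/* Hypothetical 0/1 encoding with -1 as the missing-value sentinel. */
enum gender01 { G01_NOT_RECORDED = -1, G01_FEMALE = 0, G01_MALE = 1 };

/* Count the entries whose gender was actually recorded. */
size_t count_recorded_01(const enum gender01 *genders, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (genders[i] != G01_NOT_RECORDED) {   /* explicit sentinel check */
            count++;
        }
    }
    return count;
}
```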

Again, none of this really applies to R, since you should be using factors 
for this sort of variable.

Regards,
Richie.

Mathematical Sciences Unit
HSL