adding a dummy variable...

4 messages · grazia at stat.columbia.edu, Martyn Byng, Dennis Murphy +1 more

#
Hi all,

I have a dataset of individuals where the variable ID identifies the
household in which the individual lives. rel.head stands for the
relationship to the household head: rel.head=1 is the household head,
rel.head=2 is the spouse, and rel.head=3 is a child.

Here is an example of what it looks like:

df<-data.frame(ID=c("17100", "17100", "17101", "17102", "17103", "17103",
                     "17104", "17104", "17104", "17105", "17105"),
  rel.head=c("1","3","1","1","1", "2", "1", "2", "3", "1", "3"))


I want to add a dummy variable that is equal to 1 when these conditions
hold simultaneously:

a) the number of rows with the same ID is equal to 2
b) those two rows have rel.head=1 and rel.head=3


So my ideal output is:

   ID      rel.head   added.dummy
1  17100        1           1
2  17100        3           1
3  17101        1           0
4  17102        1           0
5  17103        1           0
6  17103        2           0
7  17104        1           0
8  17104        2           0
9  17104        3           0
10 17105        1           1
11 17105        3           1

Is there a simple way to do that?
Can somebody help?

Thanks in advance,
Grazia
#
Hi,

I am sure there are better / more efficient ways of doing this, but the
following seems to work ...

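# TRUE for each household with exactly two rows, one head (1) and one child (3)
ids <- sapply(split(df, df$ID), function(x) {
  length(x$rel.head) == 2 & any(x$rel.head == 1) & any(x$rel.head == 3)
})
ids <- names(ids)[ids]                       # IDs of the qualifying households
added.dummy <- as.numeric(df$ID %in% ids)    # 1 for rows in those households
cbind(df, added.dummy)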
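
The same per-household check can also be sketched with tapply() instead of split()/sapply(). This is only an illustration under the assumption that rel.head may be stored either as character or as a factor, so the values are compared as character strings; the names rel and ok are made up for the example.

```r
df <- data.frame(
  ID = c("17100", "17100", "17101", "17102", "17103", "17103",
         "17104", "17104", "17104", "17105", "17105"),
  rel.head = c("1", "3", "1", "1", "1", "2", "1", "2", "3", "1", "3")
)

# Compare as character so factor and character columns behave the same
rel <- as.character(df$rel.head)

# One logical per household: exactly two rows, containing both "1" and "3"
ok <- tapply(rel, df$ID, function(r) length(r) == 2 && all(c("1", "3") %in% r))

# Look up each row's household by name and convert TRUE/FALSE to 1/0
df$added.dummy <- as.integer(ok[as.character(df$ID)])
df
```

Indexing ok by the household ID broadcasts the group-level result back to the individual rows, so no merge step is needed.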

Martyn

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of grazia at stat.columbia.edu
Sent: 04 October 2011 16:45
To: r-help at r-project.org
Subject: [R] adding a dummy variable...

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

#
Hi:

Here's another way to do it with the plyr package, also not terribly
elegant. It assumes that rel.head is a factor in your original data
frame:

str(df)
'data.frame':   11 obs. of  2 variables:
 $ ID      : Factor w/ 6 levels "17100","17101",..: 1 1 2 3 4 4 5 5 5 6 ...
 $ rel.head: Factor w/ 3 levels "1","2","3": 1 3 1 1 1 2 1 2 3 1 ...

If this is not the case in your data, then you need to modify the
function f below accordingly. (This is why use of dput() is preferred
when sending example data to R-help, BTW.)
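
As a small illustration of the dput() remark, the call below prints an R expression that recreates the example data frame together with its column types, so readers do not have to guess whether a column is a factor or a character vector. The data frame is the one from the original post; the exact text printed depends on the R version, so no output is shown.

```r
df <- data.frame(
  ID = c("17100", "17100", "17101", "17102", "17103", "17103",
         "17104", "17104", "17104", "17105", "17105"),
  rel.head = c("1", "3", "1", "1", "1", "2", "1", "2", "3", "1", "3")
)

# Prints a structure(...) expression that can be pasted into a message;
# evaluating that expression rebuilds df with the same column types
dput(df)
```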

library('plyr')
f <- function(d) {
    tvec <- factor(c(1, 3), levels = 1:3)   # target vector
    # Households without exactly two rows get dummy = 0
    if(nrow(d) != 2L) {d$dummy <- rep(0, nrow(d)); return(d)}
    # Otherwise dummy = 1 only if rel.head (column 2) is exactly (1, 3)
    d$dummy <- ifelse(!identical(d[, 2], tvec), 0, 1)
    d
}

ddply(df, .(ID), f)

      ID rel.head dummy
1  17100        1     1
2  17100        3     1
3  17101        1     0
4  17102        1     0
5  17103        1     0
6  17103        2     0
7  17104        1     0
8  17104        2     0
9  17104        3     0
10 17105        1     1
11 17105        3     1
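
If rel.head is not a factor (for example, a character column), one possible modification is to compare the values as character strings. The sketch below is hypothetical: f2 is a made-up name, and unlike identical() it uses setequal(), so it ignores the order of the two rows. It is applied here with base R's split()/lapply() so that it runs without plyr; passing f2 to ddply(df, .(ID), f2) is the plyr counterpart.

```r
df <- data.frame(
  ID = c("17100", "17100", "17101", "17102", "17103", "17103",
         "17104", "17104", "17104", "17105", "17105"),
  rel.head = c("1", "3", "1", "1", "1", "2", "1", "2", "3", "1", "3"),
  stringsAsFactors = FALSE
)

# Hypothetical variant of f: works for character or factor rel.head
f2 <- function(d) {
  rel <- as.character(d$rel.head)
  # 1 only for two-row households made up of one head and one child
  d$dummy <- as.numeric(nrow(d) == 2L && setequal(rel, c("1", "3")))
  d
}

# Base-R split/apply/combine over households
res <- do.call(rbind, lapply(split(df, df$ID), f2))
rownames(res) <- NULL
res
```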

HTH,
Dennis
On Tue, Oct 4, 2011 at 8:44 AM, <grazia at stat.columbia.edu> wrote:
#
Hi,

Using ddply,

library(plyr)
# test is TRUE only when the household has two rows and both 1 and 3 occur
ddply(df, .(ID), mutate, nrows = length(rel.head),
      test = nrows == 2 & all(c(1, 3) %in% rel.head))

HTH,

baptiste
On 5 October 2011 06:02, Dennis Murphy <djmuser at gmail.com> wrote: