Skip to content

How to generate a conditional dummy in R?

5 messages · Faradj Koliev, Bert Gunter, Jim Lemon

#
Hi everyone, 

I am trying to generate a conditional dummy variable ?X" with the following rules

 set X=1 if Y is =1, two years prior to the NA.  [0,0,NA]. 

For example, if  the pattern for Y is 0,0,NA then the X variable is =0 for all  the two years prior to the NA. If the pattern for Y is 0,1,NA or 1,0,NA then the X =1 . To be clear, if 1,1,NA then the X=1 that  first specific year, it should only count once (X=1), not twice. 

The code that I have now is not complete and I would appreciate some advice here. This is the code: 
dat2 <- dat1 %>% 
  group_by(country) %>% 
  group_by(grp = cumsum(is.na(lag(Y))), add = TRUE) %>% 
  mutate(first_year_at_1 = match(1, Y) * any(is.na(Y)) * any(tail(Y, 3) == 1L), 
         X = {x <- integer(length(Y)) ; x[first_year_at_1] <- 1L ; x}) %>% 
  ungroup()

It doesn?t really generate what I described above. Any help here would be much appreciated. 

Below you can see my sample data with the desired outcome ?X? dummy in it.

Thank you!
structure(list(year = c(1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 
1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 
2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 
1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 
2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 
2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 
1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 
2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 
1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 
2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 
1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 
1999L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 
2007L, 2008L, 2009L, 2010L, 2011L), country = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("Canada", 
"Cuba", "Dominican Republic", "Haiti", "Jamaica"), class = "factor"), 
    Y = c(1L, NA, 1L, 1L, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 1L, 
    1L, NA, 1L, NA, 1L, NA, 1L, NA, NA, 1L, 1L, NA, NA, 1L, NA, 
    1L, NA, 1L, NA, 1L, 1L, 1L, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 
    NA, 1L, NA, 1L, 0L, 0L, 0L, 1L, NA, 0L, 1L, 0L, 0L, 0L, 0L, 
    0L, 1L, NA, 0L, 1L, 1L, NA, 0L, 1L, NA, 1L, NA, 1L, NA, 1L, 
    NA, 1L, NA, 1L, 1L, 1L, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 1L, 
    0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, NA, 0L, 1L, 1L, 1L, 
    NA, 1L, NA, 0L, 1L, 1L, NA), X = c(1L, 0L, 0L, 1L, 0L, 0L, 
    1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 
    0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 
    0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 
    1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 
    1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L)), .Names = c("year", 
"country", "Y", "X"), class = "data.frame", row.names = c(NA, 
-110L))
#
I was not able to decipher your meaning, and your failure to garner a
response so far may mean that others may have had similar difficultes. You
might therefore try providing providing a **small reproducible** example of
X's and Y's that show what you have and what you want to end up with to
clarify your meaning. Or continue to wait for someone smarter to respond.

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Mon, May 28, 2018 at 3:43 AM, Faradj Koliev <faradj.g at gmail.com> wrote:

            

  
  
#
Hi Faradj,
What a problem! I think I have worked it out, but only because the
result is the one you said you wanted.

# the sample data frame is named fkdf
Y2Xby3<-function(x) {
 nrows<-dim(x)[1]
 X<-rep(0,nrows)
 for(i in 1:(nrows-2)) {
  if(!is.na(x$Y[i])) {
   if(x$Y[i] == 1 && any(is.na(x$Y[(i+1):(i+2)]))) X[i]<-1
   if(i > 1) {
    if(X[i-1] == 1) X[i]<-0
   }
  }
  else {
   if(!is.na(x$Y[i+1])) {
    if(x$Y[i+1] == 1 && is.na(x$Y[i+2]) && X[i] == 0)
     X[i+1]<-1
   }
  }
 }
 return(X)
}
countries<-as.character(unique(fkdf$country))
X1<-NULL
for(country in countries)
 X1<-c(X1,Y2Xby3(fkdf[fkdf$country == country,]))
X1
  [1] 1 0 0 1 0 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 1 0 0 0 1 0 1 0 1 0 0 0 1 0 0
 [38] 1 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 1 0 1 0 1 0
 [75] 1 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0
[1] 1 0 0 1 0 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 1 0 0 0 1 0 1 0 1 0 0 0 1 0 0
 [38] 1 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 1 0 1 0 1 0
 [75] 1 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0

Jim
On Mon, May 28, 2018 at 8:43 PM, Faradj Koliev <faradj.g at gmail.com> wrote:
#
Dear Jim, 

wow! It worked! Thanks a lot. 

I did as you suggested and it worked well with the real data. Although it gave me this error: Error in if (!is.na(x$Y[i])) { : argument is of length zero. For some reason the X1 produced less observations than it is in the data. But it's not a big deal - I identified those cases and simply deleted from the data (it was countries that only appeared twice in the data (e.g. USSR Yugoslavia etc). 

Best, 
Faradj

  
  
#
Hi Faradj,
Yes, the function expects at least three values for each country. Glad
it worked.

Jim
On Tue, May 29, 2018 at 10:53 PM, Faradj Koliev <faradj.g at gmail.com> wrote: