
Dummy (factor) based on a pair of variables

+ 1,AUT,BEL
+ 2,AUT,GER
+ 3,BEL,GER"), header=T,sep=",", as.is=T)
 > df
   y   i   j
1 1 AUT BEL
2 2 AUT GER
3 3 BEL GER
 > countries <- unique(c(df$i,df$j))
 > countries
[1] "AUT" "BEL" "GER"

 > df[countries] <- sapply(countries, function(x) df[x] <<- df$i == x | df$j == x)
 > df
   y   i   j   AUT   BEL   GER
1 1 AUT BEL  TRUE  TRUE FALSE
2 2 AUT GER  TRUE FALSE  TRUE
3 3 BEL GER FALSE  TRUE  TRUE
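
For anyone who wants to re-run the small example as a script, here is a minimal sketch. The data frame is rebuilt from the printed values above; the explicit loop and stringsAsFactors = FALSE are my own choices for illustration and are not part of the original session.

```r
# Rebuild the three-row example from the printed data frame.
# stringsAsFactors = FALSE is an assumption: it keeps i and j as character.
df <- data.frame(
  y = 1:3,
  i = c("AUT", "AUT", "BEL"),
  j = c("BEL", "GER", "GER"),
  stringsAsFactors = FALSE
)

countries <- unique(c(df$i, df$j))

# One logical column per country: TRUE when that country is
# either member of the (i, j) pair in a given row.
for (x in countries) {
  df[[x]] <- df$i == x | df$j == x
}

print(df)
```

The loop assigns each indicator column directly, so it does not rely on <<- reaching a data frame by name in the global environment.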

Obviously it would not be possible to test this arrangement with lm, since three rows are too few to estimate an intercept plus three coefficients.

So I tried scaling it up and testing on:
  dft <- data.frame(y=rnorm(100), i = sample(countries, 100, replace=T), j= sample(countries, 100, replace=T))
#Removed all the duplicates with:
dft <- dft[dft$i != dft$j, ]
#and it did not give proper answers.

This seems to give correct answers:
  dft[countries] <- sapply(countries, function(y) apply(dft, 1, function(x)   x[2] == y | x[3] == y))

And application of those variables is handled in a reasonable manner by the R formula parser:
 > lm(y ~ AUT + BEL + GER, data=dft)

Call:
lm(formula = y ~ AUT + BEL + GER, data = dft)

Coefficients:
(Intercept)      AUTTRUE      BELTRUE      GERTRUE
     0.09192      0.15130     -0.29274           NA
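
The NA for GERTRUE is what one would expect from perfect collinearity: once the rows with i == j are removed, every row has exactly two TRUE indicators, so AUT + BEL + GER is 2 in every row, which is a multiple of the intercept column. A rough way to check this is sketched below; the seed, the loop used to build the indicators, and the names row_totals and fit are assumptions made for the illustration, and the coefficient values will differ from those printed above.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

countries <- c("AUT", "BEL", "GER")
dft <- data.frame(
  y = rnorm(100),
  i = sample(countries, 100, replace = TRUE),
  j = sample(countries, 100, replace = TRUE),
  stringsAsFactors = FALSE
)

# Drop the rows where both members of the pair are the same country.
dft <- dft[dft$i != dft$j, ]

# Build one logical indicator per country.
for (x in countries) {
  dft[[x]] <- dft$i == x | dft$j == x
}

# Each remaining row involves exactly two distinct countries,
# so the three indicators always sum to 2.
row_totals <- rowSums(dft[countries])
print(table(row_totals))

# With an intercept in the model, the third indicator is aliased
# and lm() reports NA for it.
fit <- lm(y ~ AUT + BEL + GER, data = dft)
print(coef(fit))
```

Dropping one of the indicators, or the intercept, is the usual way to get a full-rank design; which one to drop depends on the comparison you want the coefficients to express.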

-
David Winsemius
On Apr 18, 2009, at 4:09 PM, Jason Morgan wrote:

            
David Winsemius, MD
Heritage Laboratories
West Hartford, CT