Dummy variables model

7 messages · Christoph Buser, Tobias Muhlhofer, Jean Eid +1 more

#
Hi, all!

Does anyone know an easy way to specify the following model?

Panel dataset, with stock through time, by firm.

I want to run a model of y on a bunch of explanatory variables, and one 
dummy for each firm, which is 1 for observations that come from firm i, 
and 0 everywhere else. I have over 200 firms (and a factor variable that
contains a firm identifier).

Any easy way of going about this, without having to define all these 
dummies? I checked lme() with random = ~ 1|firm, but the problem is that 
these are random effects, i.e. that there are firm-by-firm disturbance 
terms and overall disturbance terms, whereas I want just overall 
disturbance terms. This is generally called a "fixed effects" model, 
although it seems like the term "fixed effects" is being used somewhat 
differently in the context of the nlme package.

Toby
#
Hi

If you'd like to fit a fixed effect model without random
effects, you can use lm() or aov() (see ?lm and ?aov). If your
variable is a factor (?factor) then you can specify your model
in lm() without coding all dummy variables.

Regards,

Christoph Buser

#
You can turn the vector of firm identifiers into a factor and pass it to lm().

Jean
On Mon, 5 Sep 2005, Tobias Muhlhofer wrote:

            
#
So are you guys saying that if I have a variable firm, which is the
factor of all firm identifiers, I could just do

lm(y ~ x + firm)

and that will implicitly include a dummy for each level of factor firm, 
thus making this a fixed effects (aka LSDV) model?
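
One way to check this reading is to compare the slope from lm(y ~ x + firm) with the slope from the "within" regression on firm-demeaned data. The sketch below uses a simulated panel; the data, the seed and every variable name are made up for illustration and do not come from the thread.

```r
# Simulated panel: 5 hypothetical firms, 20 observations each
set.seed(1)
n_firms <- 5
n_obs   <- 20
firm  <- factor(rep(seq_len(n_firms), each = n_obs))
alpha <- rnorm(n_firms, sd = 2)                      # firm-specific intercepts
x     <- rnorm(n_firms * n_obs) + alpha[as.integer(firm)]
y     <- 1.5 * x + alpha[as.integer(firm)] + rnorm(n_firms * n_obs)

# LSDV: lm() expands the factor into dummies on its own
fit_lsdv <- lm(y ~ x + firm)

# Within estimator: subtract firm means from y and x, regress without intercept
y_dm <- y - ave(y, firm)
x_dm <- x - ave(x, firm)
fit_within <- lm(y_dm ~ x_dm - 1)

slope_lsdv   <- unname(coef(fit_lsdv)["x"])
slope_within <- unname(coef(fit_within)["x_dm"])
print(c(lsdv = slope_lsdv, within = slope_within))
```

The two slopes should agree up to numerical precision, which is what one would expect if the factor term acts as a full set of firm dummies. The standard errors of the demeaned fit are not directly comparable, since that fit does not account for the degrees of freedom used by the firm means.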

T
#
Here's an example:

data(iris)
iris1 <- iris
# Build one 0/1 dummy per species by hand
iris1$setosa     <- as.numeric(iris1$Species == "setosa")
iris1$versicolor <- as.numeric(iris1$Species == "versicolor")
iris1$virginica  <- as.numeric(iris1$Species == "virginica")
# Drop the original factor so only the hand-made dummies remain
iris1 <- iris1[, !colnames(iris1) %in% "Species"]
# "-1" drops the overall intercept so all three dummies can be estimated
summary(lm(Sepal.Length ~ . - 1, data = iris1))

Call:
lm(formula = Sepal.Length ~ . - 1, data = iris1)

Residuals:
      Min        1Q    Median        3Q       Max
-0.794236 -0.218743  0.008987  0.202546  0.731034

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
Sepal.Width   0.49589    0.08607   5.761 4.87e-08 ***
Petal.Length  0.82924    0.06853  12.101  < 2e-16 ***
Petal.Width  -0.31516    0.15120  -2.084  0.03889 *
setosa        2.17127    0.27979   7.760 1.43e-12 ***
versicolor    1.44770    0.28149   5.143 8.68e-07 ***
virginica     1.14777    0.35356   3.246  0.00145 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3068 on 144 degrees of freedom
Multiple R-Squared: 0.9974,     Adjusted R-squared: 0.9973
F-statistic:  9224 on 6 and 144 DF,  p-value: < 2.2e-16



# The same fit, this time letting lm() build the dummies from the factor Species
summary(lm(Sepal.Length ~ . - 1, data = iris))

Call:
lm(formula = Sepal.Length ~ . - 1, data = iris)

Residuals:
      Min        1Q    Median        3Q       Max
-0.794236 -0.218743  0.008987  0.202546  0.731034

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
Sepal.Width        0.49589    0.08607   5.761 4.87e-08 ***
Petal.Length       0.82924    0.06853  12.101  < 2e-16 ***
Petal.Width       -0.31516    0.15120  -2.084  0.03889 *
Speciessetosa      2.17127    0.27979   7.760 1.43e-12 ***
Speciesversicolor  1.44770    0.28149   5.143 8.68e-07 ***
Speciesvirginica   1.14777    0.35356   3.246  0.00145 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3068 on 144 degrees of freedom
Multiple R-Squared: 0.9974,     Adjusted R-squared: 0.9973
F-statistic:  9224 on 6 and 144 DF,  p-value: < 2.2e-16
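
The two summaries agree because the factor is expanded into the same indicator columns that were built by hand. A minimal sketch of that check, using model.matrix() on the Species factor alone (the object names auto, manual and dummies_match are only illustrative):

```r
data(iris)

# Dummy columns R derives from the factor; "- 1" keeps all three levels
auto <- model.matrix(~ Species - 1, data = iris)

# The same indicators built by hand, one column per level
manual <- sapply(levels(iris$Species),
                 function(lv) as.numeric(iris$Species == lv))

print(colnames(auto))
dummies_match <- all(auto == manual)
print(dummies_match)
```

Only the column names differ: R prefixes each level with the name of the factor, which is why the second summary shows Speciessetosa where the first shows setosa.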
#
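You will need to ensure that firm is a factor and not numeric (i.e.
continuous). Here is an example:


 # Three firms, 20 observations; no seed is set, so the numbers vary per run
 firm <- factor( sample(1:3, 20, replace=TRUE) )
 x1   <- runif(20)
 y    <- rnorm(20)

 # "-1" drops the overall intercept, giving one coefficient per firm
 summary( fit <- lm( y ~ -1 + x1 + firm ) )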
  ...
  Coefficients:
        Estimate Std. Error t value Pr(>|t|)
  x1    -0.04964    0.74861  -0.066    0.948
  firm1  0.10732    0.48269   0.222    0.827
  firm2  0.27548    0.48781   0.565    0.580
  firm3 -0.07651    0.53384  -0.143    0.888

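NB: The "-1" in the formula removes the overall intercept, so every firm gets
its own dummy coefficient (its own intercept). Without it, the first firm is
absorbed into the intercept and the other firm coefficients are differences
from that baseline.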
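
To see what the "-1" changes, the model can be fitted both ways and compared. This is a sketch on freshly simulated data with an arbitrary seed, so the numbers will not match the output above; fit_no_int, fit_int and the other new names are only illustrative.

```r
set.seed(42)  # arbitrary seed, not the one behind the output above
# Shuffle a balanced vector so that all three firms are guaranteed to appear
firm <- factor(sample(rep(1:3, length.out = 20)))
x1   <- runif(20)
y    <- rnorm(20)

fit_no_int <- lm(y ~ -1 + x1 + firm)  # one coefficient per firm
fit_int    <- lm(y ~ x1 + firm)       # intercept = firm 1, others are differences

# Both parameterisations give the same fitted values
same_fit <- all.equal(fitted(fit_no_int), fitted(fit_int))
print(same_fit)

# Firm 2's own intercept, recovered from the default parameterisation
firm2_from_int <- unname(coef(fit_int)["(Intercept)"] + coef(fit_int)["firm2"])
firm2_direct   <- unname(coef(fit_no_int)["firm2"])
print(c(from_intercept_fit = firm2_from_int, direct = firm2_direct))
```

The slope on x1 and the residuals are the same in both fits. The R-squared reported by summary() is computed differently once the intercept is removed, so that figure should not be compared across the two.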


Use model.matrix() to see the dummy variables that lm() creates.

 model.matrix( fit )
           x1 firm1 firm2 firm3
 1  0.6641647     0     1     0
 2  0.5142712     1     0     0
 3  0.2197956     1     0     0
 4  0.3211675     0     1     0
 5  0.1892449     1     0     0
 6  0.7740754     0     0     1
 7  0.3486932     0     1     0
 8  0.2116816     0     0     1
 9  0.2426825     0     1     0
 10 0.2219768     1     0     0
 11 0.9328514     1     0     0
 12 0.7880405     0     0     1
 13 0.8673492     0     1     0
 14 0.1777998     0     1     0
 15 0.3178498     1     0     0
 16 0.3379726     0     0     1
 17 0.9193359     1     0     0
 18 0.6998152     0     1     0
 19 0.2825702     0     0     1
 20 0.6139586     1     0     0
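
The same tool shows why firm has to be a factor. In the sketch below, firm_id is a hypothetical numeric identifier; left as a number it enters the design matrix as a single continuous column, and only after factor() does it expand into one dummy per firm.

```r
firm_id <- rep(1:3, times = 4)   # hypothetical numeric firm codes
x1      <- (1:12) / 10           # made-up regressor

# Numeric identifier: treated as one continuous regressor (a single slope)
mm_numeric <- model.matrix(~ -1 + x1 + firm_id)

# Factor identifier: expanded into one 0/1 column per firm
mm_factor <- model.matrix(~ -1 + x1 + factor(firm_id))

print(colnames(mm_numeric))
print(colnames(mm_factor))
print(is.factor(firm_id))        # quick check before fitting
```

With a numeric identifier lm() would still run without complaint, so checking is.factor() or the column names of the model matrix is a cheap safeguard.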

Regards, Adai
On Mon, 2005-09-05 at 15:53 +0100, Tobias Muhlhofer wrote:
#
Dang! That's awesome!!!!!

Being at the end of an empirical PhD in which all the econometrics was 
done in R, I was already a longtime R enthusiast, but you never stop 
learning more neat features!!!

YAY to everyone involved in R's development!!!!

Toby
Adaikalavan Ramasamy wrote: