Dear John,
Interesting. There must be a bottleneck somewhere, which possibly went
unnoticed because econometricians seldom use so many data points. In
fact, 'plm' wasn't designed to handle "only" 700 MB of data at a time;
but we're happy to investigate in this direction too. E.g., I was aware
of some efficiency problems with effect = "twoways", but I understand
that you are using effect = "individual"? Which brings me to the main
point.
I understand that enclosing the data for a reproducible report, as
requested by the posting guide, is awkward for such a big dataset. Yet
it would be of great help if you at least produced:
- an output of your procedure, in order to see what goes wrong and where
- the output of traceback() called immediately after you got the error
(idem)
and possibly gave lm() a try on the very same formula and data, perhaps
wrapped in a system.time( ... ) call.
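For instance, a minimal sketch of that benchmark; the data frame and
variable names here are stand-ins, to be replaced with your real formula
and data:

```r
## Stand-in data and formula: substitute your own before comparing with plm().
dat <- data.frame(y = rnorm(1000), x1 = rnorm(1000), x2 = rnorm(1000))
st <- system.time(fit <- lm(y ~ x1 + x2, data = dat))
print(st)   # user / system / elapsed times for the lm() fit
## if the call errors out instead, run traceback() immediately afterwards
```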
Otherwise, the information you provide is too scant to make even an
educated guess. For example, it isn't clear whether the problem lies in
plm() or in vcovHC.plm() etc.
As far as "simple demeaning" is concerned, you might try the following
code, which really does only that. Be aware that **standard errors are
biased**: this is not meant to be a proper function, just a
computational test for your data and a quick demonstration of demeaning.
plm() is far more structured, for a number of reasons. Please run it
inside system.time() again.
######### test function for within model, BIASED SEs !! #############
##
## ## example:
## data(Produc, package = "plm")
## mod <- FEmod(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
##              index = Produc$state, data = Produc)
## summary(mod)
## ## compare with:
## library(plm)
## example(plm)

demean <- function(x, index, lambda = 1, na.rm = FALSE) {
  ## subtract lambda times the group mean of x within each level of index
  gm <- tapply(x, index, mean, na.rm = na.rm)
  as.vector(x - lambda * gm[unclass(as.factor(index))])
}

FEmod <- function(formula, index, data) {
  ## fit a model without intercept in any case
  formula <- as.formula(paste(deparse(formula(formula)), "-1", sep = ""))
  X <- model.matrix(formula, data = data)
  y <- model.response(model.frame(formula, data = data))
  ## reduce the index accordingly (rows may be dropped due to NAs)
  names(index) <- row.names(data)
  ind <- index[which(names(index) %in% row.names(X))]
  ## within transformation
  MX <- matrix(NA, ncol = ncol(X), nrow = nrow(X))
  for (i in 1:ncol(X)) {
    MX[, i] <- demean(X[, i], index = ind, lambda = 1)
  }
  My <- demean(y, index = ind, lambda = 1)
  ## estimate the within model
  lm(My ~ MX - 1)
}
####### end test function ########
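As a sanity check of the demeaning idea, on a small synthetic panel the
within (demeaned) slope coincides with the one from the dummy-variable
(LSDV) regression; all names and data below are made up for the test:

```r
## Synthetic unbalanced-free toy panel: 5 individuals, 20 obs each.
set.seed(1)
id <- rep(1:5, each = 20)
x  <- rnorm(100) + id               # regressor correlated with the individual
y  <- 2 * x + id + rnorm(100)       # true slope 2, individual fixed effects
My <- y - ave(y, id)                # demean y within individual
Mx <- x - ave(x, id)                # demean x within individual
within_fit <- lm(My ~ Mx - 1)       # within estimator, no intercept
lsdv_fit   <- lm(y ~ x + factor(id) - 1)   # equivalent dummy regression
all.equal(unname(coef(within_fit)), unname(coef(lsdv_fit)["x"]))
```

The equality of the two slope estimates is the Frisch-Waugh-Lovell
result that makes the within transformation attractive when dummies are
too many to fit in memory.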
Best,
Giovanni
########### original message #############
------------------------------
Message: 28
Date: Tue, 07 Feb 2012 15:35:07 +0100
From: cariboupad at gmx.fr
To: r-help at r-project.org
Subject: [R] fixed effects with clustered standard errors
Message-ID: <20120207143507.142490 at gmx.com>
Content-Type: text/plain; charset="utf-8"
Dear R-helpers,
I have a very simple question and I really hope that someone could help
me.
I would like to estimate a simple fixed-effect regression model with
standard errors clustered by individual.
For those using Stata, the counterpart would be xtreg with the "fe"
option, or areg with the "absorb" option; in both cases the clustering
is achieved with "vce(cluster id)".
My question is: how could I do that with R? An important point is that
I have too many individuals, therefore I cannot include dummies and
should use the "usual" demeaning procedure.
I tried the plm package with the "within" option, but R quickly
tells me that the memory limits are reached (I have over 10 GB of RAM!)
while the dataset is only 700 MB (about 50,000 individuals, highly
unbalanced).
I don't understand... plm does indeed demean the data, so the
computation should be fast and light enough...?!
Are there any other solutions?
Many thanks in advance ! ;)
John
############ end original message ############
Giovanni Millo, PhD
Research Dept.,
Assicurazioni Generali SpA
Via Machiavelli 4,
34132 Trieste (Italy)
tel. +39 040 671184
fax +39 040 671160
2 days later
Dear Giovanni,
I re-ran the procedure and here is the output (originally in French):
Error: cannot allocate vector of size 39.2 Mb
In addition: Warning messages:
1: In structure(list(message = as.character(message), call = call), :
  Reached total allocation of 12279Mb: see help(memory.size)
(the same warning is repeated three more times)
traceback()
No traceback available
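If traceback() keeps coming back empty, one base-R workaround is to
register an error handler before re-running the failing call, so the
call stack is printed at the moment the error occurs:

```r
## print the call stack automatically whenever an error occurs
options(error = function() traceback(2))
## ... re-run the failing plm() call here ...
## reset to the default behaviour afterwards with:
## options(error = NULL)
```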
I had a similar problem with another statistical package, but in a
different context: when I tried to fit a fixed-effect logit model.
The software did not converge because, as you rightly guessed at the
beginning of this thread, the number of points for some individuals is
too high.
Might this be the source of the error here too?
Recall that the median number of points in my database is quite low,
at 10, but I have individuals with more than 2,000, 5,000 or even
50,000 points!
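Cluster sizes that skewed are easy to summarise; a sketch with a toy
index variable (substitute your own individual id):

```r
## Toy unbalanced panel: 4 individuals with very different sizes.
id    <- rep(c("a", "b", "c", "d"), times = c(3, 10, 10, 77))
sizes <- table(id)                  # observations per individual
print(quantile(as.numeric(sizes))) # min / quartiles / max cluster size
print(max(sizes))                  # the largest individual
```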
What do you think ?
Many thanks
Best,
On 9 February 2012 10:31, John L <cariboupad at gmx.fr> wrote:
Dear Giovanni,
Many thanks for your interesting suggestions. Your guess is indeed
right: I only use the 'within' fixed-effects specification.
I will soon send to this list all the additional information you
requested in order to understand what might cause this problem, but as
a first guess I would say that the inefficiency is (probably?) due to
individuals with too many data points: the median number of points is
10, but I have some individuals with more than 1,000, 5,000 or even
80,000 points overall! So my dataset is probably too strange, as you
suggested, compared to the "standard" panel dataset in the social
sciences...
To be continued... ;-)
Many thanks again
Best,
On 8 February 2012 18:55, Millo Giovanni wrote:
[quoted message elided; see Giovanni's message in full above]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.