Trends for many units
On Fri, 5 Jan 2001 s-luppescu at uchicago.edu wrote:
On 05-Jan-2001 Prof Brian D Ripley wrote:
On Fri, 5 Jan 2001 s-luppescu at uchicago.edu wrote:
I have data on every grade in all elementary schools in Chicago over 5 years. I would like to estimate a trend over time for each grade in each school. There are 17,600 data all together (about 460 schools, nearly 8 grades each, over 5 years). Is there a not-so-hard way to do this in R (I was thinking of using rlm)?
And the statistical model is? 5 years is a short series, and I would have thought a multilevel model was appropriate (and in R that means using lme). I'l leave it to someone who understands the terms (grades are a response in my terminology) to suggest a model.
Yes, there may be some ambiguity in the terminology. ``Grade'' refers to the year in school (as in, ``first grade'', ``second grade'', etc.). Here is a small portion of the data set: Unit Year Grade Pct.Excl 2010 1996 1 0.0789 2010 1997 1 0.0000 2010 1998 1 0.1034 2010 1999 1 0.0286 2010 2000 1 0.0000 2010 1996 2 0.1471 2010 1997 2 0.1282 2010 1998 2 0.0250 2010 1999 2 0.0800 2010 2000 2 0.0588 2010 1996 3 0.0938 2010 1997 3 0.2188 2010 1998 3 0.2000 2010 1999 3 0.1020 2010 2000 3 0.1000 Unit is the school number. Basically, I want to do something like: rlm(Pct.Excl ~ Year) for each Unit-Grade combination.
rlm and friends assume independent errors, which looks dubious here.
I chose rlm because with the small number of data points (max of 5 per school-grade) a single outlier can have a very large influence. I don't know why errors shouldn't be independent here, but I'm willing to be convinced.
Wouldn't you expect two grades in one school to be more similar than two
grades in different schools? And would not slopes for different grades
in one school be more similar than across schools? Those translate into
dependence.
For an lm-type model you can circumvent this by treating all the
school-grade combinations as fixed effects. Thus
(r)lm(Pct.Excl ~ Unit*Grade*Year)
(and I would centre Year on 1998) fits 3520 lines with a common assumed
error variance. That's a lot of parameters to fit in one go, and you will
probably find lmList in package nlme helpful. But my suggestion for a
model is
i Unit
j Grade
t Year
y_{ijt} = mu + beta_j + gamma * t + eta_i + zeta_{ij} + epsilon_{ijt}
eta, zeta, epsilon iid with common variances in each group.
that is fixed effects for Grade, random effects for Unit and Unit | Grade.
You may or may not need additional random effects
lambda_i * Year + kappa_{ij} * Year
As set up here, independence of all the rvs is plausible, but lme does not
require it. The predict.lme will give you BLUP lines for each Unit-Grade
combination, and they will not be the fitted values in the fixed-effects
model. Most social statisticians I know (and we have have some local
stars) think that the second is more valuable, and routinely use it.
Snijders, T.A.B. and Bosker, R.J. (1999) Multilevel Analysis. Sage.
have an example of IQ tests adminstered to students in classes in schools
done in exactly this way. (And that was the ref the experts recommended
for social applications.)
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UK Fax: +44 1865 272595 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._