Trends for many units - R-help

Stuart Luppescu

Fri, Jan 5, 2001 1:22 PM

I have data on every grade in all elementary schools in Chicago over 5 years. I
would like to estimate a trend over time for each grade in each school. There
are 17,600 data points altogether (about 460 schools, nearly 8 grades each,
over 5 years). Is there a not-so-hard way to do this in R (I was thinking of
using rlm)?
______________________________________________________________________
Stuart Luppescu         -=-=-  University of Chicago
http://www.consortium-chicago.org/people/sl.html
Finger sl70 at musuko.uchicago.edu forPGP Public Key
ICQ #21172047  AIM: psycho7070
You can't run away forever,
But there's nothing wrong with getting a good head start.
                -- Jim Steinman, "Rock and Roll Dreams Come Through"

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Brian Ripley

Fri, Jan 5, 2001 1:37 PM

On Fri, 5 Jan 2001 s-luppescu at uchicago.edu wrote:

And the statistical model is?  5 years is a short series, and I would have
thought a multilevel model was appropriate (and in R that means using lme).
I'll leave it to someone who understands the terms (grades are a response
in my terminology) to suggest a model.

rlm and friends assume independent errors, which looks dubious here.

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Stuart Luppescu

Fri, Jan 5, 2001 2:24 PM

On 05-Jan-2001 Prof Brian D Ripley wrote:

Yes, there may be some ambiguity in the terminology. ``Grade'' refers to the
year in school (as in, ``first grade'', ``second grade'', etc.). Here is a small
portion of the data set:

Unit     Year      Grade     Pct.Excl
2010      1996         1       0.0789
2010      1997         1       0.0000
2010      1998         1       0.1034
2010      1999         1       0.0286
2010      2000         1       0.0000
2010      1996         2       0.1471
2010      1997         2       0.1282
2010      1998         2       0.0250
2010      1999         2       0.0800
2010      2000         2       0.0588
2010      1996         3       0.0938
2010      1997         3       0.2188
2010      1998         3       0.2000
2010      1999         3       0.1020
2010      2000         3       0.1000

Unit is the school number. Basically, I want to do something like:
rlm(Pct.Excl ~ Year) for each Unit-Grade combination.

I chose rlm because with the small number of data points (max of 5 per
school-grade) a single outlier can have a very large influence. I don't know
why errors shouldn't be independent here, but I'm willing to be convinced.
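
A minimal sketch of the "rlm for each Unit-Grade combination" idea, using only the sample rows quoted above. The names dat, Yr, groups, fit_one and trends are made up for illustration, centring Year on 1998 is an assumption borrowed from later in the thread, and this is one possible way to loop over groups rather than a recommended analysis:

```r
library(MASS)  # provides rlm(); a recommended package shipped with R

# Sample rows quoted in the message above (school 2010, grades 1-3).
dat <- data.frame(
  Unit  = 2010,
  Year  = rep(1996:2000, times = 3),
  Grade = rep(1:3, each = 5),
  Pct.Excl = c(0.0789, 0.0000, 0.1034, 0.0286, 0.0000,
               0.1471, 0.1282, 0.0250, 0.0800, 0.0588,
               0.0938, 0.2188, 0.2000, 0.1020, 0.1000)
)

# Assumption: centre Year so the intercept is the fitted level in 1998.
dat$Yr <- dat$Year - 1998

# One data frame per Unit-Grade combination.
groups <- split(dat, list(dat$Unit, dat$Grade), drop = TRUE)

# Fit a robust line to one group and return its two coefficients.
fit_one <- function(d) coef(rlm(Pct.Excl ~ Yr, data = d, maxit = 100))

# One row per Unit-Grade combination: fitted level in 1998 and yearly slope.
trends <- t(sapply(groups, fit_one))
colnames(trends) <- c("level1998", "slope")
print(trends)
```

With the full data set the same split/sapply pattern would produce one row per school-grade combination, each fitted independently of the others.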

Brian Ripley

Fri, Jan 5, 2001 3:45 PM

On Fri, 5 Jan 2001 s-luppescu at uchicago.edu wrote:

Wouldn't you expect two grades in one school to be more similar than two
grades in different schools?  And would not slopes for different grades
in one school be more similar than across schools?  Those translate into
dependence.

For an lm-type model you can circumvent this by treating all the
school-grade combinations as fixed effects.  Thus

(r)lm(Pct.Excl ~ Unit*Grade*Year)

(and I would centre Year on 1998) fits 3520 lines with a common assumed
error variance. That's a lot of parameters to fit in one go, and you will
probably find lmList in package nlme helpful.  But my suggestion for a
model is

i Unit
j Grade
t Year

y_{ijt} = mu + beta_j + gamma * t + eta_i + zeta_{ij} + epsilon_{ijt}

eta, zeta, epsilon iid with common variances in each group.

That is, fixed effects for Grade, and random effects for Unit and for Grade
within Unit.  You may or may not need additional random effects

lambda_i * Year + kappa_{ij} * Year
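
A hedged sketch of how the model above might be written for lme, assuming the nested grouping Unit/Grade is the right translation of eta_i and zeta_{ij}. The data are simulated stand-ins (not the Chicago data), and sim, Yr, eta, zeta and cell are invented names:

```r
library(nlme)  # provides lme(); a recommended package shipped with R

# Simulated stand-in data: 30 schools x 4 grades x 5 years (an assumption).
set.seed(1)
n_unit <- 30
n_grade <- 4
sim <- expand.grid(Year = 1996:2000, Grade = 1:n_grade, Unit = 1:n_unit)
sim$Yr <- sim$Year - 1998                      # centre Year on 1998
eta  <- rnorm(n_unit, sd = 0.04)               # school effects, eta_i
zeta <- rnorm(n_unit * n_grade, sd = 0.03)     # grade-within-school effects, zeta_ij
cell <- (sim$Unit - 1) * n_grade + sim$Grade   # index of the (i, j) cell
sim$Pct.Excl <- 0.10 + 0.01 * sim$Grade - 0.005 * sim$Yr +
  eta[sim$Unit] + zeta[cell] + rnorm(nrow(sim), sd = 0.02)
sim$Grade <- factor(sim$Grade)
sim$Unit  <- factor(sim$Unit)

# Fixed effects: Grade (beta_j) and a common Year slope (gamma).
# Random intercepts: Unit (eta_i) and Grade within Unit (zeta_ij).
fit <- lme(Pct.Excl ~ Grade + Yr, random = ~ 1 | Unit/Grade, data = sim)

print(fixef(fit))    # intercept, Grade contrasts, common Year slope
print(VarCorr(fit))  # estimated variance components
```

The optional random slopes lambda_i and kappa_{ij} would presumably correspond to random = ~ Yr | Unit/Grade, though with only five years per cell that richer model may be hard to fit.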

As set up here, independence of all the random variables is plausible, but
lme does not require it.  predict.lme will give you BLUP lines for each
Unit-Grade combination, and they will not be the fitted values in the
fixed-effects model.  Most social statisticians I know (and we have some local
stars) think that the second, multilevel approach is more valuable, and
routinely use it.
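
To illustrate how the BLUP lines differ from separately fitted lines, here is a small sketch on simulated stand-in data (not the Chicago data). The names sim, Cell, sep_fits, sep_level and blup_level are invented for the example, and the comparison is restricted to the fitted level in 1998:

```r
library(nlme)  # provides lmList() and lme()

# Simulated stand-in data: 20 schools x 3 grades x 5 years (an assumption).
set.seed(2)
n_unit <- 20
n_grade <- 3
sim <- expand.grid(Year = 1996:2000, Grade = 1:n_grade, Unit = 1:n_unit)
sim$Yr <- sim$Year - 1998
cell_id <- (sim$Unit - 1) * n_grade + sim$Grade
sim$Pct.Excl <- 0.10 - 0.005 * sim$Yr +
  rnorm(n_unit, sd = 0.04)[sim$Unit] +
  rnorm(n_unit * n_grade, sd = 0.03)[cell_id] +
  rnorm(nrow(sim), sd = 0.03)
sim$Cell  <- factor(paste(sim$Unit, sim$Grade, sep = "/"))  # Unit-Grade label
sim$Grade <- factor(sim$Grade)
sim$Unit  <- factor(sim$Unit)

# Separate least-squares line for every Unit-Grade cell.
sep_fits <- lmList(Pct.Excl ~ Yr | Cell, data = sim)

# Multilevel model with random intercepts for Unit and Grade within Unit.
fit <- lme(Pct.Excl ~ Grade + Yr, random = ~ 1 | Unit/Grade, data = sim)

# Fitted level in 1998 for each cell under both approaches, in the same order.
in1998     <- sim$Yr == 0
sep_level  <- coef(sep_fits)[as.character(sim$Cell[in1998]), "(Intercept)"]
blup_level <- predict(fit, level = 2)[in1998]   # innermost level: Grade in Unit

# The BLUP levels are pulled towards the grade means, so they spread less.
print(c(separate = sd(sep_level), blup = sd(blup_level)))
print(summary(sep_level - blup_level))
```

The shrinkage shown here concerns only the intercepts; in this sketch every cell shares the common Year slope, whereas lmList estimates a slope per cell.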

Snijders, T.A.B. and Bosker, R.J. (1999) Multilevel Analysis. Sage.

have an example of IQ tests administered to students in classes in schools
done in exactly this way.  (And that was the ref the experts recommended
for social applications.)

Yves Gauvreau

Sat, Jan 6, 2001 4:18 AM

Hi,

As I read this thread, I cannot help thinking that we rarely see this kind
of discussion here, and even less in books. Most of the time, questions
like these just die out because they are a bit out of line, in a manner of
speaking. I wonder if it would be possible either to accept more questions
of this kind gracefully here or maybe (just maybe) to create a new list
dedicated to discussion of applied statistics using R, or something along
those lines.

Just a thought.

Yves Gauvreau
B.E.F.P. Universite du Quebec a Montreal
cyg at sympatico.ca

Stuart Luppescu

Mon, Jan 8, 2001 7:53 AM

On 05-Jan-2001 Prof Brian D Ripley wrote:

Yes, indeed, but I wasn't anticipating estimating all effects at the same time.
What I have done before with this type of problem in SAS is to estimate
parameters for one unit at a time, recalculating the weights and iterating
manually (in a macro loop), so between-unit dependence is not an issue.
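
The manual reweighting loop described above might look roughly like this in R for a single Unit-Grade combination. This is a sketch under assumptions: Huber weights with cutoff 1.345 and a MAD-type scale are choices made here (broadly the idea behind rlm's default M-estimation), not details taken from the SAS macro, and huber_w is a made-up helper name:

```r
# Sample rows for school 2010, grade 1, quoted earlier in the thread.
yr <- (1996:2000) - 1998                       # Year centred on 1998
y  <- c(0.0789, 0.0000, 0.1034, 0.0286, 0.0000)

# Huber weights: 1 for small scaled residuals, k / |r| beyond the cutoff k.
huber_w <- function(r, k = 1.345) pmin(1, k / abs(r))

w <- rep(1, length(y))                         # start from ordinary least squares
for (iter in 1:50) {
  fit   <- lm(y ~ yr, weights = w)             # weighted least-squares step
  res   <- residuals(fit)
  scale <- median(abs(res)) / 0.6745           # MAD-type scale estimate
  if (scale == 0) break                        # exact fit: nothing to reweight
  w_new <- huber_w(res / scale)
  if (max(abs(w_new - w)) < 1e-8) break        # weights have stabilised
  w <- w_new
}

print(coef(fit))     # robust intercept (level in 1998) and slope
print(round(w, 3))   # final weights; values below 1 mark downweighted years
```

Wrapping this loop in a function and applying it to each Unit-Grade subset would mimic the one-unit-at-a-time macro approach.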

Thank you very much for the very valuable help (as always).
