Iasonas Lamprianou sent the enclosed message to the R-SIG-Mixed-Models mailing list. It was deferred because the enclosed data in the form of a zip file containing a .xls spreadsheet exceeded the limit on size of messages. I have made the data available as a .rda file (which is less than 1/10 the size of the .xls file and about 1/3 the size of the .zip file containing the .xls file) as http://www.stat.wisc.edu/~bates/sdata.rda I enclose a transcript of a model fit to these data. I will respond to this posting with some comments. That way the conversation becomes properly threaded. ---------- Forwarded message ---------- From: Iasonas Lamprianou <lamprianou at yahoo.com> To: r-sig-mixed-models at r-project.org Date: Tue, 24 Jul 2007 06:38:28 -0700 (PDT) Subject: partly crossed design Hi all, I have a dataset (attached) which shows the score of pupils on different subjects. For example, pupil 13 may have been tested on subjects 1 (Greek), subject 4 (History) and 5 (Latin) whereas pupil 124 may have been tested on subjects 1 (Greek), 38 (Physics) and 19 (Chemistry). All pupils took Greek (subject with code 1). I attach the file with some data. Below, I show some of the R commands that I used. My intention is to estimate which of the subjects was more difficlut, and to estimate the % of variance because of pupils and because of subjects. Please give me some help, also send me any commands you may want to suggest and some explanations. I realized that SPSS 15 has included a new mixed models component, is it much better than R? library(RODBC) channel <- odbcConnectExcel("C:/JASON/PROJECTS/vathmoi2007/alldata.xls") channel <- odbcConnectExcel("C:\smalldata.xls") sqlTables(channel) sqlTables(channel, errors = FALSE, as.is = TRUE) Dataset = sqlQuery(channel, paste("select * from [data$]")) Dataset$mathima <- factor(Dataset$mathima) Dataset$app_aa <- factor(Dataset$app_aa) note: app_aa is the code of the pupil this is the model for the analysis I used, but how do I show that the pupils take more than one test? mod2 <- lmer(arxiki ~ mathima + (1|app_aa), Dataset) how about mod3 <- lmer(arxiki ~ 1 + (1|mathima) + (1|mathima:app_aa), Dataset) Thanks Jason Dr. Iasonas Lamprianou Department of Education The University of Manchester Oxford Road, Manchester M13 9PL, UK Tel. 0044 161 275 3485 iasonas.lamprianou at manchester.ac.uk ----------
partly crossed design
2 messages · Douglas Bates
---------- Forwarded message ---------- From: Iasonas Lamprianou <lamprianou at yahoo.com> To: r-sig-mixed-models at r-project.org Date: Tue, 24 Jul 2007 06:38:28 -0700 (PDT) Subject: partly crossed design
Hi all,
I have a dataset (attached) which shows the score of pupils on different subjects. For example, pupil 13 may have been tested on subjects 1 (Greek), subject 4 (History) and 5 (Latin) whereas pupil 124 may have been tested on subjects 1 (Greek), 38 (Physics) and 19 (Chemistry). All pupils took Greek (subject with code 1).
Apparently not. There are 9409 student id's and only 9403 scores for Greek. It would help if you could give us the complete set of codes so the labels could be changed to Greek, History, Latin, etc. instead of 9 arbitrary numbers.
I attach the file with some data. Below, I show some of the R commands that I used.
My intention is to estimate which of the subjects was more difficult, and to estimate the % of variance because of pupils and because of subjects.
The coefficients in the model that I fit to these data are typical scores on the subjects, adjusted for student abilities, as measured by the random effects.
Please give me some help, also send me any commands you may want to suggest and some explanations. I realized that SPSS 15 has included a new mixed models component, is it much better than R?
I'll let others handle that question. :-) With this message I enclose another transcript that contains a model fit with random effects for subject and for student. This would be another way of assessing difficulty of subject adjusted for student. In this case there are partially crossed random effects. (BTW, I would be interested in the results from SPSS v15 on such a model and in how long it takes to fit such a model. I should say that I am using the development version of the lme4 package in fitting these models. If I were to use the released version of the package I would replace lmer by lmer2.) ...
this is the model for the analysis I used, but how do I show that the pupils take more than one test?
I'm not sure what you mean by that. You could tabulate the table of the number of observations per student as shown in this transcript or you can show the results from a selection of students as was done here.
mod2 <- lmer(arxiki ~ mathima + (1|app_aa), Dataset)
how about mod3 <- lmer(arxiki ~ 1 + (1|mathima) + (1|mathima:app_aa), Dataset)
That won't work because the number of levels of the mathima:app_aa interaction is the same as the number of observations so the per-observation noise term is completely confounded with the random effect you are trying to specify. -------------- next part -------------- A non-text attachment was scrubbed... Name: sdata.pdf Type: application/pdf Size: 25938 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20070724/c727dac5/attachment.pdf> -------------- next part -------------- R version 2.6.0 Under development (unstable) (2007-07-24 r42297) Copyright (C) 2007 The R Foundation for Statistical Computing ISBN 3-900051-07-0 R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.
library(lme4)
Loading required package: Matrix Loading required package: lattice
load(url("http://www.stat.wisc.edu/~bates/sdata.rda"))
pdf(file = "sdata.pdf")
str(sdata)
'data.frame': 25120 obs. of 3 variables: $ app_aa : Factor w/ 9409 levels "1","10","1,000",..: 1 1093 2178 4362 5448 6536 7628 8721 2 109 ... $ mathima: Factor w/ 9 levels "1","19","21",..: 1 1 1 1 1 1 1 1 1 1 ... $ score : int 107 43 47 26 37 112 51 72 88 83 ... Warning message: closing unused connection 3 (gzcon(http://www.stat.wisc.edu/~bates/sdata.rda))
xtabs(~ mathima, sdata)
mathima 1 19 21 37 38 4 43 5 6 9403 448 1659 3204 1834 1736 5019 177 1640
with(sdata, table(table(app_aa)))
1 2 3 4 5 1104 2860 3542 1845 58
set.seed(123454321) subs <- subset(sdata, app_aa %in% sample(levels(app_aa), 20)) subs[order(subs$app_aa),]
app_aa mathima score 1151 1,173 1 14 1200 1,223 1 84 13616 1,223 21 10 15454 1,223 37 41 18486 1,223 38 23 1998 2,036 1 110 9766 2,036 4 74 21207 2,036 43 130 2082 2,124 1 77 11683 2,124 6 129 21250 2,124 43 130 2526 2,580 1 69 15912 2,580 37 24 2929 2,990 1 97 21706 2,990 43 128 3202 3,267 1 82 21844 3,267 43 110 3296 3,363 1 44 21894 3,363 43 87 3657 3,731 1 94 16297 3,731 37 104 18985 3,731 38 104 3764 3,841 1 8 22138 3,841 43 21 3845 3,923 1 75 16373 3,923 37 106 19028 3,923 38 108 5982 6,108 1 5 7061 7,209 1 71 12567 7,209 6 97 23881 7,209 43 23 8182 8,346 1 141 10906 8,346 4 147 24477 8,346 43 191 8320 8,486 1 12 8401 8,567 1 35 24591 8,567 43 31 850 867 1 19 20580 867 43 40 8790 8,965 1 146 11024 8,965 4 170 24794 8,965 43 197 9086 9,261 1 98 18153 9,261 37 77 20042 9,261 38 106 9107 9,282 1 43 11084 9,282 4 27 12914 9,282 6 124 24966 9,282 43 14
densityplot(~ score, sdata, plot.points = FALSE) densityplot(~ score|mathima, sdata, plot.points = FALSE) system.time(m1 <- lmer(score ~ 0 + mathima + (1|app_aa),
+ sdata, control = list(msV = 1))) 0: 258960.61: 0.999416 1: 258081.58: 1.99942 2: 257669.40: 1.73209 3: 257601.36: 1.48965 4: 257583.86: 1.57219 5: 257583.55: 1.56285 6: 257583.54: 1.56229 7: 257583.54: 1.56230 user system elapsed 2.220 0.036 2.255
system.time(m1 <- lmer(score ~ 0 + mathima + (1|app_aa), sdata))
user system elapsed 0.904 0.036 0.942
m1
Linear mixed-effects model fit by REML
Formula: score ~ 0 + mathima + (1 | app_aa)
Data: sdata
AIC BIC logLik MLdeviance REMLdeviance
257604 257685 -128792 257597 257584
Random effects:
Groups Name Variance Std.Dev.
app_aa 1947.05 44.125
Residual 797.84 28.246
Number of obs: 25120, groups: app_aa, 9409
Fixed effects:
Estimate Std. Error t value
mathima1 81.0323 0.5402 150.01
mathima19 75.0738 1.5907 47.20
mathima21 86.2260 0.9119 94.55
mathima37 86.0341 0.7428 115.82
mathima38 91.0371 0.8998 101.18
mathima4 80.6845 0.9060 89.05
mathima43 109.5310 0.6398 171.21
mathima5 118.4503 2.4699 47.96
mathima6 126.3008 0.9205 137.20
Correlation of Fixed Effects:
mathm1 mthm19 mthm21 mthm37 mthm38 mathm4 mthm43 mathm5
mathima19 0.241
mathima21 0.420 0.206
mathima37 0.516 0.243 0.341
mathima38 0.426 0.198 0.291 0.455
mathima4 0.423 0.113 0.230 0.242 0.194
mathima43 0.599 0.167 0.351 0.338 0.276 0.423
mathima5 0.155 0.040 0.078 0.085 0.069 0.165 0.157
mathima6 0.416 0.118 0.229 0.274 0.206 0.308 0.391 0.094
system.time(m2 <- lmer(score ~ 1 + (1|mathima) + (1|app_aa),
+ sdata, control = list(msV = 1))) 0: 260067.68: 0.0309098 0.999416 1: 258490.09: 1.02452 1.11227 2: 257665.35: 1.02320 1.61227 3: 257658.21: 1.02195 1.57729 4: 257657.49: 1.02005 1.56186 5: 257657.48: 1.01875 1.56236 6: 257657.32: 0.996976 1.56620 7: 257657.03: 0.948383 1.57035 8: 257656.33: 0.801040 1.57692 9: 257655.81: 0.645518 1.57769 10: 257655.25: 0.660122 1.57025 11: 257655.04: 0.636126 1.56172 12: 257655.04: 0.654799 1.56199 13: 257655.03: 0.649502 1.56216 14: 257655.03: 0.649233 1.56214 user system elapsed 1.261 0.008 1.266
system.time(m2 <- lmer(score ~ 1 + (1|mathima) + (1|app_aa), sdata))
user system elapsed 1.236 0.020 1.258
m2
Linear mixed-effects model fit by REML
Formula: score ~ 1 + (1 | mathima) + (1 | app_aa)
Data: sdata
AIC BIC logLik MLdeviance REMLdeviance
257661 257685 -128828 257661 257655
Random effects:
Groups Name Variance Std.Dev.
mathima 336.32 18.339
app_aa 1946.80 44.123
Residual 797.89 28.247
Number of obs: 25120, groups: mathima, 9; app_aa, 9409
Fixed effects:
Estimate Std. Error t value
(Intercept) 94.902 6.141 15.45
ranef(m2)$mathima
(Intercept) 1 -13.865747 19 -19.678775 21 -8.649562 37 -8.845464 38 -3.841410 4 -14.212544 43 14.613201 5 23.142992 6 31.337310
proc.time()
user system elapsed 19.637 0.320 19.987