Skip to content

Naming a random effect in lmer

9 messages · Leigh Ann Starcevich, Spencer Graves, William Dunlap +1 more

#
The first exaample on the "lmer" help page uses a formula 
"Reaction ~ Days + (Days|Subject)".  Here, "Subject" is the name of a 
column in the data.frame "sleepstudy", with levels "308", "309", ... . 


      Does this answer your question?  If no, please provide commented, 
minimal, self-contained, reproducible code, as requested in the posting 
guide "http://www.R-project.org/posting-guide.html".  Your example is 
not "self-contained, reproducible". 


      Hope this helps. 
      Spencer
Leigh Ann Starcevich wrote:
#
I miscommunicated: In every application I've seen with large numbers 
of parameters to estimate, most of those parameters are specific 
instances of different levels of a random effect. For example, a 
colleague recently did a fixed effects analysis of a longitudinal 
abundance survey of a large number of species of wildlife. This 
colleague claimed that treating species as a random effect was 
inappropriate, because the species were not selected by use of random 
numbers. With only two or three species, if the scientists were only 
concerned with those two or three species, this may be reasonable.


However, most scientists -- and people who study their work -- will want 
to generalize beyond only the species in the study. Even if scientists 
were only interested in two or three species for which they had data, 
Stein's phenomenon suggests that even a rabid Bayesophobe would likely 
get lower mean square error at the expense of some bias from pretending 
the species were randomly selected from some larger population 
(http://en.wikipedia.org/wiki/James-Stein_estimator).


If I were asked to referee a paper or serve on a thesis committee for a 
study estimating 30 random effects, I would not be happy unless there 
were a convincing discussion of why it was appropriate to estimate all 
those random effects separately; right now, I can not imagine an 
application where that would be appropriate.


If you and your advisor still feel that what you are doing makes sense, 
I suggest you first get the source code via "svn checkout 
svn://svn.r-forge.r-project.org/svnroot/lme4" (or by downloading 
"lme4_0.999375-30.tar.gz" from 
"http://cran.fhcrc.org/web/packages/lme4/index.html"), then walk through 
the code line by line using the "debug" function (or "browser" or the 
"debug" package). From this, you will likely see either (a) how you can 
do what you want differently to achieve the same result or (b) how to 
modify the code so it does what you want.


Hope this helps.
Spencer
Leigh Ann Starcevich wrote:
#
The coding error is right in the error message:
  Error in names(bars) <- unlist(lapply(bars, function(x)
deparse(x[[3]])))
and I suspect that traceback() would tell you that came from a call
to lmerFactorList.

That code implicitly assumes that deparse() will produce a scalar
character
vector, but it doesn't if the input expression is complicated enough.
Changing the
    deparse(x[[3]])
to
    deparse(x[[3]])[1]
or
    paste(collapse=" ", deparse(x[[3]])[1])
would fix it.  The first truncates the name and the second my make a
very
long name.

There is at least one other use of that idiom in the lme4 code and your
dataset and analysis may require that all of them be fixed.
#
Hi Bill,

I'm about to take a look at this.  If I understand the issue, very
long expressions for what I call the "grouping factor" of a random
effects term (the expressions on the right hand side of the vertical
bar) are encountering problems with deparse.  I should have realized
that, any time one uses deparse, disaster looms.

I can tell you the reason that the collection of random-effects terms
is being named is partly for the printed form and partly so that terms
with the same grouping factor can be associated.

I guess my simplistic solution to the problem would be to precompute
these sums and give them names, if it is the sum like

Z2 + Z3 + Z4 + Z5 + Z6 + Z7 + Z8 + Z9

that is important, why not evaluate the sum

testGroupSamp <- within(testGroupSamp, Z29 <- Z2 + Z3 + Z4 + Z5 + Z6 +
Z7 + Z8 + Z9)

and use Z29 as the grouping factor.

Even the use of variables with names like Z1, Z2, ... and the use of
expressions like paste("Z", 2:9, sep = "") is not idiomatic R/S code.
It's an SPSS/SASism.  (You know I never realized before how close the
word "SASism", meaning a construction that is natural in SAS, is to
"Sadism".)  Why not create a matrix Z and evaluate these sums as
matrix/vector products?


Zs2<- paste("Z",2:9,sep="")
On Fri, May 22, 2009 at 5:30 PM, William Dunlap <wdunlap at tibco.com> wrote:
#
On Sun, May 24, 2009 at 10:41 PM, Leigh Ann Starcevich <lah at peak.org> wrote:
But what you tried is not what I wrote.   The paste function creates a
character string - that's all.  You could create a numeric variable or
a factor by parsing and evaluating the resulting character string but
that is rarely a good way of doing things.  See

library(fortunes)
fortune("rethink")

What I wrote was to evaluate the expression

Z29 <- Z2 + Z3 + Z4 + Z5 + Z6 + Z7 + Z9 + Z9

within the data frame and that does work.  See the attached.

I still don't really understand the approach.  It is going about
things in an awkward way and I suspect it is the result of someone
with experience in SAS or SPSS trying to emulate their approach in R.
It is (or was, the last time I used either of those systems) common to
name variables like Z2, Z3, Z4, ... so you can access groups of
variables in those languages.  In R that is unnecessary because data
can be packaged into arbitrary, self-describing structures, including
matrices.  If the natural thing to do with the variables Z2, Z3, ...,
Z13 is to accumulate them in sums then it would be much simpler if you
create a matrix from them and use that.
-------------- next part --------------

R version 2.9.0 (2009-04-17)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
Loading required package: Matrix
Loading required package: lattice

Attaching package: 'Matrix'


	The following object(s) are masked from package:stats :

	 xtabs 


	The following object(s) are masked from package:base :

	 rcond
R version 2.9.0 (2009-04-17) 
i486-pc-linux-gnu 

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lme4_0.999375-31   Matrix_0.999375-26 lattice_0.17-25   

loaded via a namespace (and not attached):
[1] grid_2.9.0
'data.frame':	105 obs. of  20 variables:
 $ Year    : num  2008 2008 2008 2008 2008 ...
 $ WYear   : num  0 0 0 0 0 0 0 1 1 1 ...
 $ Site    : num  4 18 26 40 67 75 94 4 18 26 ...
 $ LogY    : num  0.849 0.809 0.734 1.467 0.716 ...
 $ Y       : num  2.34 2.25 2.08 4.34 2.05 ...
 $ Z2      : num  0.472 0.472 0.472 0.472 0.472 ...
 $ Z3      : num  -0.456 -0.456 -0.456 -0.456 -0.456 ...
 $ Z4      : num  0.394 0.394 0.394 0.394 0.394 ...
 $ Z5      : num  -0.308 -0.308 -0.308 -0.308 -0.308 ...
 $ Z6      : num  0.219 0.219 0.219 0.219 0.219 ...
 $ Z7      : num  -0.142 -0.142 -0.142 -0.142 -0.142 ...
 $ Z8      : num  0.0833 0.0833 0.0833 0.0833 0.0833 ...
 $ Z9      : num  -0.044 -0.044 -0.044 -0.044 -0.044 ...
 $ Z10     : num  0.0207 0.0207 0.0207 0.0207 0.0207 ...
 $ Z11     : num  -0.0085 -0.0085 -0.0085 -0.0085 -0.0085 ...
 $ Z12     : num  0.00295 0.00295 0.00295 0.00295 0.00295 ...
 $ Z13     : num  -0.00082 -0.00082 -0.00082 -0.00082 -0.00082 ...
 $ Z14     : num  0.000158 0.000158 0.000158 0.000158 0.000158 ...
 $ WYearCen: num  -7 -7 -7 -7 -7 -7 -7 -6 -6 -6 ...
 $ Z29     : num  0.218 0.218 0.218 0.218 0.218 ...
Z29
  -1.63314896653237  -0.592612960445885  -0.394922672042026  -0.233838363771464 
                  7                   7                   7                   7 
 -0.214767257996307 -0.0925109179084174 -0.0376120692964267  0.0181504978689960 
                  7                   7                   7                   7 
 0.0208109028608321   0.129704969588931   0.197693713890356    0.21834978504526 
                  7                   7                   7                   7 
  0.227443614453709   0.269153880224658    2.11810584406015 
                  7                   7                   7
Linear mixed model fit by REML 
Formula: LogY ~ WYear + (1 + WYear | Site) + (1 | Z29) 
   Data: testsamp 
 Subset: WYear <= 10 
    AIC    BIC logLik deviance REMLdev
 -147.5 -131.1  80.77   -167.7  -161.5
Random effects:
 Groups   Name        Variance  Std.Dev. Corr  
 Z29      (Intercept) 0.0095237 0.097589       
 Site     (Intercept) 0.0765445 0.276667       
          WYear       0.0262909 0.162145 0.046 
 Residual             0.0011282 0.033589       
Number of obs: 77, groups: Z29, 11; Site, 7

Fixed effects:
              Estimate Std. Error t value
(Intercept)  0.9857992  0.1183912   8.327
WYear       -0.0002676  0.0619989  -0.004

Correlation of Fixed Effects:
      (Intr)
WYear -0.020
user  system elapsed 
 26.245   0.188  26.512