is multicollinearity of fixed effects resolved by random effects
It is easy to incorporate phylogenetic correlations among species
("taxonomic factors"), using the ape package and lme. This is far better
than combining species into families or genera, as taxonomic heirarchies
are subjective, artificial, and rarely represent the true phylogeny. I
strongly disagree that taxonomic factors necessarily function as fixed
effects. The phylogeny represents what we think we know about the
covariance among species in all traits, measured or unmeasured. I can't
see how unmeasured traits or poorly-defined "taxonomic factors" can
possibly be included as fixed effects. If particular measured traits are
thought to be important in determining the mean response, they should be
included as fixed effects, but phylogenies are used to model the
covariance, not the mean. Unusual species should stand out by
examination of the normalized residuals.
My approach would be to ditch species as a factor altogether, and
incorporate phylogenetic effects through the correlation argument in lme
(or gls if there are no other random effects). This assumes there is an
available phylogeny for the fungi, which may not be true.
There is a very large literature on incorporating phylogeny into
analyses (to which I am afraid I am a small contributor).
Simon.
On Mon, 2008-05-19 at 10:02 +1000, John Maindonald wrote:
I think grouping them into families is a very good idea. This makes use of prior insight, reduces the number of parameters to an extent that it becomes more reasonable to think about fixed effects, and you can look for individual species that stray from the path laid out for them by their families. For the fixed effects analyses that I suggested, you might do these by families. Conceptually, there are taxonomic factors that surely function, no doubt in interaction with location variables, as fixed effects. I consider that one ought to start by thinking of them as fixed effects, unless it can be demonstrated that data are indistinguishable from random variation. Maybe however those effects operate more as the level of genera or families than at the level of species. Responding to this point may be a useful aim for the study. John. John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. On 19 May 2008, at 4:04 AM, Jordan Mayor wrote:
Thank you JM, RK, and DB for your thoughtful responses to my questions. I am working on posting the file on the web and will respond as soon as it is publicly available - (I could email it direct to those concerned as well). Until then here is the output requested by DB and RK as well as more information regarding sites, species, and the data:
ds=d[d$TYPE=="s",] #separates into just sap fungi as used in the
example below
dm=d[d$TYPE=="m",] #separates into just myc fungi (treated as
distinct populations in my analyses)
table(table(ds$SPECIES))
0 1 2 3 5 6
393 161 45 7 1 1
# As seen all species are not present at all the sites - this is
partly why I thought to nest SPECIES within SITE. The sites range
widely from boreal tundra, to temperate California, to tropical
forests of Borneo and Guyana. Most fungi are not so cosmopolitan as
to even potentially be present everywhere - let alone be collected
during brief collecting forays. One site (pine forest in CA) didn't
even list fungal names. All fungi were listed as mycorrhizal (m),
saprotrophic (s), or of unknown ecological roles (unk). Most of the
genera (or families) are present at all sites but that biologically
coarseness concerns me. If, as pointed out by RK the ideal
situation would be every species present at every site then perhaps
creating a new column containing families or genera will move the
dataset toward that direction.
xtabs(~SITE,ds)
SITE
Aheden
Aheden 2
4 0
Ashiu (temp deciduous broadleaf) Betsele
21 0
Breuil, France Chiba
14 6
d
Flakaliden
23 0
Glacier Bay, Alaska Guyana
4
20
h heath
tundra, subarctic Sweden
38 4
Kagoshima
Kulbacksliden
1 0
Kyoto Lamar
Haines
26
13
Lambir (lowland tropical) Miyajima
14 1
Norikura Norrliden
3 0
Okinawa Ontake (subalpine
coniferous)
0 8
Oodai pine forests in
CA
2
25
Shirahama nowbowl
2
13
Spruce plantation Stadsskogen
17
13
Svartberget Tanigawa
0 6
tussock tundra near Toolik Lake, AK Vilan
5 0
Woods Creek
25 # the zeros
here are because some sites have no X13C data - only X15N
I did try running models with centered predictors, as suggested by
RK - see below. I found that the AIC scores were much larger
however when the predictors were centered using my method. I
centered by taking the mean of mycorrhizal and saprotrophic fungal
groups at each site - then I subtracted each (m) or (s) fungus from
those same group means within each site. Because the (m) and (s)
fungi have unique sources of carbon, and nitrogen and these are
reflected in their isotope values (X13C, X15N), I deemed this
centering level to be appropriatein order to preserve the magnitude
of difference between the groups.
mix.model1 # Raw isotope values used
Linear mixed model fit by maximum likelihood
Formula: X13C ~ MAT + MAP + LAT + (1 | SPECIES) + (1 | SITE)
Data: ds
AIC BIC logLik deviance REMLdev
1016 1041 -500.9 1002 1031
Random effects:
Groups Name Variance Std.Dev.
SPECIES (Intercept) 0.62149 0.78835
SITE (Intercept) 0.34885 0.59063
Residual 1.28911 1.13539
Number of obs: 283, groups: SPECIES, 215; SITE, 24
Fixed effects:
Estimate Std. Error t value
(Intercept) -2.207e+01 1.121e+00 -19.695
MAT -6.418e-02 3.833e-02 -1.675
MAP 3.137e-05 2.100e-04 0.149
LAT -8.690e-03 1.778e-02 -0.489
Correlation of Fixed Effects:
(Intr) MAT MAP
MAT -0.707
MAP -0.401 -0.247
LAT -0.959 0.692 0.272
mix.model1a # Group centered within each site
Linear mixed model fit by maximum likelihood
Formula: STND_13c ~ MAT + MAP + LAT + (1 | SPECIES) + (1 | SITE)
Data: ds
AIC BIC logLik deviance REMLdev
2623 2649 -1305 2609 2620
Random effects:
Groups Name Variance Std.Dev.
SPECIES (Intercept) 301.9720 17.3773
SITE (Intercept) 5.1294 2.2648
Residual 327.3857 18.0938
Number of obs: 283, groups: SPECIES, 215; SITE, 24
Fixed effects:
Estimate Std. Error t value
(Intercept) 3.463e+01 1.166e+01 2.9713
MAT 3.899e-01 4.330e-01 0.9006
MAP 2.543e-05 1.811e-03 0.0140
LAT 3.235e-01 1.867e-01 1.7324
Correlation of Fixed Effects:
(Intr) MAT MAP
MAT -0.779
MAP -0.230 -0.300
LAT -0.955 0.732 0.116
anova(mix.model1,mix.model1a)
Data: ds
Models:
mix.model1: X13C ~ MAT + MAP + LAT + (1 | SPECIES) + (1 | SITE)
mix.model1a: STND_13c ~ MAT + MAP + LAT + (1 | SPECIES) + (1 | SITE)
Df AIC BIC logLik
Chisq Chi Df Pr(>Chisq)
mix.model1.p 7 1015.85 1041.37 -500.93
mix.model1a.p 7 2623.49 2649.00 -1304.74 0 0 <
2.2e-16 ***
# As seen above, the model using my centered values performed
poorly. My interpretation is that the centering removed the very
variability associated with climate I am trying to predict!
#In addition, I compared the other models mentioned by RK using the
raw isotope values:
anova(mix.model1,mix.model2,mix.model3)
Data: ds
Models:
mix.model1: X13C ~ MAT + MAP + LAT + (1 | SPECIES) + (1 | SITE)
mix.model2: X13C ~ (MAT + MAP + LAT)^2 + (1 | SPECIES) + (1 | SITE)
mix.model3: X13C ~ MAT * MAP * LAT + (1 | SPECIES) + (1 | SITE)
Df AIC BIC
logLik Chisq Chi Df Pr(>Chisq)
mix.model1.p 7 1015.85 1041.37 -500.93
mix.model2.p 10 978.21 1014.66 -479.10 43.645
3 1.796e-09 ***
mix.model3.p 11 980.18 1020.28 -479.09 0.023
1 0.8794
# I have refrained from trying the models mentioned by DB in order
to protect his rights to not commit the "capital mistake" ;)
however, the three predictors (MAT, MAP, LAT) do indeed seem to
overplot - I will try to select only one perhaps in my final models.
# Again - thank you all for your help on this.
--
Jordan Mayor, Ph.D. Candidate
Ecosystem Dynamics Research Lab
Department of Botany, University of Florida
Gainesville, FL 32611
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Simon Blomberg, BSc (Hons), PhD, MAppStat. Lecturer and Consultant Statistician Faculty of Biological and Chemical Sciences The University of Queensland St. Lucia Queensland 4072 Australia Room 320 Goddard Building (8) T: +61 7 3365 2506 http://www.uq.edu.au/~uqsblomb email: S.Blomberg1_at_uq.edu.au Policies: 1. I will NOT analyse your data for you. 2. Your deadline is your problem. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. - John Tukey.