Whether or not you're interested in a parameter has little to no influence on whether it should be in your model, nor on whether it is a fixed or random effect. Instead, that is determined by the structure of your data. In your case, it would in many ways be nicer to have year as a random effect (as you only care about the *variance* due to year and not any particular fixed year), but you simply don't have the data structure to do so with so few levels. Here, the year parameter is simply a nuisance parameter -- it captures important variation, even if it's variation you don't care about.

Now, you can marginalize over that effect either before modelling (e.g. by aggregating / averaging over the time dimension), after modelling (via your choice of test type (II, III) or by using predicted marginal / least-squares means, e.g. via the package lsmeans), or implicitly (by completely omitting it from the model specification).

I would tend to leave the year effect in the model if possible -- I'm a big fan of modelling as much as you can so that your model is as good as possible, even if there are only a few predictors you actually care about. That said, if you don't have enough data for such a complex model, then you should leave out less interesting dimensions in your model specification (and potentially aggregate across them beforehand). The Bates et al. preprint on parsimonious mixed models really emphasizes this point -- it doesn't do much good to model everything and have a degenerate and/or overfitted model. You know your data best and so you know what tradeoffs are best for your case.

Best,
Phillip
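To make the "aggregate beforehand" option concrete, here is a minimal base-R sketch; the data frame and all column names are invented purely for illustration:

```r
# Toy data: one yield value per Variety x N.rate x Year cell
# (names and values are hypothetical)
d <- expand.grid(Variety = factor(paste0("V", 1:5)),
                 N.rate  = c(0, 50, 100, 150),
                 Year    = factor(2013:2015))
set.seed(1)
d$Yield <- 3000 + 200 * as.integer(d$Variety) + 2 * d$N.rate +
  rnorm(nrow(d), sd = 150)

# Marginalize over Year before modelling: average each
# Variety x N.rate cell across the three years
d_marg <- aggregate(Yield ~ Variety + N.rate, data = d, FUN = mean)
nrow(d_marg)  # 5 varieties x 4 N rates = 20 rows, Year averaged away
```

The marginalized data can then be fed to a model that omits Year entirely; the lsmeans route instead fits the full model and averages over Year afterwards.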
On Fri, 2016-11-04 at 09:51 -0600, Joseph Aidoo wrote:
Hi Phillip, I am going to bug you one last time. I'm considering focusing on the effect of Variety and Nitrogen only on yield. I am not interested in the effect of the years. How do I account for variation in years in the model now? Perhaps using it as a random factor? How would you lay it out in a model?

Regards

Sent from my iPhone

On Nov 4, 2016, at 7:17 AM, Phillip Alday <Phillip.Alday at unisa.edu.au> wrote:
(Try to remember to keep the list in CC -- I'm really bad at it, too!)

Looking at your QQ plot (consider posting it somewhere online for posterity, as the list strips attachments), it seems that the deviation from normality occurs in the form of heavy tails of the sort you would find with a t-distribution. If you were doing Bayesian modelling, I would suggest that you just use a t-likelihood instead of a normal one ... but that may not even be necessary here. (For one thing, it's not the data that have to be normally distributed, but rather the residuals or, equivalently, the response *conditioned* on the predictors. The linear model is after all given by Y = B*X + B0 + e, with e ~ N(0, sigma). And in the mixed-model case, you have additional normal distributions mixing in there via the random effects.)

I would check your model fit by examining e.g.
- fitted vs. actual (you can fake this by using faceting in ggplot or the | operator in lattice, or you can do it for real using the predict() function)
- fitted vs. residuals ( plot(lme.model) does this for you )

The former will tell you whether your model accurately represents your data (and places where it fails to do so), while the latter will give you a visual impression of how bad the violation of normality of the residuals is. More important than fulfilling distributional assumptions, for me, is seeing how well the model actually fits the data and how good it is at predicting new data. (Techniques like cross-validation or posterior predictive checks in the Bayesian framework are based on this idea as well.) Violating the testing assumptions does mean that not all the frequentist guarantees hold, but if your model does a good job at describing old and predicting new data in practice, then that may be enough for many applications.
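The two diagnostics above can be sketched with a plain lm() on R's built-in trees dataset; exactly the same fitted()/resid() calls work on an lmer() fit:

```r
# Fit a simple linear model to the built-in 'trees' data
fit <- lm(Volume ~ Girth + Height, data = trees)

# Fitted vs. actual: points hugging the 1:1 line mean the model
# tracks the data well
plot(fitted(fit), trees$Volume, xlab = "Fitted", ylab = "Observed")
abline(0, 1)

# Fitted vs. residuals: look for heavy tails, skew, or funnel shapes
plot(fitted(fit), resid(fit), xlab = "Fitted", ylab = "Residuals")
abline(h = 0)
```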
Now, if you're focussed on significance tests or traditional confidence intervals, violations of model assumptions can ruin your day, as you can no longer trust that the null distribution is correct. However, you can still use bootstrapped confidence intervals, although those take much longer to compute.

One final thing: I see you're using Type-III tests. Even though it's the default in certain popular commercial statistical packages, I would encourage you to think twice about doing so. Read Venables' Exegeses on Linear Models and see some of the stuff John Fox (author of the car package) has written on this topic (e.g. https://stat.ethz.ch/pipermail/r-help/2006-August/111927.html). A lot of ink has been spilled on this topic and it's not always black and white, but I generally find that Type-II tests are actually the thing that I want.

Best,
Phillip

On Fri, 2016-11-04 at 06:16 -0600, Joseph Aidoo wrote:
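To illustrate the bootstrapped-CI idea in a self-contained way, here is a hand-rolled case-resampling percentile interval for an lm() slope on a built-in dataset; for an lmer() fit you would instead use lme4's confint(model, method = "boot") or bootMer():

```r
set.seed(42)  # reproducibility of the resampling
fit <- lm(Volume ~ Girth, data = trees)

# Resample rows with replacement, refit, and collect the slope
boot_slopes <- replicate(2000, {
  i <- sample(nrow(trees), replace = TRUE)
  coef(lm(Volume ~ Girth, data = trees[i, ]))[["Girth"]]
})

# 95% percentile bootstrap confidence interval for the slope
ci <- quantile(boot_slopes, c(0.025, 0.975))
```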
Yield data needs to be transformed... Box-Cox doesn't seem to be working for me. Here is the output for

FAT.lme <- lmer(Yield.kg_ha ~ N.rate*Variety*Year + (Year|Site:Block),
                data = Jo_data1415.dat)
Anova(FAT.lme, type = 3, test = "F")

Response: Yield.kg_ha
                            F Df Df.res    Pr(>F)
(Intercept)          526.2004  1  49.26 < 2.2e-16 ***
N.rate                11.5523  3 393.00 2.845e-07 ***
Variety                7.7304  4 393.70 5.270e-06 ***
Year                  34.7598  2  22.02 1.537e-07 ***
N.rate:Variety         1.9513 12 393.54   0.02746 *
N.rate:Year            2.4641  6 393.30   0.02373 *
Variety:Year           1.8262  8 393.25   0.07068 .
N.rate:Variety:Year    0.8795 24 393.25   0.63104
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Shapiro-Wilk normality test

data:  resslme
W = 0.98449, p-value = 5.986e-05

On Thu, Nov 3, 2016 at 11:38 PM, Phillip Alday <Phillip.Alday at unisa.edu.au> wrote:
One possible model would be:
Yield ~ nitrogen*variety*Year + (Year|Site:block)
You won't explicitly model the variation between sites, but you
should still capture at least some of it via the Site:block
interaction. (If your blocks are labelled uniquely across sites,
then you can just use block instead of Site:block). You have a
total of 8 levels for blocks, which still isn't a lot but should
work.
You could also include a fixed effect for Site:
Yield ~ nitrogen*variety*Year*Site + (Year|Site:block)
but this will add a fair amount of complexity to the model and I'm
not sure you have enough data for that.
I'm not familiar with oat yields, so I don't know if you should
transform Yield; the examples for the oats dataset (?oats in R)
don't transform Yield.
Best,
Phillip
On 4 Nov 2016, at 15:58, Joseph Aidoo <jaidoo at ualberta.ca> wrote:
Thank you Phillip for your quick response. I am treating Year as
a factor.

I am not interested in the year effect. However, I wish to
account for it.
At each site we had four blocks for the experiment so I assumed
blocks will be in sites.
Given this clarification, how would you lay out my model?
I have read some papers which combine site x year into a single "environment" factor.
I don't know if it's suitable for this experiment.
Thank you
On Thu, Nov 3, 2016 at 11:06 PM, Phillip Alday <Phillip.Alday at unisa.edu.au> wrote:
Hi Joseph,
Are you treating year as a continuous variable or a categorical?
If you only have three years and one is outlier-ish, it may not
make sense to treat them as a continuous variable (whether linear,
quadratic or some other form).
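Concretely, treating year as categorical is just a matter of converting the column to a factor before fitting; the data frame below is invented for illustration:

```r
# Hypothetical data: three years, a few observations each
dat <- data.frame(Year  = rep(2013:2015, each = 8),
                  Yield = rnorm(24, mean = 3000, sd = 200))

dat$Year <- factor(dat$Year)   # categorical, not continuous
nlevels(dat$Year)              # 3 levels

# With the default treatment contrasts, the design matrix gets an
# intercept plus one indicator column per non-reference year
ncol(model.matrix(~ Year, dat))  # 3 columns
```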
Also, 2 sites is not enough levels for a random effect --
remember that REs are variance estimates, and it doesn't really make
sense to estimate a variance from only two levels. How many blocks do
you have per site? Maybe Site:block is sufficient, and if need be
you can include Site as a nuisance parameter in the fixed effects?
Best,
Phillip
On 3 Nov 2016, at 05:07, Joseph Aidoo <jaidoo at ualberta.ca> wrote:
My experiment was set up as an RCBD with
5 oat varieties
4 nitrogen rates
at 2 sites
over 3 years.
I am investigating the effect of Variety and Nitrogen on Yield.
In the second year of the experiment we had a drought, so the
data was awful. So I expect differences between years.
I am using Variety, Nitrogen and Year as my fixed effects,
and Sites as my random effect.
I expect an interaction between Variety x Nitrogen.
This is the model I came up with:
Yield ~ nitrogen*variety*Year + (Year|Site/block)
My data doesn't seem normal because of the odd year with the
drought. I have tried transforming the data to normality, but
nothing seems to work.
I need advice on what to do next. Would it make sense to use a GLMM?
What would the best model be?
Thanks
_______________________________________________
R-sig-mixed-models at r-project.org mailing list