Help with Linear Model
(Try to remember to keep the list in CC -- I'm really bad at it, too!) Looking at your QQ plot (consider posting it somewhere online for posteriority as the list strips attachments), it seems that the deviation from normality occurs in the form of heavy tails of the sort you would find with a t-distribution. If you were doing Bayesian modelling, I would suggest that you just use a t-likelihood instead of a normal one .... but that may not even be necessary here. (For one thing, it's not the data that have to be normally distributed, but rather the residuals or equivalently, the reponse *conditioned* on the predictors. The linear model is after all given by Y = B*X + B0 + e, with e ~ N(0,sigma). And in the mixed-model case, you have additional normal distributions mixing in there via the random effects.) I would check your model fit by checking e.g.? ?- fitted vs. actual (you can fake this by using faceting in ggplot or the | operator in lattice, or you can do it for real using the predict() function)? ?- fitted vs. residuals ( plot(lme.model) does this for you ) The former will tell you whether your model accurately represent your data (and places where it fails to do so), while the latter will give you a visual impression about how bad the violation of normality of the residuals is. More important than fulfilling distributional assumptions for me is seeing how well the model actually fits the data and how good it is at predicting new data. (Techniques like cross-validation or posterior predictive check in the Bayesian framework are based on this idea as well.) Violating the testing assumptions does mean that not all the ?frequentist guarantees hold, but if your model does a good job at describing old and predicitng new data in practice, then that may be enough for many applications.? Now, if you're focussed on significance tests or traditional confidence intervals, violations of model assumptions can ruin your day, as you can no longer trust that the null distribution is correct. However, you can still use bootstrapped confidence intervals, altough those take much longer to compute. One final thing: I see you're using Type-III tests. Even though it's the default in certain popular commercial statistical packages, I would encourage you to think about twice about doing so. Read Venables' Exegeses on Linear Models and see some of the stuff John Fox (author of the car packages) has written on this topic (e.g.?https://stat.ethz.ch/ pipermail/r-help/2006-August/111927.html).?A lot of ink has been spilled on this topic and it's not always black and white, but I generally find that Type-II tests are actually the thing that I want. Best, Phillip
On Fri, 2016-11-04 at 06:16 -0600, Joseph Aidoo wrote:
Yield data needs to be transformed... Boxcox doesnt seem to be working for me. Here is the output for FAT.lme<- lmer(Yield.kg_ha~N.rate*Variety*Year+(Year|Site:Block),data=Jo_data14 15.dat) Anova(FAT.lme,type=3,test="F") Response: Yield.kg_ha ? ? ? ? ? ? ? ? ? ? ? ? ? ?F Df Df.res ? ?Pr(>F) ? ? (Intercept) ? ? ? ? 526.2004 ?1 ?49.26 < 2.2e-16 *** N.rate ? ? ? ? ? ? ? 11.5523 ?3 393.00 2.845e-07 *** Variety ? ? ? ? ? ? ? 7.7304 ?4 393.70 5.270e-06 *** Year ? ? ? ? ? ? ? ? 34.7598 ?2 ?22.02 1.537e-07 *** N.rate:Variety ? ? ? ?1.9513 12 393.54 ? 0.02746 * ? N.rate:Year ? ? ? ? ? 2.4641 ?6 393.30 ? 0.02373 * ? Variety:Year ? ? ? ? ?1.8262 ?8 393.25 ? 0.07068 . ? N.rate:Variety:Year ? 0.8795 24 393.25 ? 0.63104 ? ? --- Signif. codes: ?0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
?
Shapiro-Wilk normality test data: ?resslme W = 0.98449, p-value = 5.986e-05 On Thu, Nov 3, 2016 at 11:38 PM, Phillip Alday <Phillip.Alday at unisa.e du.au> wrote:
One possible model would be: ?Yield ~ nitrogen*variety*Year + (Year|Site:block) You won't explicitly model the variation between sites, but you should still capture at least some of it via the Site:block interaction. (If your blocks are labelled uniquely across sites, then you can just use block instead of Site:block). You have a total of 8 levels for blocks, which still isn't a lot but should work. You could also include a fixed effect for Site: ?Yield ~ nitrogen*variety*Year*Site + (Year|Site:block) but this will add a fair amount of complexity to the model and I'm not sure you have enough data for that. I'm not familiar with oat yields, so I don't know if you should transform Yield, the example in the oats dataset (?oats in R) don't transform Yield. Best, Phillip
On 4 Nov 2016, at 15:58, Joseph Aidoo <jaidoo at ualberta.ca> wrote: Thank you Philip? for your quick response. I am treating Year as
a factor.
? I am not interested in the year effect. However I wish to
account for it.
At each site we had four blocks for the experiment so I assumed
blocks will be in sites.
Given this clarification how would have out my model? I have read some papers which combines site*years = environment.?
I don't know if its suitable in this experiment.
Thank you On Thu, Nov 3, 2016 at 11:06 PM, Phillip Alday <Phillip.Alday at uni
sa.edu.au> wrote:
Hi Joseph, Are you treating year as a continuous variable or a categorical?
If you only have three years and one is outlier-isn, it may not make sense to treat them as a continuous variable (whether linear, quadratic or some other form).
Also, 2 sites is not enough levels for a random effect --
remember that RE are variance estimates and it doesn't really make sense to make a variance estimate of two levels. How many blocks do you have per site? Maybe Site:block is sufficient and if need be you can include Site as a nuisance parameter in the fixed effects?
Best, Phillip
On 3 Nov 2016, at 05:07, Joseph Aidoo <jaidoo at ualberta.ca>
wrote:
My experiment was set up as a RCBD with 5 oat varieties 4 nitrogen rates at 2 sites over 3 years . I am investigating the effect of Variety and Nitrogen on Yield. In the second year of the experiment we had a drought so the
data was
awful. So i expect differences in Years I am using Variety, Nitrogen and Year as my fixed effects. and Sites as my random effect. I expect and interaction between Variety X Nitrogen. This is the model i came up with Yield ~ nitrogen*variety*Year + (Year|Site/block) My data doesnt seem normal because of the odd year with the
drought.
I have tried to normally transform the data but nothing seems
to work.
I need advice on what to do next? Is it ideal to use a glmm?
What could the
best model be? Thanks ? ? ? ?[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models