Dear all,
I'm new to this mailing list and really hope that somebody here can help me with the following issue:
I fitted the following linear models to a Box-Cox-transformed response variable with 382 data points:
Model 1: Y~x+a+b+c+d+e+(a*b)+(a*c)+(a*d)+...+(a*b*c)+(a*b*d)+(a*b*e)+...
a: 'Experimental Temperature' (Temp1, Temp2)
b: 'Host Population' (PopX, PopY)
c: 'Parasite Population' (PopX, PopY)
d: 'Host Gender' (male, female)
Additionally, I included the continuous predictor variable 'Parasite Weight' (e) and all possible 2-way interactions (10) and 3-way interactions (10) in the model.
In model 2 I replaced the two main effects 'Host Population' and 'Parasite Population' with one variable ('Sympatry/Allopatry') that combines the two effects. Apart from this, model 2 (six 2-way interactions and four 3-way interactions) was identical to model 1.
I am now interested in all interactions that include the continuous predictor variable 'Parasite Weight'. Model 1 yielded one such significant interaction ('Experimental Temperature x Parasite Population x Parasite Weight', p = 0.010).
We sent a manuscript containing these two models to a journal for review and got it back now with a comment from a reviewer who suggested that we look for non-linear relationships involving 'Parasite Weight'.
I therefore fitted model 1.2, which corresponds to model 1 plus the quadratic term of 'Parasite Weight' ('Parasite Weight^2') and the respective interactions (14 2-way interactions and 16 3-way interactions in total). I did the same for model 2, which resulted in model 2.2 with nine 2-way interactions and seven 3-way interactions.
The interaction that was significant in model 1 was no longer significant in model 1.2. In model 2.2, two interactions became significant that were not significant in model 2: 'Host Gender x Sympatry/Allopatry x Parasite Weight' (p = 0.038) and 'Host Gender x Sympatry/Allopatry x Parasite Weight^2' (p = 0.044).
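The interaction counts quoted above can be checked by enumeration. The following is only a sketch under stated assumptions: the short names `s` (for 'Sympatry/Allopatry') and `e2` (for 'Parasite Weight^2') are labels invented for this illustration, the undefined `x` term of the model formula is left out, and an interaction that would combine 'Parasite Weight' with its own square is assumed to be excluded.

```python
from itertools import combinations

def count_interactions(predictors, order, exclude_together=("e", "e2")):
    """Count interactions of the given order, skipping any term that
    would combine Parasite Weight (e) with its own square (e2)."""
    excluded = set(exclude_together)
    return sum(
        1
        for combo in combinations(predictors, order)
        if not excluded <= set(combo)
    )

# Predictor sets as described in the post; "s" and "e2" are illustrative labels.
models = {
    "model 1": ["a", "b", "c", "d", "e"],
    "model 1.2": ["a", "b", "c", "d", "e", "e2"],
    "model 2": ["a", "s", "d", "e"],
    "model 2.2": ["a", "s", "d", "e", "e2"],
}

counts = {
    name: (count_interactions(preds, 2), count_interactions(preds, 3))
    for name, preds in models.items()
}

for name, (two_way, three_way) in counts.items():
    print(f"{name}: {two_way} two-way, {three_way} three-way interactions")
```

Under these assumptions the enumeration reproduces the counts stated in the post for all four models.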
Here are my questions:
1. Why is it that including the quadratic term removes some significant effects while adding others?
2. What does it mean when both an interaction including the linear term and the same interaction including the quadratic term become significant? Does this suggest a non-linear relationship or both a linear and a non-linear relationship?
3. Could the disappearance of the interaction that was significant in model 1 be caused by over-parameterization of model 1.2, and how can I prove this (all of our models potentially suffer from having many interactions and main effects)?
4. Are there any general arguments for when to include a quadratic term in a model and when quadratic terms should be avoided?
5. Which model can I trust?
Thank you very much in advance for any advice you can give me,
Fred.
Quadratic term in linear model and model over-parameterization
3 messages · f_fran03 at uni-muenster.de, Thierry Onkelinx, Ulf Köther
Dear Fred,
This looks like a linear model and not a linear mixed model. This mailing list is dedicated to mixed models. I strongly recommend finding a local statistician.
All your models seem way too complex given the data. As a rule of thumb, you need at least 10 observations for each parameter in your model. Adding the quadratic terms increases the number of parameters and only makes the problem worse.
Best regards,
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25, 1070 Anderlecht, Belgium
(Reply to the message sent by f_fran03 at uni-muenster.de on 2017-03-15, 10:46 GMT+01:00.)
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Dear Fred,
I have to say... wow! Really, you got *only* the comment about adding a quadratic effect to the model?!? The review process itself seems to be much more ill-conditioned these days than I thought...
First, this is the mailing list for mixed-effects models, i.e., multilevel models. Your question seems to be about "normal" linear models (which are a special case of the former without any random effects), and you gave no hint of any pseudoreplication in your data, i.e., of any random effects that you fitted. Are all of your data points really independent? If so, this might not be the right mailing list for your question.
Second, some hints about where to start your modelling endeavour: as a rule of thumb, you need about 10-15 data points to reliably estimate one parameter, so with 382 / 15 = about 25 you can estimate roughly 25 parameters. Your models are overfitted! And the p-values you are getting are totally nonsensical in my opinion (leaving aside the discussion about whether p-values make sense at all). So, regarding your question 5: NONE!
You should start by reading Frank Harrell's 2015 book "Regression Modeling Strategies" (2nd ed.) to give yourself a better foundation in linear models. You really have to begin with the basics of what you are doing... In that book you'll also find answers to all the other questions you asked.
Sorry for being a bit harsh here, but I do not know another way of telling you this.
Good luck!
(Reply to the message sent by f_fran03 at uni-muenster.de on 15.03.2017 at 10:46.)
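The rule-of-thumb arithmetic can be made concrete with a short sketch. It rests on assumptions that are not confirmed in the thread: every factor (including 'Sympatry/Allopatry') has two levels, so each main effect and each interaction costs exactly one coefficient; the undefined `x` term of the model formula is ignored; and the residual variance and the Box-Cox parameter are not counted. The term counts are taken from the original post.

```python
N_OBS = 382  # number of data points stated in the original post

# (main effects, 2-way interactions, 3-way interactions) per model,
# as stated in the post; the quadratic term counts as a main effect.
TERMS = {
    "model 1": (5, 10, 10),
    "model 1.2": (6, 14, 16),
    "model 2": (4, 6, 4),
    "model 2.2": (5, 9, 7),
}

def n_coefficients(main, two_way, three_way):
    """Intercept plus one coefficient per term (assumes two-level factors)."""
    return 1 + main + two_way + three_way

params = {name: n_coefficients(*terms) for name, terms in TERMS.items()}

# Parameter budget at 10 and at 15 observations per parameter.
budget = {per_param: N_OBS // per_param for per_param in (10, 15)}

for name, p in params.items():
    verdicts = ", ".join(
        f"{per_param} obs/param: {'within' if p <= cap else 'over'} budget of {cap}"
        for per_param, cap in budget.items()
    )
    print(f"{name}: {p} coefficients ({verdicts})")
```

The result depends on which end of the 10-15 range is applied, and the rule itself is only a heuristic, so this count should be read as a rough orientation rather than a verdict on any of the models.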