Variable selection for varying dispersion beta glmm using glmmTMB package - R-SIG-mixed-models

Wed, Jun 2, 2021 3:33 PM #

Hi all,

I am struggling to interpret the residual plots from the DHARMa package. If
a line in the residual plot is drawn in red, does that mean there is
heteroscedasticity in the model for the predictor variables? If the solid
line matches the dashed line, can we say there is no
heteroscedasticity? I have attached four residual plots to help assess the
heteroscedasticity of the model. In the first plot, quantile deviations are
detected (the red line), so there is heteroscedasticity in the model. This
plot is for the model that includes all covariates. I then created residual
plots one covariate at a time to find out which predictors are responsible
for the varying dispersion. The 2nd and 3rd plots are each for a single
predictor. In the 2nd plot, the three solid lines are red and deviate
clearly from the dashed lines, so there is heteroscedasticity in the
model for that predictor. The 3rd plot is a box plot. The residuals within
each factor level should be uniformly distributed, so each box should run
from 0.25 to 0.75, with the median line at 0.5. As the two box plots are
red and their median lines deviate from 0.5, there is heteroscedasticity
in the model for that predictor. The 4th plot shows less deviation. Can we
say this is better? I would appreciate your expert suggestions; please also
point me to any article with a clear explanation of checking
heteroscedasticity from residual plots using DHARMa. Many thanks.

Kindest regards,

Tahsin

On Tue, Jun 1, 2021 at 4:14 PM Tahsin Ferdous <tahsinferdousuofc at gmail.com>
wrote:

Thanks John.

On Tue, Jun 1, 2021 at 3:11 PM John Maindonald <john.maindonald at anu.edu.au>
wrote:

No, I was not suggesting that.  I'd stick with the checks done
using simulateResiduals() and plotResiduals() from DHARMa.
The parameter `form` allows you to specify an explanatory
variable against whose values you can plot the simulated
residuals.
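
A minimal sketch of that check, assuming simulated stand-in data rather
than the poster's `mydata` (the data-generating values, the seed and the
number of simulations are all illustrative assumptions). It fits a
constant-dispersion beta GLMM to data whose precision actually depends on
x1, then plots the DHARMa residuals against each predictor via `form`:

```r
library(glmmTMB)
library(DHARMa)

set.seed(1)
# Hypothetical data: precision (phi) increases with x1, so a
# constant-dispersion model is misspecified on purpose.
n     <- 300
study <- sample(1:15, n, replace = TRUE)
u     <- rnorm(15, sd = 0.3)                 # study-level random intercepts
mydata <- data.frame(
  x1       = runif(n),
  x3       = factor(sample(c("a", "b"), n, replace = TRUE)),
  study_id = factor(study)
)
mu  <- plogis(-0.5 + mydata$x1 + u[study])
phi <- exp(1 + 1.5 * mydata$x1)
mydata$y <- rbeta(n, mu * phi, (1 - mu) * phi)

# Beta GLMM with constant dispersion (no dispformula)
m <- glmmTMB(y ~ x1 + x3 + (1 | study_id), data = mydata,
             family = beta_family())

# Simulation-based scaled residuals
sim <- simulateResiduals(m, n = 500)

plot(sim)                              # overall QQ plot and residual vs. predicted
plotResiduals(sim, form = mydata$x1)   # continuous predictor: quantile lines
plotResiduals(sim, form = mydata$x3)   # factor predictor: one box per level
```

For a continuous predictor the plot shows fitted quantile curves; for a
factor it shows one box per level, which is the box-plot display
described earlier in the thread.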

John Maindonald             email: john.maindonald at anu.edu.a
<john.maindonald at anu.edu.a>


On 2/06/2021, at 05:07, Tahsin Ferdous <tahsinferdousuofc at gmail.com>
wrote:

Hi John,

Thanks for your clarification. Are you suggesting doing the Breusch-Pagan
Test without the random effects for glmm?

Best,

Tahsin

On Fri, May 28, 2021 at 4:13 PM John Maindonald <
john.maindonald at anu.edu.au> wrote:

The Breusch-Pagan Test, as implemented in lmtest, is designed for
lm models with independent normal errors.   You have a random
effects term; surely that invalidates use of this test.  Additionally,
I doubt that a normal distribution is a good enough approximation
to beta that, even without the random effects term, results from
bptest() in lmtest are valid.

John Maindonald             email: john.maindonald at anu.edu.au
<john.maindonald at anu.edu.au>

On 27/05/2021, at 13:01, Tahsin Ferdous <tahsinferdousuofc at gmail.com>
wrote:

I am struggling with varying-dispersion beta regression using glmmTMB.
I ran the Breusch-Pagan test to check my model for heteroscedasticity.
As the p-value is smaller than 0.05, heteroscedasticity is present, so I
have to use a beta GLMM with varying dispersion. Next, I need to know
which variables to include in the dispersion model. To decide this, I
followed a procedure. For example, my response variable is y, the
independent variables are x1, x2 and x3, and there is a random effect for
study id. First, I ran a varying-dispersion beta GLMM for y and x1 only,
and then ran the Breusch-Pagan test to check for heteroscedasticity. If
the p-value is smaller than 0.05, there is heteroscedasticity, and in that
case I added x1 to my dispersion model. Similarly, I ran a beta GLMM for
y and x2 and then performed the Breusch-Pagan test. If the result showed
homoscedasticity, I did not include x2 in the dispersion model. I then did
the same for y and x3. If the result implied heteroscedasticity, I added
x3 to my dispersion model.

The final model then looks like this:
library(glmmTMB)
m1.f <- glmmTMB(y ~ x1 + x2 + x3 + (1 | study_id), data = mydata,
                ziformula = ~1, dispformula = ~ x1 + x3,
                family = beta_family())
summary(m1.f)

Is my procedure correct?

Should we comment only on the conditional mean model?

Thanks.

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

Attachments (non-text, scrubbed by the list archive):
Rplot1.png (image/png, 98722 bytes): <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20210602/7c4b478f/attachment-0004.png>
Rplot 2.png (image/png, 142547 bytes): <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20210602/7c4b478f/attachment-0005.png>
Rplot 3.png (image/png, 45874 bytes): <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20210602/7c4b478f/attachment-0006.png>
Rplot 4.png (image/png, 85031 bytes): <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20210602/7c4b478f/attachment-0007.png>

John Maindonald

Wed, Jun 2, 2021 7:09 PM #

Look first in the help pages (?DHARMa etc) and vignettes for
the DHARMa package.  After that, I am not sure what to suggest.
Others may have suggestions.

You will be lucky to get a perfect fit.  At the end of the day, the
question is whether such differences as are apparent matter,
for the purpose for which you intend to use the model.  A useful
tack is to simulate from the fitted model, refit the model to the
simulated data, and check what difference it makes for the purpose
for which the model is used.  If there is little difference, the
deviations from the model probably do not much matter.  Maybe,
repeat several times.
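
One way this simulate-and-refit check might look, as a sketch only: the
data are simulated stand-ins, the x1 slope of the conditional model is an
arbitrary choice of "quantity of interest", and `fit_model` is a
hypothetical helper introduced here for illustration.

```r
library(glmmTMB)

set.seed(2)
# Hypothetical stand-in data (not the poster's data)
n     <- 300
study <- sample(1:15, n, replace = TRUE)
u     <- rnorm(15, sd = 0.3)
mydata <- data.frame(x1 = runif(n), study_id = factor(study))
mu  <- plogis(-0.5 + mydata$x1 + u[study])
phi <- exp(1 + 1.5 * mydata$x1)
mydata$y <- rbeta(n, mu * phi, (1 - mu) * phi)

# Hypothetical helper: fit the same model to any data frame
fit_model <- function(d) {
  glmmTMB(y ~ x1 + (1 | study_id), dispformula = ~ x1,
          data = d, family = beta_family())
}
m <- fit_model(mydata)

# Simulate responses from the fitted model (one column per simulation),
# refit to each simulated response, and keep the quantity of interest.
sims <- simulate(m, nsim = 20, seed = 123)
refit_slopes <- sapply(sims, function(y_sim) {
  d   <- mydata
  d$y <- y_sim
  unname(fixef(fit_model(d))$cond["x1"])
})

fixef(m)$cond["x1"]      # estimate from the observed data
summary(refit_slopes)    # spread of estimates when the model is "true"
```

If the estimate from the observed data sits comfortably inside the spread
of the refitted estimates, that is some reassurance that the remaining
misfit matters little for this particular quantity.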

Maybe you need to include degree 2 term(s) in your dispformula.
Try, maybe, a degree 2 normal spline (this may give less wiggle
at the extremes, and more flexibility of shape in the midrange
region) or a degree 2 or even 3 orthogonal polynomial [use poly()].
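
A sketch of what such dispersion formulas could look like, on simulated
stand-in data. Two assumptions are made here: the "degree 2 normal
spline" is read as a natural spline with 2 degrees of freedom via
`splines::ns()`, and AIC is used as one possible comparison; the thread
prescribes neither.

```r
library(glmmTMB)
library(splines)

set.seed(3)
# Hypothetical data: precision is a curved (quadratic) function of x1
n     <- 300
study <- sample(1:15, n, replace = TRUE)
u     <- rnorm(15, sd = 0.3)
mydata <- data.frame(x1 = runif(n), study_id = factor(study))
mu  <- plogis(-0.5 + mydata$x1 + u[study])
phi <- exp(1 + 6 * (mydata$x1 - 0.5)^2)
mydata$y <- rbeta(n, mu * phi, (1 - mu) * phi)

# Dispersion model linear in x1
m_lin <- glmmTMB(y ~ x1 + (1 | study_id), dispformula = ~ x1,
                 data = mydata, family = beta_family())

# Degree 2 orthogonal polynomial in the dispersion model
m_poly <- glmmTMB(y ~ x1 + (1 | study_id), dispformula = ~ poly(x1, 2),
                  data = mydata, family = beta_family())

# Natural spline with 2 degrees of freedom in the dispersion model
m_ns <- glmmTMB(y ~ x1 + (1 | study_id), dispformula = ~ ns(x1, df = 2),
                data = mydata, family = beta_family())

aic_tab <- AIC(m_lin, m_poly, m_ns)
aic_tab
```

Whichever form is chosen, the DHARMa residual plots against x1 can be
redrawn for the refitted model to see whether the quantile deviations
have shrunk.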


John Maindonald             email: john.maindonald at anu.edu.au


Tahsin Ferdous

Thu, Jun 3, 2021 6:59 PM #

Thanks a lot John for your valuable suggestions.

Kindest regards,

Tahsin
