AIC in R: back-transforming standardized model parameters (slopes)
Just to echo Bob O'Hara's comment and elaborate a bit more - Don't model average the regression coefficients, especially if you are considering models with and without interactions among the predictors. Follow the link provided by Bob to Cade (2015. Model averaging and muddled multimodel inferences. Ecology 96:2370-2382) to see why model-averaged regression coefficients as conventionally done following Burnham and Anderson provides meaningless estimates of partial effects in the presence of multicollinearity and addresses a concept that doesn't exist (model uncertainty in regression coefficients) when there is no multicollinearity. What you perhaps really need to think about is why you want standardized predictors. Some times they are useful and some times not. Here you seem to be going to a lot of trouble to standardize and then to get back to unstandardized estimates (Drew and Phil have provided good advice about recovering estimates in unstandardized scale) but without any indication of how standardization is aiding your interpretations. Note that in general, when standardizing regression coefficients it would be more consistent with the algebra of the regression coefficients to actually standardize them by their partial standard deviations that adjusts for the linear relationship (multicollinearity) among predictors (see the Cade 2015 paper to see why this suggestion originally by Bring 1994 works). This may be less relevant to your simple one continuous predictor (time) model with an interaction with two levels of a factor (air or ice), but it wasn't clear what other models you might be considering. Brian Brian S. Cade, PhD U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: cadeb at usgs.gov <brian_cade at usgs.gov> tel: 970 226-9326
On Tue, Jan 12, 2016 at 8:16 AM, Bob O'Hara <bohara at senckenberg.de> wrote:
On 12/01/16 10:54, Matt Perkins wrote:
Hi Drew and R-help, Many thanks for your email, and your explanations and suggestions have been very helpful in aiding me to understand this problem. Should you, or anyone else on the helplist have time, perhaps I may elaborate on my problem with a little more detail. I appreciate the pragmatic suggestion to use a non-standardised model to extract ?real? slope values that I can report, which is great to hear as I had thought to do this, but wasn?t certain of its legitimacy, so extra support in this direction gives me confidence. However, as I am actually running a number of analyses this approach only works when AIC identifies a clear single best model; in some of these analyses AIC finds good evidence for multiple models in the top model set, such that I use model averaging to produce averaged parameter estimates. Obviously with model averaged parameters, it is not possible to run an alternate non-standardised model. So I need to be able to take a model averaged slope parameter estimate from the averaged model summary and back-transform it to a real slope value.
Oh well, that makes things much easier - you shouldn't use model averaged parameters. They're close to meaningless. :-) (see here, for example: http://onlinelibrary.wiley.com/doi/10.1890/14-1639.1/full). In your case, I don't think you need to worry at all about model selection: you are actually interested in the interaction, so you need it in the model. Beyond that, just follow Phil Dixon's advice. Bob The provided equation is useful:
z.time = (time ? mean(time))/2*sd(time)
However, I?m still a little uncertain how best I should employ it, or how
I might relate it to a model averaged slope estimate.
Here is a brief worked example of my confusion, (data at the bottom of
the page):
My analysis
I would like to test if treatment (kept in air or ice) affects nitrogen
(N) within shrimps over time. I have repeated measures per shrimp (
unique.id) that I use as a random factor to account for non-independence
within an individual.
My global model is a linear mixed model:
model1<-lmer( N ~ time* air.ice + (1|unique.id), data=shrimp, REML=FALSE)
Standardisation:
So presumably, using the above equation I can standardise my ?time?
variable to create a new standardised time variable for use in the analysis.
?time? is a continuous variable but with 5 different values (2, 30.5,
58.5, 93, 120 hr) where the mean = 60.8 and SD = 43.34 (time data below).
So for each value of time:
new standardized time value = (old time value ? 60.8)/(2*43.34).
This produces a new standardised time variable (also below).
I?m not sure how air.ice (2 level factor air or ice) is being
standardised when I use the deafult R code with package "arm":
stdz.model1<-standardize(model1, standardize.y=FALSE).
I believe the default, which I use, leaves this binomial factor unscaled
(as a binomial is already scaled between 0 and 1?).
Even if this manual standardisation for time is correct, then I'm still
unsure how I would back-transform my slope value. Slopes are supposed to be
the value of increase in y per unit of x, so I assume standardised slope
estimates represent change in y per standardised 1 unit on x axis.
So with all this uncertainty in mind:
AIC model selection using standardised models identifies two top models;
the interaction model and the main effects model. I model average across
these two standardised models.
I present the model summaries for these at the bottom of the page, and
that of the model averaged model.
Strangely, and which I don?t understand, unstandardised model1 and
unstandardised model2 slopes differ, while both stdz.model 1 and
stdz.model2 have the same slope values. The model averaged summary (based
on average of stdz model1 and stdz mode2) has a slope value the same as the
stdz models.
For simplicity of understanding my problem, I focus on model 1, rather
than the model-averaged model.
Here are the slopes for different versions of Model 1:
Model 1 unstandardised (0.008156)
model 1 standardised (0.50782)
model 1 with my manually standardised time variable (0.3979)
If I take the slope value from model 1 unstandardised (0.008156) I can
obviously accurately predict the value of dN for any given time value. This
is not the case for the bigger values in the two standardised models.
Clearly, as the slope of the model using my manually standardised time
variable is different from the standardised model, my use of the above
equation is not correct. Or my use is correct, but I am missing something
else in the standardisation process.
Secondly, I am still unsure how I would apply the above equation to
back-transform the standardised slope value (0.50782) so that I may get a
value equivalent to my unstandardised ?real? slope estimate (0.008156).
Any suggestions for how I can back-transform a slope estimate from my
model summary, or properly apply the equation above to standardise my
model, would be most appreciated.
thanks,
Matt
### Models, data and summary tables below
model1<-lmer( N ~ time* air.ice + (1|unique.id), data=shrimp, REML=FALSE)
model2<-lmer( N ~ time + air.ice + (1|unique.id), data=shrimp,
REML=FALSE)
model3<-lmer( N ~ std.Time*air.ice + (1|unique.id), data=shrimp,
REML=FALSE)
stdz.model1<-standardize(model1, standardize.y=FALSE)
stdz.model2<-standardize(model2, standardize.y=FALSE)
Data (where manually standardised time = std.Time).
std.Time = (time - mean of time) / (sd of time*2)
time air.ice unique.id dN std.Time
2 air 37 12.41 -0.68
2 air 38 11.97 -0.68
2 air 39 12.00 -0.68
2 air 40 12.71 -0.68
30.5 air 37 12.96 -0.35
30.5 air 38 12.88 -0.35
30.5 air 39 12.56 -0.35
30.5 air 40 13.86 -0.35
58.5 air 37 13.11 -0.03
58.5 air 38 12.71 -0.03
58.5 air 39 12.63 -0.03
58.5 air 40 13.75 -0.03
93 air 37 13.18 0.37
93 air 38 13.15 0.37
93 air 39 12.84 0.37
93 air 40 14.29 0.37
120 air 37 13.24 0.68
120 air 38 13.12 0.68
120 air 39 12.75 0.68
120 air 40 14.26 0.68
2 ice 33 11.71 -0.68
2 ice 34 11.23 -0.68
2 ice 35 11.85 -0.68
2 ice 36 11.55 -0.68
30.5 ice 33 11.86 -0.35
30.5 ice 34 11.76 -0.35
30.5 ice 35 12.51 -0.35
30.5 ice 36 11.94 -0.35
58.5 ice 33 12.02 -0.03
58.5 ice 34 11.63 -0.03
58.5 ice 35 12.22 -0.03
58.5 ice 36 11.99 -0.03
93 ice 33 11.97 0.37
93 ice 34 11.80 0.37
93 ice 35 12.45 0.37
93 ice 36 11.89 0.37
120 ice 33 12.08 0.68
120 ice 34 11.68 0.68
120 ice 35 12.55 0.68
120 ice 36 12.25 0.68
model1 ? interaction (global) model, unstandardised
Fixed effects:
Estimate Std.
Error t value
(Intercept) 12.522535 0.197024 63.56
time 0.008156 0.001134
7.19
air.iceice -0.801936 0.278634
-2.88
time:air.iceice -0.004442 0.001604 -2.77
stdz.model1 ? interaction (global) model, standardised
Fixed effects:
Estimate Std.
Error t value
(Intercept) 12.48242 0.13051 95.65
z.time 0.50782 0.06861 7.40
c.air.ice -1.07202 0.26102 -4.11
z.time:c.air.ice -0.38008 0.13722 -2.77
model3 ? interaction (global) model, with manually standardised ?time?
variable
Fixed effects:
Estimate Std.
Error t value
(Intercept) 13.0184 0.1846 70.54
std.Time 0.3979 0.1469 2.71
air.iceice -1.0720 0.2610 -4.11
std.Time:air.iceice -0.1460 0.2078 -0.70
stdz.model2 ? main effects model, unstandardised
Fixed effects:
Estimate Std.
Error t value
(Intercept) 12.6575787 0.1923837 65.79
time 0.0059351 0.0008929 6.65
air.iceice -1.0720225 0.2610155 -4.11
model2 ? main effects model, standardised
Fixed effects:
Estimate Std. Error t value
(Intercept) 12.48242 0.13051 95.65
z.time 0.50782 0.07639 6.65
c.air.ice -1.07202 0.26102 -4.11
Model-averaged coefficients:
Estimate Std.
Error z value Pr(>|z|)
(Intercept) 12.48242 0.13051
95.645 < 2e-16 ***
c.air.ice -1.07202 0.26102
4.107 4.01e-05 ***
z.time 0.50782 0.06954
7.302 < 2e-16 ***
c.air.ice:z.time -0.38008 0.13722
2.770 0.00561 **
________________________________________
From: Drew Tyre <atyre2 at unl.edu>
Sent: 11 January 2016 23:12
To: Matt Perkins; r-sig-ecology at r-project.org
Subject: RE: [R-sig-eco] AIC in R: back-transforming standardized model
parameters (slopes)
Hi Matt,
This isn't going to be a complete answer, but it might help.
I wasn't 100% sure what standardize() was doing, or how it was doing it,
so I did
getMethod("standardize","glm")
to see the source code. That function calls standardize.default() which
is a bit hard to get but
getFromNamespace("standardize.default","arm")
pulls it out. From that code you can see that standardize() extracts the
data from the model object, centers and scales it, and then refits the
model to the centered and scaled data. So the formula you're looking for is
z.time = (time - mean(time))/2*sd(time)
similar to a standard Z transformation but using 2 times the sd in the
denominator.
Therefore I wished to know (preferably) the calculation being made, and
more
importantly the function/code to back-transform my slope estimates to
reportable 'real' slopes.
Hmmm, but the main reason to do the transformation in the first place is
to make it easier to do comparisons between the effect sizes of different
variables. If you want to report "real" slopes, just use the ones from your
model1, which should be near identical to the backtransformed versions of
the ones from model2.
Another reason for centering and scaling is to improve numerical
stability of your estimates, but if you were able to fit model1 and it
didn't complain, not sure that you need to bother with the standardization.
There are other reasons too, but I don't think any of them apply here.
There's a great discussion of when to scale and why here:
http://stats.stackexchange.com/questions/29781/when-conducting-multiple-regression-when-should-you-center-your-predictor-varia
--
Drew Tyre
School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974
phone: +1 402 472 4054
fax: +1 402 472 2946
email: atyre2 at unl.edu
http://snr.unl.edu/tyre
http://aminpractice.blogspot.com
http://www.flickr.com/photos/atiretoo
-----Original Message-----
From: R-sig-ecology [mailto:r-sig-ecology-bounces at r-project.org] On
Behalf Of
Matt Perkins
Sent: Monday, January 11, 2016 1:40 AM
To: r-sig-ecology at r-project.org
Subject: [R-sig-eco] AIC in R: back-transforming standardized model
parameters
(slopes)
Hi All,
I have a simple Q that I'm having some difficulty finding an answer for.
I'm
conducting AIC analyses in R, and I would like to be able to report
'real' slope
values from my model summary output (i.e. take the slopes and report them
within simple y=mx+c linear equations in my paper). However, following
Greuber
etal 2011, I have standardised the explanatory variables in my model by
centering them to a mean of zero and and an SD of 2, using the following
code
and the R package "arm". My model has a normal error distribution.
stdz.model1<-standardize(model1, standardize.y=FALSE)
I do not yet know the sums behind this code in order to know how and what
change has been made to my explanatory variables, in order that I could
manually make back-transformations.
Therefore I wished to know (preferably) the calculation being made, and
more
importantly the function/code to back-transform my slope estimates to
reportable 'real' slopes.
Addtionally, is it correct (or does it even matter) that I should be
focusing my
back-transformation on the slope estimate taken from the model summary,
as
opposed to instead using the model summary standardised slope estimate to
calculate a y value in my linear equation (y=mx+c), and then
back-transforming
that y value?
##
If it is useful, my model and summary tables are below.
I would like to test if treatment (kept in air or ice) affects nitrogen
(N) within
shrimps over time. I have repeated measures per shrimp (unique.id) that
I use as
a random factor to account for non-independence within an individual.
My model is a linear mixed model of the form:
model1<-lmer( N ~ time* air.ice + (1|unique.id), data=shrimp,
REML=FALSE)
The two model summary tables below show the 1) un-standardised model with
ready-to-use slope value ("time" = 0.008156)
and 2) standardised model with much-larger slope value ("z.time" =
0.50782)
1)
Fixed effects:
Estimate Std.
Error t value
(Intercept) 12.522535 0.197024 63.56
time 0.008156 0.001134
7.19
air.ice -0.801936 0.278634
-2.88
time:air.ice -0.004442 0.001604 -2.77
2)
Estimate Std.
Error t value
(Intercept) 12.48242 0.13051
95.65
z.time 0.50782 0.06861
7.40
c.air.ice -1.07202 0.26102
-4.11
z.time:c.air.ice -0.38008 0.13722
-2.77
[[alternative HTML version deleted]]
_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
-- Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org
_______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology