Hello,
I've read the term "shrinkage" several times, either on this list or,
even more often, when dealing with population pharmacokinetics, and I
am not quite sure what it means or how it is used... Could I ask for
either some references or some explanations? I give below a longer
version, with how far I could get and where I got stuck... Thanks in
advance for any help!
I've searched a little bit on the net; shrinkage seems related to the
fact that after regression, it is possible to obtain more precise, but
slightly biased, estimators of the coefficients, by making them a
little bit smaller than the actual value (hence "shrinkage").
However, in the discussions about pop-PK models especially, the usage
of "shrinkage" does not seem to me consistent with this meaning...
Instead, it seems to be a property of mixed models, linked to
variance estimation, and used to check model quality or validity
in some way, with sentences like "this model increased the shrinkage"
and mentions of something like "random-effects shrinkage"
and "residual shrinkage" (eta-shrinkage and epsilon-shrinkage)...
My other idea was related to the fact that, when modeling a set of
repeated measures on several patients with a straight line, the set
of slopes shows less variability when a mixed model is fitted to the
whole set than when a separate line is fitted for each patient --- as
exemplified, for instance, in Douglas Bates's book. Hence, the variance
of the slopes is shrunk in the mixed-model approach compared to the
variance obtained from the sample of all individual slopes. This idea
seems closer to the usage and terminology above, but I can't tell
whether shrinkage is a good or a bad thing...
I mean, since one imposes a given distribution, hence a constraint, on
the slopes, the fact that their variance is smaller is no surprise, and
it could be a drawback of the estimation, leading to underestimation.
Conversely, the variance of the individual slopes also includes the
residual variability, and hence is expected to be higher. Is it true,
then, that the mixed-model estimate is better? But in that case, how
can shrinkage be used to quantify the correctness of a model?
Thanks in advance,
Best regards,
I like the baseball example here (and the paper too), but if you don't know baseball, similar examples could be devised in other sports.
http://www-stat.stanford.edu/~ckirby/brad/other/Article1977.pdf
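For anyone who wants to play with the idea, here is a small R sketch of the equal-variance James-Stein-type estimator that paper discusses (the function name and the toy data are mine, not from the paper):

```r
## Shrink k group means toward the grand mean. For k >= 4 means z with
## known sampling variance s2, the estimator
##   z_bar + c * (z - z_bar),  with  c = 1 - (k - 3) * s2 / sum((z - z_bar)^2),
## has lower total squared error than the raw means themselves.
james_stein <- function(z, s2) {
  k    <- length(z)
  zbar <- mean(z)
  c    <- 1 - (k - 3) * s2 / sum((z - zbar)^2)
  zbar + c * (z - zbar)
}

## Toy check: 20 true group means, one noisy observation each.
set.seed(1)
theta <- rnorm(20)               # true group means
z     <- theta + rnorm(20)       # noisy observations, variance 1
js    <- james_stein(z, s2 = 1)
## Total squared error of the shrunken estimates is usually smaller:
sum((js - theta)^2) < sum((z - theta)^2)
```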
Daniel B. Wright, Ph.D.
Senior Research Associate - Learning Insights Team
500 ACT Drive, Iowa City, IA 52243-0168
512.320.1827
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Emmanuel Curis
Sent: Wednesday, September 18, 2013 3:20 PM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Question about what is "shrinkage"...
Emmanuel,
A very simple example of shrinkage: data take the form
Y = m + U + e
for each individual. There are two error terms, U and e. We may want to
predict m + U, and if we know the ratio of variances of U and e, we can
add the appropriate fraction (<1) of the residual to m. For example, Y
might be a student's test score, U is a measure of his innate ability,
and e reflects temporary effects such as whether he had a cold on the
day of the exam, or got lucky in his choice of revision topics.
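In R the arithmetic is one line; a minimal sketch, assuming the two variances are known (the function and example numbers are mine):

```r
## Predict m + U from a single observation Y = m + U + e:
## shrink the residual (Y - m) by lambda = var(U) / (var(U) + var(e)),
## a fraction < 1, and add it back to m.
shrink_predict <- function(Y, m, var_U, var_e) {
  lambda <- var_U / (var_U + var_e)
  m + lambda * (Y - m)
}

## E.g. mean score 100, ability variance 80, "cold on exam day" variance 20:
## a raw score of 120 is pulled back toward the mean.
shrink_predict(120, m = 100, var_U = 80, var_e = 20)   # 116
```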
On 09/18/2013 09:20 PM, Emmanuel Curis wrote:
Hello,
Thanks, Daniel, for the link to that clear article (even though I
indeed know nothing about baseball), and Douglas for the detailed
answer. Interestingly, the article is more on the side of the
estimator and Douglas's answer on the side of the reduced variance, at
least as I understand it, but I think I am beginning to understand the
link between the two.
But there are still a few questions I have, some of them
philosophical...
When reading the paper, the two examples correspond to setups that
could be handled by random-effects models (the baseball players or the
towns). In fact, at the end of the paper, individual mean values coming
from a random variable are mentioned.
Does that mean that individual means obtained by random-effects models
as fitted with lmer, for instance, are themselves a kind of shrinkage
estimator --- that is, already corrected by a shrinkage factor, though
not given by a formula similar to the one cited in the paper? I know
that the random effects themselves are not (conditional) means but
modes, but when added to the fixed-effects part, which corresponds to
the mean (at least in linear models), aren't they comparable to
(shrunken) means?
Would this, in that case, be an argument for preferring random effects
over fixed effects when the number of levels is "high" (>= 3 if
I read the paper correctly, though perhaps another limit applies for
such models and for cases of unknown, estimated variance?), beside
convergence problems --- and, conversely, for preferring fixed effects
below that, even if philosophically a random effect would be needed
(experiments on two patients only)? And is there a link between the
efficiency of the shrinkage effect and the ability to estimate the
variance correctly?
This would also explain how it is possible to associate a shrinkage
value with each random effect...
As far as I could see, however, the shrinkage estimator can also
improve regression coefficients when there are more than 3 of them.
Does it still hold when dealing with multidimensional vectors in which
each component represents very different things? And for regression
coefficients, if the shrunken version gives better values, wouldn't it
be logical to build tests of these coefficients on the shrunken
values? Is that possible? (But these questions are on the border of
being off-topic, I guess.)
My other concern is about the usage of shrinkage as a diagnostic. If I
understood Douglas's answer correctly, the size of the shrinkage
measures how informative a single patient's data are for estimating
his own value. Hence, if shrinkage is important, does it mean that the
model is not suitable for individual predictions, but only average
ones --- hence useless in pop-PK for adapting doses, for instance?
Are there any guidelines defining what an acceptable shrinkage is? And
does it have other uses for model diagnostics and interpretation?
Last point: I understand well from the paper how to calculate the
shrinkage factor (there seem to be several different but close
formulas depending on the reference, but I guess these are only
variants?), using the values obtained for each individual. But for
several linear models, as mentioned by Douglas, it is not possible to
obtain individual parameters. In such cases, how is shrinkage
computed/estimated?
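To make my question concrete, here is how I would try to compute eta-shrinkage with lmer, following one formula I found in the pop-PK literature, shrinkage = 1 - SD(eta_hat)/omega --- is this the right idea?

```r
## Attempted eta-shrinkage computation:
##   shrinkage = 1 - SD(empirical Bayes estimates) / omega
## where omega is the estimated random-effect standard deviation.
library(lme4)

fm    <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
eta   <- ranef(fm)$Subject                     # conditional modes, one column per effect
omega <- attr(VarCorr(fm)$Subject, "stddev")   # estimated random-effect SDs

shrinkage <- 1 - apply(eta, 2, sd) / omega
shrinkage   # one value per random effect; near 0 means little shrinkage
```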
Thanks again in advance for any answer,
Hi Emmanuel and others,
I would think of shrinkage as a characteristic result of what a lot of estimation methods produce, rather than as a method in itself. The Efron and Morris paper focused on showing that this characteristic can be good, and they discuss it in light of some popular methods of the time (they wrote several influential and more technical papers on empirical Bayes during this period). A lot of mixed/multilevel folks discuss different methods, but the value of this characteristic is well illustrated in Bates's last email and the "borrowing strength" phrase. So, yes, the values in caterpillar plots and the like (conditional modes, though often called level-2 residuals) are estimates for the individual units which borrow information from the other units, and so are "shrunken".
One way to show the effect of "shrinking" is to plot all the individual regression lines (just two variables) and then plot the lines with the slope and intercept estimated with shrinkage. An example is the comparison of Figures 3.7 and 3.8 of Kreft and de Leeuw's Introducing Multilevel Modeling. Here is some code to make an example. The plot on the left shows the OLS estimates, and there are two level-2 units which are very different from the others. The shrunken estimates on the right borrow information from the other 8, so the slopes of these two, while still different from the others, are a little less different.
set.seed(818)
library(lme4)

## 10 level-2 units, 10 observations each
lev2 <- rep(1:10, 10)
x <- rnorm(100)
## unit-specific intercepts and slopes (mean slope 2) plus residual noise
y <- rep(rnorm(10), 10) + (rep(rnorm(10), 10) + 2) * x + rnorm(100)

par(mfrow = c(1, 2))

## left panel: a separate OLS line for each unit
plot(x, y, cex = .5)
for (i in 1:10)
  abline(lm(y[lev2 == i] ~ x[lev2 == i]))

## right panel: shrunken per-unit intercepts and slopes from lmer
plot(x, y, cex = .5)
m1 <- coef(lmer(y ~ x + (x | lev2)))$lev2
for (i in 1:10)
  abline(m1[i, 1], m1[i, 2])
Dan
-----Original Message-----
From: Emmanuel Curis [mailto:emmanuel.curis at parisdescartes.fr]
Sent: Thursday, September 19, 2013 10:41 AM
To: Daniel Wright; bates at stat.wisc.edu
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Question about what is "shrinkage"...