physical constraint with gam
On 12/05/16 02:29, Dominik Schneider wrote:
Hi again, I'm looking for some clarification on 2 things. 1. On that last note, I realize that s(x1,x2) would be the other obvious interaction to compare with - and I see that you recommend te(x1,x2) if they are not on the same scale.
- yes that's right, s(x1,x2) gives an isotropic smooth, which is usually only appropriate if x1 and x2 are naturally on the same scale.
2. If s(x1,by=x1) gives you a "parameter" value similar to a GLM when you plot s(x1):x1, why does my function above return the same yhat as predict(mdl,type='response') ? Shouldn't each of the terms need to be multiplied by the variable value before applying rowSums()+attr(sterms,'constant') ??
predict returns s(x1)*x1 (plot.gam just plots s(x1), because in general s(x1,by=x2) is not smooth). If you want to get s(x1) on its own you need to do something like this: x2 <- x1 ## copy x1 m <- gam(y~s(x1,by=x2)) ## model implementing s(x1,by=x1) using copy of x1 predict(m,data.frame(x1=x1,x2=rep(1,length(x2))),type="terms") ## now predicted s(x1)*x2 = s(x1) best, Simon
Thanks again
Dominik
On Wed, May 11, 2016 at 10:11 AM, Dominik Schneider
<Dominik.Schneider at colorado.edu
<mailto:Dominik.Schneider at colorado.edu>> wrote:
Hi Simon, Thanks for this explanation.
To make sure I understand, another way of explaining the y axis in
my original example is that it is the contribution to snowdepth
relative to the other variables (the example only had fsca, but my
actual case has a couple others). i.e. a negative s(fsca) of -0.5
simply means snowdepth 0.5 units below the intercept+s(x_i), where
s(x_i) could also be negative in the case where total snowdepth is
less than the intercept value.
The use of by=fsca is really useful for interpreting the marginal
impact of the different variables. With my actual data, the term
s(fsca):fsca is never negative, which is much more intuitive. Is
it appropriate to compare magnitudes of e.g. s(x2):x2 / mean(x2)
and s(x2):x2 / mean(x2) where mean(x_i) are the mean of the
actual data?
Lastly, how would these two differ: s(x1,by=x2); or
s(x1,by=x1)*s(x2,by=x2) since interactions are surely present and
i'm not sure if a linear combination is enough.
Thanks!
Dominik
On Wed, May 11, 2016 at 3:11 AM, Simon Wood <simon.wood at bath.edu
<mailto:simon.wood at bath.edu>> wrote:
The spline having a positive value is not the same as a glm
coefficient having a positive value. When you plot a smooth,
say s(x), that is equivalent to plotting the line 'beta * x'
in a GLM. It is not equivalent to plotting 'beta'. The smooths
in a gam are (usually) subject to `sum-to-zero'
identifiability constraints to avoid confounding via the
intercept, so they are bound to be negative over some part of
the covariate range. For example, if I have a model y ~ s(x) +
s(z), I can't estimate the mean level for s(x) and the mean
level for s(z) as they are completely confounded, and
confounded with the model intercept term.
I suppose that if you want to interpret the smooths as glm
parameters varying with the covariate they relate to then you
can do, by setting the model up as a varying coefficient
model, using the `by' argument to 's'...
gam(snowdepth~s(fsca,by=fsca),data=dat)
this model is `snowdepth_i = f(fsca_i) * fsca_i + e_i' .
s(fsca,by=fsca) is not confounded with the intercept, so no
constraint is needed or applied, and you can now interpret the
smooth like a local GLM coefficient.
best,
Simon
On 11/05/16 01:30, Dominik Schneider wrote:
Hi,
Just getting into using GAM using the mgcv package. I've
generated some
models and extracted the splines for each of the variables
and started
visualizing them. I'm noticing that one of my variables is
physically
unrealistic.
In the example below, my interpretation of the following
plot is that the
y-axis is basically the equivalent of a "parameter" value
of a GLM; in GAM
this value can change as the functional relationship
changes between x and
y. In my case, I am predicting snowdepth based on the
fractional snow
covered area. In no case will snowdepth realistically
decrease for a unit
increase in fsca so my question is: *Is there a way to
constrain the spline
to positive values? *
Thanks
Dominik
library(mgcv)
library(dplyr)
library(ggplot2)
extract_splines=function(mdl){
sterms=predict(mdl,type='terms')
datplot=cbind(sterms,mdl$model) %>% tbl_df
datplot$intercept=attr(sterms,'constant')
datplot$yhat=rowSums(sterms)+attr(sterms,'constant')
return(datplot)
}
dat=data_frame(snowdepth=runif(100,min =
0.001,max=6.7),fsca=runif(100,0.01,.99))
mdl=gam(snowdepth~s(fsca),data=dat)
termdF=extract_splines(mdl)
ggplot(termdF)+
geom_line(aes(x=fsca,y=`s(fsca)`))
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org <mailto:R-help at r-project.org> mailing
list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained,
reproducible code.
--
Simon Wood, School of Mathematics, University of Bristol BS8
1TW UK
+44 (0)117 33 18273 <tel:%2B44%20%280%29117%2033%2018273>
http://www.maths.bris.ac.uk/~sw15190
<http://www.maths.bris.ac.uk/%7Esw15190>
Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK +44 (0)117 33 18273 http://www.maths.bris.ac.uk/~sw15190 [[alternative HTML version deleted]]