physical constraint with gam
Thanks for the clarification!
On Sat, May 14, 2016 at 1:24 AM, Simon Wood <simon.wood at bath.edu> wrote:
On 12/05/16 02:29, Dominik Schneider wrote: Hi again, I'm looking for some clarification on 2 things. 1. On that last note, I realize that s(x1,x2) would be the other obvious interaction to compare with - and I see that you recommend te(x1,x2) if they are not on the same scale. - yes that's right, s(x1,x2) gives an isotropic smooth, which is usually only appropriate if x1 and x2 are naturally on the same scale. 2. If s(x1,by=x1) gives you a "parameter" value similar to a GLM when you plot s(x1):x1, why does my function above return the same yhat as predict(mdl,type='response') ? Shouldn't each of the terms need to be multiplied by the variable value before applying rowSums()+attr(sterms,'constant') ?? predict returns s(x1)*x1 (plot.gam just plots s(x1), because in general s(x1,by=x2) is not smooth). If you want to get s(x1) on its own you need to do something like this: x2 <- x1 ## copy x1 m <- gam(y~s(x1,by=x2)) ## model implementing s(x1,by=x1) using copy of x1 predict(m,data.frame(x1=x1,x2=rep(1,length(x2))),type="terms") ## now predicted s(x1)*x2 = s(x1) best, Simon Thanks again Dominik On Wed, May 11, 2016 at 10:11 AM, Dominik Schneider < <Dominik.Schneider at colorado.edu>Dominik.Schneider at colorado.edu> wrote:
Hi Simon, Thanks for this explanation. To make sure I understand, another way of explaining the y axis in my original example is that it is the contribution to snowdepth relative to the other variables (the example only had fsca, but my actual case has a couple others). i.e. a negative s(fsca) of -0.5 simply means snowdepth 0.5 units below the intercept+s(x_i), where s(x_i) could also be negative in the case where total snowdepth is less than the intercept value. The use of by=fsca is really useful for interpreting the marginal impact of the different variables. With my actual data, the term s(fsca):fsca is never negative, which is much more intuitive. Is it appropriate to compare magnitudes of e.g. s(x2):x2 / mean(x2) and s(x2):x2 / mean(x2) where mean(x_i) are the mean of the actual data? Lastly, how would these two differ: s(x1,by=x2); or s(x1,by=x1)*s(x2,by=x2) since interactions are surely present and i'm not sure if a linear combination is enough. Thanks! Dominik On Wed, May 11, 2016 at 3:11 AM, Simon Wood < <simon.wood at bath.edu> simon.wood at bath.edu> wrote:
The spline having a positive value is not the same as a glm coefficient having a positive value. When you plot a smooth, say s(x), that is equivalent to plotting the line 'beta * x' in a GLM. It is not equivalent to plotting 'beta'. The smooths in a gam are (usually) subject to `sum-to-zero' identifiability constraints to avoid confounding via the intercept, so they are bound to be negative over some part of the covariate range. For example, if I have a model y ~ s(x) + s(z), I can't estimate the mean level for s(x) and the mean level for s(z) as they are completely confounded, and confounded with the model intercept term. I suppose that if you want to interpret the smooths as glm parameters varying with the covariate they relate to then you can do, by setting the model up as a varying coefficient model, using the `by' argument to 's'... gam(snowdepth~s(fsca,by=fsca),data=dat) this model is `snowdepth_i = f(fsca_i) * fsca_i + e_i' . s(fsca,by=fsca) is not confounded with the intercept, so no constraint is needed or applied, and you can now interpret the smooth like a local GLM coefficient. best, Simon On 11/05/16 01:30, Dominik Schneider wrote:
Hi,
Just getting into using GAM using the mgcv package. I've generated some
models and extracted the splines for each of the variables and started
visualizing them. I'm noticing that one of my variables is physically
unrealistic.
In the example below, my interpretation of the following plot is that
the
y-axis is basically the equivalent of a "parameter" value of a GLM; in
GAM
this value can change as the functional relationship changes between x
and
y. In my case, I am predicting snowdepth based on the fractional snow
covered area. In no case will snowdepth realistically decrease for a
unit
increase in fsca so my question is: *Is there a way to constrain the
spline
to positive values? *
Thanks
Dominik
library(mgcv)
library(dplyr)
library(ggplot2)
extract_splines=function(mdl){
sterms=predict(mdl,type='terms')
datplot=cbind(sterms,mdl$model) %>% tbl_df
datplot$intercept=attr(sterms,'constant')
datplot$yhat=rowSums(sterms)+attr(sterms,'constant')
return(datplot)
}
dat=data_frame(snowdepth=runif(100,min =
0.001,max=6.7),fsca=runif(100,0.01,.99))
mdl=gam(snowdepth~s(fsca),data=dat)
termdF=extract_splines(mdl)
ggplot(termdF)+
geom_line(aes(x=fsca,y=`s(fsca)`))
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide <http://www.R-project.org/posting-guide.html> http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK +44 (0)117 33 18273 <%2B44%20%280%29117%2033%2018273> <http://www.maths.bris.ac.uk/%7Esw15190> http://www.maths.bris.ac.uk/~sw15190
-- Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK +44 (0)117 33 18273 http://www.maths.bris.ac.uk/~sw15190