It seems that in general
gam(y~lo(x)) # gam() from the gam package.
and
loess(y~x)
give slightly different results (in respect of the predicted/fitted values).
Most noticeable at the endpoints of the range of x.
Can anyone enlighten me about the reason for this difference?
Is it possible to twiddle the control parameters, for either or both functions,
so as to obtain identical results?
Thanks.
cheers,
Rolf Turner
######################################################################
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}
Difference between gam() and loess().
6 messages · Rolf Turner, Ravi Varadhan, Kevin E. Thorpe
Rolf Turner wrote:
> It seems that in general
> gam(y~lo(x)) # gam() from the gam package.
> and
> loess(y~x)
> give slightly different results (in respect of the predicted/fitted values).
> Most noticeable at the endpoints of the range of x.
> Can anyone enlighten me about the reason for this difference?
> Is it possible to twiddle the control parameters, for either or both functions,
> so as to obtain identical results?
There are two obvious differences in the defaults. In lo() from the gam package, span=0.5 and degree=1 while for loess(), span=0.75 and degree=2. Try

gam(y~lo(x,span=0.75,degree=2))

and see if that helps.

Kevin
Kevin E. Thorpe Biostatistician/Trialist, Knowledge Translation Program Assistant Professor, Dalla Lana School of Public Health University of Toronto email: kevin.thorpe at utoronto.ca Tel: 416.864.5776 Fax: 416.864.6057
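The defaults Kevin mentions can be checked directly from the functions' formal arguments. The following is a minimal sketch, not part of the original thread; it assumes only base R for loess(), and looks at gam::lo() only if the gam package happens to be installed.

```r
# Defaults of stats::loess(), read from the function's formal arguments.
loess_defaults <- formals(stats::loess)[c("span", "degree")]
print(loess_defaults)

# Defaults of gam::lo(), inspected only if the gam package is available.
if (requireNamespace("gam", quietly = TRUE)) {
  lo_defaults <- formals(gam::lo)[c("span", "degree")]
  print(lo_defaults)
}
```

Reading the defaults this way avoids relying on memory of the help pages, which may differ between package versions.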
Good try, Kevin. But that doesn't seem to do it.

set.seed(123)
x <- sort(runif(100))
y <- sin(4*pi*x) + rnorm(100, sd=0.2)

ans.lo2 <- loess(y ~ x, degree=2, span=0.75)
ans.gam2 <- gam(y ~ lo(x, degree=2, span=0.75))
summary(ans.lo2$fitted - ans.gam2$fitted) # larger differences, about 10%

ans.lo1 <- loess(y ~ x, degree=1, span=0.75)
ans.gam1 <- gam(y ~ lo(x, degree=1, span=0.75))
summary(ans.lo1$fitted - ans.gam1$fitted) # smaller differences, about 2-5 percent

I also tried a number of other things including changing the "family", and parameters in "loess.control", but to no avail. I looked at the Fortran codes from both loess and gam. They are daunting, to say the least. They are dense, and there are absolutely no comments whatsoever. But one thing is clear - they are using different Fortran codes.

So, the best bet might be to get Trevor Hastie or Bill Cleveland to help you out. But, before that: why is this an issue, Rolf? Is it important that these two results be identical?

Best,
Ravi.
____________________________________________________________________
Ravi Varadhan, Ph.D.
Assistant Professor, Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University
Ph. (410) 502-2619
email: rvaradhan at jhmi.edu

----- Original Message -----
From: "Kevin E. Thorpe" <kevin.thorpe at utoronto.ca>
Date: Thursday, March 19, 2009 8:23 pm
Subject: Re: [R] Difference between gam() and loess().
To: Rolf Turner <r.turner at auckland.ac.nz>
Cc: R-help Forum <r-help at r-project.org>
Ravi Varadhan wrote:
> Good try, Kevin. But that doesn't seem to do it.
>
> set.seed(123)
> x <- sort(runif(100))
> y <- sin(4*pi*x) + rnorm(100, sd=0.2)
>
> ans.lo2 <- loess(y ~ x, degree=2, span=0.75)
> ans.gam2 <- gam(y ~ lo(x, degree=2, span=0.75))
> summary(ans.lo2$fitted - ans.gam2$fitted) # larger differences, about 10%
>
> ans.lo1 <- loess(y ~ x, degree=1, span=0.75)
> ans.gam1 <- gam(y ~ lo(x, degree=1, span=0.75))
> summary(ans.lo1$fitted - ans.gam1$fitted) # smaller differences, about 2-5 percent
>
> I also tried a number of other things including changing the "family", and parameters in "loess.control", but to no avail. I looked at the Fortran codes from both loess and gam. They are daunting, to say the least. They are dense, and there are absolutely no comments whatsoever. But one thing is clear - they are using different Fortran codes.
>
> So, the best bet might be to get Trevor Hastie or Bill Cleveland to help you out. But, before that: why is this an issue, Rolf? Is it important that these two results be identical?
>
> Best,
> Ravi.
There was one other thing I found that I shared with Rolf off-list. In loess.control() there is an iterations argument which is related to the robustness of the estimates. I would think that could also account for tail departures especially. I don't have the gam package installed, so can't test these myself at the moment.
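The effect of the iterations argument can be probed with loess() alone. The sketch below is an illustration added to the thread, reusing Ravi's simulated data; it relies on the description in ?loess.control that iterations is only used in robust fitting, i.e. when family = "symmetric". If that reading is right, changing iterations under the default gaussian family should leave the fitted values untouched.

```r
set.seed(123)
x <- sort(runif(100))
y <- sin(4 * pi * x) + rnorm(100, sd = 0.2)

# Default gaussian family: iterations is expected to have no effect.
g1 <- loess(y ~ x, control = loess.control(iterations = 1))
g4 <- loess(y ~ x, control = loess.control(iterations = 4))
gauss_diff <- max(abs(fitted(g1) - fitted(g4)))

# Symmetric family: the robustness iterations re-weight the observations,
# so the number of iterations can change the fit.
s1 <- loess(y ~ x, family = "symmetric",
            control = loess.control(iterations = 1))
s4 <- loess(y ~ x, family = "symmetric",
            control = loess.control(iterations = 4))
sym_diff <- max(abs(fitted(s1) - fitted(s4)))

# Largest absolute change in fitted values under each family.
print(c(gaussian = gauss_diff, symmetric = sym_diff))
```

Since both loess(y ~ x) and gam(y ~ lo(x)) in the thread use their default families, this suggests iterations is unlikely to be what separates them, though it does not rule out other control settings.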
Somehow when I read the above, Ravi, I missed that you had fiddled with loess.control() AND looked at the Fortran. I guess one simple parameter change may not quite do it. :-)

Kevin
2 days later
On 21/03/2009, at 3:19 AM, Ravi Varadhan wrote:
<snip>
> I also tried a number of other things including changing the "family", and parameters in "loess.control", but to no avail. I looked at the Fortran codes from both loess and gam. They are daunting, to say the least. They are dense, and there are absolutely no comments whatsoever. But one thing is clear - they are using different Fortran codes.
Thanks for doing this digging. It would on this basis be not unreasonable to expect there to be numerical differences in the result.
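That computational details alone can move the fitted values is easy to see within loess() itself. The sketch below is an added illustration on simulated data like Ravi's: it fits the same model with the two surface settings of loess.control(), "interpolate" (the default) and "direct". It is not a claim that this particular setting explains the gam()/loess() discrepancy, only that identical model specifications need not give identical numbers.

```r
set.seed(123)
x <- sort(runif(100))
y <- sin(4 * pi * x) + rnorm(100, sd = 0.2)

# Same model, two ways of computing the fitted surface.
fit_interp <- loess(y ~ x, control = loess.control(surface = "interpolate"))
fit_direct <- loess(y ~ x, control = loess.control(surface = "direct"))

# Pointwise differences in fitted values between the two computations.
surface_diff <- fitted(fit_interp) - fitted(fit_direct)
print(summary(surface_diff))

# Location in x of the largest absolute difference.
print(x[which.max(abs(surface_diff))])
```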
> So, the best bet might be to get Trevor Hastie or Bill Cleveland to help you out. But, before that: why is this an issue, Rolf? Is it important that these two results be identical?
It just makes me nervous when two procedures which I believe to be doing the
same thing give answers which are not identical. Such a phenomenon makes me
wonder if there is something which I am not understanding, and if thereby my
lack of understanding might lead to my making serious errors.

A lack of understanding is an all-too-frequent occurrence in my circumstances
(i.e. being thick as two short planks) and I therefore need to be constantly
on my guard against such a lack.

In the current setting I think I will desist from pursuing the issue any further
and just assume that different coding accounts for the different results. And
take a valium.
cheers,
Rolf Turner