Difference between gam() and loess(). - R-help

Thu, Mar 19, 2009 2:06 PM

It seems that in general

	gam(y~lo(x)) # gam() from the gam package.

and
	loess(y~x)

give slightly different results (in respect of the predicted/fitted values).
The differences are most noticeable at the endpoints of the range of x.

Can anyone enlighten me about the reason for this difference?

Is it possible to twiddle the control parameters, for either or both functions,
so as to obtain identical results?

Thanks.

	cheers,

		Rolf Turner


Kevin E. Thorpe

Thu, Mar 19, 2009 5:20 PM

Rolf Turner wrote:

There are two obvious differences in the defaults.  In lo() from the gam 
package, span=0.5 and degree=1 while for loess(), span=0.75 and degree=2.

Try gam(y~lo(x,span=0.75,degree=2)) and see if that helps.
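The defaults mentioned above can be read straight off the function signatures. A minimal sketch, assuming base R is available and treating the gam package as optional (the variable names are only illustrative):

```r
# Default smoothing parameters of loess() from the stats package
loess.defaults <- formals(stats::loess)[c("span", "degree")]
str(loess.defaults)

# Same check for lo(), run only if the gam package is installed
if (requireNamespace("gam", quietly = TRUE)) {
  lo.defaults <- formals(gam::lo)[c("span", "degree")]
  str(lo.defaults)
}
```

Comparing the two printed lists shows which arguments have to be set explicitly before the fits can be expected to agree.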

Kevin

Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.864.5776  Fax: 416.864.6057

Ravi Varadhan

Fri, Mar 20, 2009 7:19 AM

Good try, Kevin.  But that doesn't seem to do it. 

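library(gam)  # provides gam() and lo()

# Simulated data: noisy sine curve on [0, 1]
set.seed(123)
x <- sort(runif(100))
y <- sin(4*pi*x) + rnorm(100, sd=0.2)

# Local quadratic fits with the same span and degree in both functions
ans.lo2 <- loess(y ~ x, degree=2, span=0.75)
ans.gam2 <- gam(y ~ lo(x, degree=2, span=0.75))
summary(fitted(ans.lo2) - fitted(ans.gam2)) # larger differences, about 10%

# Local linear fits with the same span and degree in both functions
ans.lo1 <- loess(y ~ x, degree=1, span=0.75)
ans.gam1 <- gam(y ~ lo(x, degree=1, span=0.75))
summary(fitted(ans.lo1) - fitted(ans.gam1)) # smaller differences, about 2-5 percent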

I also tried a number of other things including changing the "family", and parameters in "loess.control", but to no avail.  I looked at the Fortran codes from both loess and gam.  They are daunting, to say the least. They are dense, and there are absolutely no comments whatsoever.  But one thing is clear - they are using different Fortran codes.
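One setting that changes loess() fits on its own is the surface argument of loess.control(): the default "interpolate" computes the fitted surface by interpolation from a k-d tree, while "direct" computes it exactly. The sketch below uses the same kind of simulated data and only shows how large that effect is within loess() itself. Whether lo() in the gam package behaves like either option is an assumption that has not been checked here.

```r
# Simulated data: noisy sine curve on [0, 1]
set.seed(123)
x <- sort(runif(100))
y <- sin(4*pi*x) + rnorm(100, sd=0.2)

# Same span and degree, differing only in how the surface is evaluated
fit.interp <- loess(y ~ x, degree=2, span=0.75,
                    control=loess.control(surface="interpolate"))
fit.direct <- loess(y ~ x, degree=2, span=0.75,
                    control=loess.control(surface="direct"))

# Pointwise difference between the two sets of fitted values
d <- fitted(fit.interp) - fitted(fit.direct)
summary(d)
```

If the summary is not identically zero, then two calls to the same function with the same span and degree already disagree slightly, purely because of the evaluation method.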

So, the best bet might be to get Trevor Hastie or Bill Cleveland to help you out.  

But, before that:  why is this an issue, Rolf?  Is it important that these two results be identical?

Best,
Ravi.



____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvaradhan at jhmi.edu



Kevin E. Thorpe

Fri, Mar 20, 2009 5:31 PM

Ravi Varadhan wrote:

There was one other thing I found that I shared with Rolf off-list.
In loess.control() there is an iterations argument which is related
to the robustness of the estimates.  I would think that could also
account for tail departures especially.

I don't have the gam package installed, so can't test these myself
at the moment.
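The loess() side of this can be checked without the gam package. According to the loess.control() help page, iterations is the number of iterations used in robust fitting, i.e. only when family="symmetric", so with the default family="gaussian" changing it should leave the fit untouched. A minimal sketch on simulated data (the data and variable names are illustrative):

```r
# Simulated data: noisy sine curve on [0, 1]
set.seed(123)
x <- sort(runif(100))
y <- sin(4*pi*x) + rnorm(100, sd=0.2)

# Default (least-squares) family with 1 and 4 iterations
g1 <- loess(y ~ x, family="gaussian", control=loess.control(iterations=1))
g4 <- loess(y ~ x, family="gaussian", control=loess.control(iterations=4))

# Robust family with 1 and 4 iterations
s1 <- loess(y ~ x, family="symmetric", control=loess.control(iterations=1))
s4 <- loess(y ~ x, family="symmetric", control=loess.control(iterations=4))

# Largest change in fitted values caused by the iterations argument
gauss.diff <- max(abs(fitted(g1) - fitted(g4)))
robust.diff <- max(abs(fitted(s1) - fitted(s4)))
print(c(gaussian=gauss.diff, symmetric=robust.diff))
```

If gauss.diff comes out as zero, the iterations argument cannot explain a discrepancy when loess() is called with its default family; robust.diff shows how much the robustness iterations move the fit when they are in play.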


Kevin E. Thorpe

Fri, Mar 20, 2009 5:49 PM

Kevin E. Thorpe wrote:

Somehow when I read the above, Ravi, I missed that you had fiddled with 
loess.control() AND looked at the Fortran.

I guess one simple parameter change may not quite do it. :-)

Kevin


Rolf Turner

Mon, Mar 23, 2009 1:44 PM

On 21/03/2009, at 3:19 AM, Ravi Varadhan wrote:

<snip>

Thanks for doing this digging.  It would on this basis be not unreasonable
to expect there to be numerical differences in the result.

It just makes me nervous when two procedures which I believe to be doing the
same thing give answers which are not identical.  Such a phenomenon makes me
wonder if there is something which I am not understanding, and if thereby my
lack of understanding might lead to my making serious errors.

A lack of understanding is an all-too-frequent occurrence in my circumstances
(i.e. being thick as two short planks) and I therefore need to be constantly
on my guard against such a lack.

In the current setting I think I will desist from pursuing the issue any further
and just assume that different coding accounts for the different results.  And
take a valium.

		cheers,

			Rolf Turner
