
locfit smoothing question (package maintainer not reachable)

7 messages · Suresh Krishna, David Winsemius, Liaw, Andy

#
Dear list members,

I am trying to understand this output from the smoothing package locfit  
(1.5-4, running on R 2.8.1 on Windows Vista 64 bit).

# sample code

library(locfit)

x <- 1:100

y <- rnorm(100)
fit <- locfit(y ~ x, family = "gaussian")  # default parameters are fine
# plot seems "reasonable"; confidence bands use a global estimate of variance
plot(fit, band = "global")

y <- 1000 * rnorm(100)
fit <- locfit(y ~ x, family = "gaussian")
# aren't these confidence bands too small? am I using this function wrongly?
plot(fit, band = "global")

Using band="local" gives results that seem to make "more sense". Could  
someone offer me some guidance ?

Thanks, Suresh

ps. The package maintainer, Catherine Loader, is no longer reachable at  
her Auckland address.
#
Dear all,

I just realized that using family="qgauss" restores normal-looking  
confidence bands... I read that using family="gaussian" rather than  
family="qgauss" fixes the dispersion parameter at 1, but without knowing  
the theory behind the code, I don't understand why there is such a  
difference between the two. If there is a simple explanation or  
recommendation, I am eager to hear it.

Thanks, Suresh


On Tue, 03 Mar 2009 16:56:43 +0100, Suresh Krishna <madzientist at gmail.com>  
wrote:
#
I think you should read (or re-read) the locfit help page and *also*
the links from that page to the help pages for locfit.raw and rv.
Since family= is not an argument to locfit per se, but is instead
documented under locfit.raw, I would have guessed that you have not
yet done so, but perhaps you have?
#
From: Suresh Krishna
[...]
For the record, I'm the package maintainer for locfit, and I have not
exactly vanished (yet).  Please see the package description.

That said, it doesn't mean I know all the details about the code.  I
just do enough to keep the package on CRAN.

Best,
Andy
#
David Winsemius wrote:

            
I did read the help pages for locfit.raw, and found:

"Local likelihood family; "gaussian"; "binomial"; "poisson"; "gamma" and  
"geom". Density and rate estimation families are "dens", "rate" and  
"hazard" (hazard rate). If the family is preceded by a 'q' (for example,  
family="qbinomial"), quasi-likelihood variance estimates are used.  
Otherwise, the residual variance (rv) is fixed at 1. The default family is  
"qgauss" if a response y is provided; "density" if no response is  
provided. "

However, since the fake data were generated from a known gaussian  
distribution, I did not imagine that using family=gaussian would lead to  
such wildly different results. This is what I was hoping to understand,  
without having to struggle with Catherine Loader's book in order to  
understand the above paragraph deeply enough that this behavior makes  
sense.

Thanks again, Suresh
#
That is what I thought to be the critical paragraph. The variance is
assumed to be 1 when you use family="gaussian" rather than the
default of family="qgauss". You give it a vector, 1000*rnorm(100),
that ranges widely, yet a (relatively) small variance is assumed, so
the confidence intervals are plotted as very narrow. This does not
seem surprising given the function's documented design. I have the book
and do not think I even need to pull it off the shelf since the help  
pages appear fully informative in this instance. I get an rv of 1 with  
the "gaussian" option and an rv of nearly 1000 when the default is used.
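
A minimal sketch of the comparison described above, assuming the locfit package is installed. It fits the same simulated data once with family="gaussian" and once with family="qgauss", then extracts the residual variance of each fit with rv(). The names fit_fixed and fit_quasi and the seed are illustrative and not taken from the thread.

```r
library(locfit)

set.seed(1)                      # illustrative seed, only for reproducibility
x <- 1:100
y <- 1000 * rnorm(100)

# Likelihood family: the residual variance is fixed at 1
fit_fixed <- locfit(y ~ x, family = "gaussian")

# Quasi-likelihood family: the residual variance is estimated from the data
fit_quasi <- locfit(y ~ x, family = "qgauss")

rv(fit_fixed)                    # residual variance used by the "gaussian" fit
rv(fit_quasi)                    # residual variance estimated by the "qgauss" fit
```

Plotting either fit with plot(fit, band="global") should then show how the width of the bands follows the residual variance each fit carries.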
#
On Tue, 03 Mar 2009 22:10:42 +0100, David Winsemius
<dwinsemius at comcast.net> wrote:

            
Thank you, that is helpful. I guess I am wondering under what circumstance  
would it be appropriate to assume that the data had a variance of 1 and  
use the family=gaussian option. Perhaps this is for normalized data ?

Suresh
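
One way to probe the "normalized data" idea, offered only as a sketch: if the noise standard deviation is known in advance (here 1000 by construction; known_sd is a hypothetical name), the response can be rescaled to unit variance before fitting. The "qgauss" estimate of the residual variance on the rescaled data should then be close to 1, which is the value family="gaussian" assumes. Whether this is the intended use of the "gaussian" family is not settled in the thread.

```r
library(locfit)

set.seed(1)                      # illustrative seed, only for reproducibility
x <- 1:100
known_sd <- 1000                 # assumption: the noise scale is known beforehand
y <- known_sd * rnorm(100)
z <- y / known_sd                # rescaled response, variance 1 by construction

# Estimated residual variance on the rescaled data
fit_z <- locfit(z ~ x, family = "qgauss")
rv(fit_z)

# With unit-variance data, fixing the variance at 1 matches the data's scale
fit_z_fixed <- locfit(z ~ x, family = "gaussian")
plot(fit_z_fixed, band = "global")
```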