locfit smoothing question (package maintainer not reachable)
7 messages · Suresh Krishna, David Winsemius, Liaw, Andy
Dear all, I just realized that using family="qgauss" restores normal-looking confidence bands... I read that using family="gaussian" rather than family="qgauss" fixes the dispersion parameter at 1, but without knowing the theory behind the code, I don't understand why there is such a difference between the two. If there is a simple explanation or recommendation, I am eager to hear it. Thanks, Suresh On Tue, 03 Mar 2009 16:56:43 +0100, Suresh Krishna <madzientist at gmail.com> wrote:
Dear list members,

I am trying to understand this output from the smoothing package locfit (1.5-4, running on R 2.8.1 on Windows Vista 64 bit).

# sample code

library(locfit)

x <- 1:100

y <- rnorm(100)
fit <- locfit(y ~ x, family = "gaussian") # default parameters are fine
plot(fit, band = "global") # plot seems "reasonable", confidence bands use a global estimate of variance

y <- 1000 * rnorm(100)
fit <- locfit(y ~ x, family = "gaussian")
plot(fit, band = "global") # aren't these confidence bands too small? am I using this function wrongly?

Using band="local" gives results that seem to make "more sense". Could someone offer me some guidance?

Thanks, Suresh

ps. The package maintainer, Catherine Loader, is no longer reachable at her Auckland address.
I think you should read (or re-read) the locfit help page and *also* the links from that page to the help pages for locfit.raw and rv. I would have thought that since family= is not an argument to locfit per se, but rather is documented in locfit.raw that you have yet done so, but perhaps not?
David Winsemius

On Mar 3, 2009, at 12:39 PM, Suresh Krishna wrote:

> Dear all,
>
> I just realized that using family="qgauss" restores normal-looking
> confidence bands... I read that using family="gaussian" rather than
> family="qgauss" fixes the dispersion parameter at 1, but without
> knowing the theory behind the code, I don't understand why there is
> such a difference between the two. If there is a simple explanation
> or recommendation, I am eager to hear it.
>
> Thanks, Suresh
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
From: Suresh Krishna [...]
ps. The package maintainer, Catherine Loader, is no longer reachable at her Auckland address.
For the record, I'm the package maintainer for locfit, and I have not
exactly vanished (yet). Please see the package description.
That said, it doesn't mean I know all the details about the code. I
just do enough to keep the package on CRAN.
Best,
Andy
David Winsemius wrote:
I think you should read (or re-read) the locfit help page and *also* the links from that page to the help pages for locfit.raw and rv. I would have thought that since family= is not an argument to locfit per se, but rather is documented in locfit.raw that you have yet done so, but perhaps not?
I did read the help pages for locfit.raw, and found:

"Local likelihood family; "gaussian"; "binomial"; "poisson"; "gamma" and "geom". Density and rate estimation families are "dens", "rate" and "hazard" (hazard rate). If the family is preceded by a 'q' (for example, family="qbinomial"), quasi-likelihood variance estimates are used. Otherwise, the residual variance (rv) is fixed at 1. The default family is "qgauss" if a response y is provided; "density" if no response is provided."

However, since the fake data were generated from a known Gaussian distribution, I did not imagine that using family="gaussian" would lead to such wildly different results. This is what I was hoping to understand, without having to struggle with Catherine Loader's book in order to understand the above paragraph deeply enough that this behavior makes sense.

Thanks again, Suresh
That is what I thought to be the critical paragraph. The variance is assumed to be 1 when you use family="gaussian" rather than the default of family="qgauss". You give it a vector, 1000*rnorm(100), that ranges widely, so the assumed variance is small relative to the spread of the data and the confidence intervals are plotted as very narrow. This does not seem surprising given the function's documented design. I have the book and do not think I even need to pull it off the shelf, since the help pages appear fully informative in this instance. I get an rv of 1 with the "gaussian" option and an rv of nearly 1000 when the default is used.
David Winsemius

On Mar 3, 2009, at 3:26 PM, Suresh Krishna wrote:

> I did read the help pages for locfit.raw, and found:
>
> "Local likelihood family; "gaussian"; "binomial"; "poisson"; "gamma"
> and "geom". Density and rate estimation families are "dens", "rate"
> and "hazard" (hazard rate). If the family is preceded by a 'q' (for
> example, family="qbinomial"), quasi-likelihood variance estimates
> are used. Otherwise, the residual variance (rv) is fixed at 1. The
> default family is "qgauss" if a response y is provided; "density" if
> no response is provided."
>
> However, since the fake data were generated from a known Gaussian
> distribution, I did not imagine that using family="gaussian" would
> lead to such wildly different results. This is what I was hoping to
> understand, without having to struggle with Catherine Loader's book
> in order to understand the above paragraph deeply enough that this
> behavior makes sense.
>
> Thanks again, Suresh
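For anyone who wants to repeat the comparison David describes, here is a minimal sketch. It assumes the locfit package is installed and uses only locfit() and rv() as named in the thread. The seed and the object names fit_fixed and fit_quasi are arbitrary choices for illustration, and the exact numbers will depend on the random draw.

```r
# Sketch only: assumes the locfit package is installed.
library(locfit)

set.seed(1)                      # arbitrary seed, only for reproducibility
x <- 1:100
y <- 1000 * rnorm(100)           # same scaling as in the original post

# Likelihood family: the help page says the residual variance is fixed at 1.
fit_fixed <- locfit(y ~ x, family = "gaussian")

# Quasi-likelihood family (the documented default when a response is given):
# the residual variance is estimated from the data.
fit_quasi <- locfit(y ~ x, family = "qgauss")

# rv() extracts the residual variance stored in each fitted object.
rv_fixed <- rv(fit_fixed)
rv_quasi <- rv(fit_quasi)

print(c(gaussian = unname(rv_fixed), qgauss = unname(rv_quasi)))
```

Since the bands drawn by plot(fit, band="global") are built from this variance estimate, a value fixed at 1 for data on the scale of 1000*rnorm(100) would be expected to give the very narrow bands seen in the original post.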
On Tue, 03 Mar 2009 22:10:42 +0100, David Winsemius
<dwinsemius at comcast.net> wrote:
That is what I thought to be the critical paragraph. The variance is assumed to be 1 when you use family="gaussian" rather than the default of family="qgauss". You give it a vector, 1000*rnorm(100), that ranges widely, so the assumed variance is small relative to the spread of the data and the confidence intervals are plotted as very narrow. This does not seem surprising given the function's documented design. I have the book and do not think I even need to pull it off the shelf, since the help pages appear fully informative in this instance. I get an rv of 1 with the "gaussian" option and an rv of nearly 1000 when the default is used.
Thank you, that is helpful. I guess I am wondering under what circumstances it would be appropriate to assume that the data have a variance of 1 and use the family="gaussian" option. Perhaps this is for normalized data? Suresh
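One way to probe the "normalized data" idea without relying on locfit internals is a base-R analogy. The sketch below is not locfit code: it uses glm() and the dispersion argument of summary.glm() as a stand-in for "variance fixed at 1" versus "variance estimated", and it treats the noise standard deviation sigma as known, which is an assumption made purely for illustration.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- 1:100
sigma <- 1000                    # assumed known noise sd (hypothetical)
y <- sigma * rnorm(100)
y_std <- y / sigma               # data rescaled to unit variance

fit_raw <- glm(y ~ x, family = gaussian)
fit_std <- glm(y_std ~ x, family = gaussian)

# Standard error of the slope, with the dispersion estimated from the
# residuals (the default) versus fixed at 1.
slope_se <- function(fit, ...) coef(summary(fit, ...))["x", "Std. Error"]

ratio_raw <- slope_se(fit_raw) / slope_se(fit_raw, dispersion = 1)
ratio_std <- slope_se(fit_std) / slope_se(fit_std, dispersion = 1)

# A ratio near 1 means that fixing the variance at 1 changes little;
# a large ratio means the fixed-variance standard errors are far too small.
print(c(raw = ratio_raw, standardized = ratio_std))
```

In this analogy, fixing the variance at 1 is harmless for the rescaled data and badly understates the uncertainty for the raw data. That is consistent with the guess that a fixed variance of 1 only suits data already on a unit-variance scale, though it does not by itself establish how locfit's "gaussian" family is meant to be used.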