LOCFIT: What's it doing?

2 messages · Jacho-Chavez,DT (pgr), Miguel A. Arranz

#
Dear R-users,

One of the main reasons I moved from GAUSS to R (as an econometrician) was the existence of the LOCFIT package for local polynomial regression. While checking my former `GAUSS code' against my new `R code', I came to realize that LOCFIT is not quite doing what I want. I wrote the following example script:

#-----------------------------------------------------------------------------------------------------------------
# Plain Vanilla NADARAYA-WATSON estimator (or Local Constant regression, i.e. deg=0)
# with gaussian kernel & fixed bandwidth

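mkern<-function(y,x,h){
# Mx[i,j] = x[j], so (x-Mx)[i,j] = x[i]-x[j]
Mx <- matrix(x,nrow=length(y),ncol=length(y),byrow=TRUE)
Mxh <- (1/h)*dnorm((x-Mx)/h)
# Weight column j by y[j]; a plain y*Mxh recycles y down the rows (y[i]) and the ratio below collapses to y
Myxh<- sweep(Mxh,2,y,"*")
yh <- rowMeans(Myxh)/rowMeans(Mxh)
return(yh)
}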

# Generating the design Y=m(x)+e
n <- 10
h <- 0.5
x <- rnorm(n)
y <- x + rnorm(n,mean=0,sd=0.5)

# This is what I really want!
mhat <- mkern(y,x,h)

library(locfit)
yhl.raw <- locfit(y~x,alpha=c(0,h),kern="gauss",ev="data",deg=0,link="ident")

# This is what I get with LOCFIT
print(cbind(x,mhat,residuals(yhl.raw,type="fit"),knots(yhl.raw,what="coef")))
#--------------------------------------------------------------------------------------------------------------------
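
A loop-based cross-check of the Nadaraya-Watson formula can help validate a vectorised implementation such as mkern. The following is only a minimal sketch in base R; the helper name nw_loop and the seed are assumptions introduced for illustration and are not part of the script above.

```r
# Direct (loop-based) Nadaraya-Watson estimate at each sample point.
# nw_loop is a hypothetical helper name, not part of the original script.
nw_loop <- function(y, x, h) {
  sapply(x, function(x0) {
    w <- dnorm((x0 - x) / h)   # Gaussian weights of all observations at x0
    sum(w * y) / sum(w)        # weighted average of y
  })
}

set.seed(1)                    # arbitrary seed, for reproducibility only
n <- 10
x <- rnorm(n)
y <- x + rnorm(n, mean = 0, sd = 0.5)

m_check <- nw_loop(y, x, h = 0.5)
m_wide  <- nw_loop(y, x, h = 1e4)   # very wide bandwidth: nearly the sample mean
```

Any vectorised version should reproduce m_check up to rounding when given the same y, x and h. Each fitted value is a weighted average, so it must lie between min(y) and max(y); a result identical to y itself would suggest that the y-weights are misaligned with the kernel matrix.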

Questions:
1) Why are residuals(.) & knots(.) results different from one another? If I want m^(x[i]) at each evaluation point i=1,...,n, which one should I use? I do not want interpolation whatsoever.
2) Why are they `close' but not equal to what I want?

I can accept differences for higher degrees and multidimensional data at the boundary of the support (given the way we must do the regression in areas with sparse data). But why are these differences present for deg=0 inside the support as well as at the boundary? The computer would still give us a result even with a close-to-zero random denominator (admittedly, not a reliable one). Unfortunately, I cannot get access to a copy of "Loader, C. (1999) Local Regression and Likelihood, Springer" from my local library, so a small explanation or advice would be greatly appreciated.

I do not mind using an improved version of `what I want', but I would like to understand what I am doing.


Thanks in advance for your help,


David Jacho-Chávez
#
You should definitely read Loader's book. In the meantime, you should
look at the introductory paper that you will find on the Locfit web page. I
think that you can set Locfit to estimate at all the sample points, which it
does not do by default, and also to use a prespecified constant bandwidth, but
notice that its definition of the h parameter is not the standard one.
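
To illustrate what a non-standard definition of h can mean in practice, here is a minimal base-R sketch. It does not call Locfit at all. The helper name nw_scaled and the seed are made up for this example, and the factor 2.5 is an assumption about how Locfit scales its "gauss" kernel, which should be checked against Loader's book or the package documentation.

```r
# Nadaraya-Watson with a Gaussian weight exp(-(scale * u)^2 / 2), u = (x0 - x) / h.
# scale = 1 is the textbook convention (h is the standard deviation of the kernel).
# scale = 2.5 is an ASSUMPTION about locfit's "gauss" kernel, not a verified fact.
nw_scaled <- function(y, x, h, scale = 1) {
  sapply(x, function(x0) {
    w <- exp(-(scale * (x0 - x) / h)^2 / 2)  # unnormalised Gaussian weights
    sum(w * y) / sum(w)                      # normalising constants cancel here
  })
}

set.seed(1)                                  # arbitrary seed, for reproducibility only
x <- rnorm(10)
y <- x + rnorm(10, sd = 0.5)
h <- 0.5

fit_std    <- nw_scaled(y, x, h)                      # standard convention
fit_scaled <- nw_scaled(y, x, h, scale = 2.5)         # same h, rescaled kernel
fit_match  <- nw_scaled(y, x, 2.5 * h, scale = 2.5)   # bandwidth 2.5*h undoes the rescaling
```

Under this assumption, fit_scaled differs from fit_std even though both use h = 0.5, while fit_match agrees with fit_std. If the scaling is as assumed, a difference in kernel scaling alone would produce fits that are `close' but not equal.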

Hope this helps,

Miguel A.
On Thursday 14 April 2005 10:47, Jacho-Chavez,DT (pgr) wrote: