Dear R-users,
One of the main reasons I moved from GAUSS to R (as an econometrician) was the existence of the library LOCFIT for local polynomial regression. While cross-checking my former `GAUSS code' against my new `R code', I came to realize LOCFIT is not quite doing what I want. I wrote the following example script:
#-----------------------------------------------------------------------------------------------------------------
# Plain vanilla Nadaraya-Watson estimator (i.e. local constant regression, deg=0)
# with Gaussian kernel & fixed bandwidth
mkern <- function(y, x, h) {
  n <- length(y)
  # Mx[i, j] = x[j]: each row repeats the full design vector
  Mx <- matrix(x, nrow = n, ncol = n, byrow = TRUE)
  # Kernel weights: Mxh[i, j] = (1/h) K((x[i] - x[j])/h)
  Mxh <- (1/h) * dnorm((x - Mx)/h)
  # The numerator needs the weight y[j] on column j.  (A plain
  # y * Mxh recycles y down the rows, weighting by y[i] instead,
  # so the ratio below would just return y unchanged.)
  Myxh <- Mxh * matrix(y, nrow = n, ncol = n, byrow = TRUE)
  rowMeans(Myxh) / rowMeans(Mxh)
}
# Generating the design Y=m(x)+e
n <- 10
h <- 0.5
x <- rnorm(n)
y <- x + rnorm(n,mean=0,sd=0.5)
# This is what I really want!
mhat <- mkern(y,x,h)
library(locfit)
yhl.raw <- locfit(y~x,alpha=c(0,h),kern="gauss",ev="data",deg=0,link="ident")
# This is what I get with LOCFIT
print(cbind(x,mhat,residuals(yhl.raw,type="fit"),knots(yhl.raw,what="coef")))
#--------------------------------------------------------------------------------------------------------------------
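As a quick sanity check on a Gaussian-kernel Nadaraya-Watson fit (not part of the original post, and written as a standalone sketch): with a very large bandwidth every point gets nearly equal weight, so the estimate at each x[i] should collapse to the global mean of y.

```r
# Standalone Nadaraya-Watson fit with Gaussian kernel of bandwidth h.
nw <- function(y, x, h) {
  K <- outer(x, x, function(a, b) dnorm((a - b)/h))
  as.vector(K %*% y) / rowSums(K)
}
set.seed(2)
x <- rnorm(10)
y <- x + rnorm(10, sd = 0.5)
# With a huge bandwidth, all weights are (almost) equal,
# so the fit approaches mean(y) at every evaluation point:
print(max(abs(nw(y, x, 1e6) - mean(y))))
```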
Questions:
1) Why do the residuals(.) and knots(.) results differ from one another? If I want m-hat(x[i]) at each evaluation point i = 1, ..., n, which one should I use? I do not want any interpolation whatsoever.
2) Why are they `close' to, but not equal to, what I want?
I can accept differences for higher degrees and for multidimensional data at the boundary of the support (given the way the regression must be done in areas with sparse data). But why are these differences present for deg=0 inside the support as well as at the boundary? The computer would still give us a result even with a close-to-zero random denominator (admittedly, not a reliable one). Unfortunately, I cannot get access to a copy of "Loader, C. (1999) Local Regression and Likelihood, Springer" from my local library, so a small explanation or some advice would be greatly appreciated.
I do not mind using an improved version of `what I want', but I would like to understand what I am doing.
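For concreteness, a small sketch of the two extraction routes in question, assuming fitted() and predict() dispatch for locfit objects as documented:

```r
library(locfit)

set.seed(1)
n <- 10; h <- 0.5
x <- rnorm(n)
y <- x + rnorm(n, sd = 0.5)

fit <- locfit(y ~ x, alpha = c(0, h), kern = "gauss",
              ev = "data", deg = 0, link = "ident")

# With ev = "data" every x[i] is an evaluation point, so both of
# these should return the fit at the data points without any
# interpolation between evaluation points:
f.fitted  <- fitted(fit)
f.predict <- predict(fit, newdata = x)
print(cbind(x, f.fitted, f.predict))
```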
Thanks in advance for your help,
David Jacho-Chávez
LOCFIT: What's it doing?
2 messages · Jacho-Chavez,DT (pgr), Miguel A. Arranz
You should definitely read Loader's book. In the meantime, take a look at the introductory paper on the Locfit web page. I think you can set Locfit to estimate at all the sample points (which it does not do by default) and also to use a prespecified constant bandwidth, but note that its definition of the h parameter is not the standard one. Hope this helps, Miguel A.
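On the bandwidth point: if memory serves, the locfit help page for the kern argument defines "gauss" as W(v) = exp(-(2.5 v)^2 / 2), i.e. a Gaussian with standard deviation h/2.5 rather than h — worth double-checking against your installed version. A sketch of the check, under that assumption:

```r
set.seed(42)
n <- 50; h <- 0.5
x <- rnorm(n)
y <- x + rnorm(n, sd = 0.5)

# Nadaraya-Watson fit with a Gaussian kernel of standard deviation s.
nw <- function(y, x, s) {
  K <- outer(x, x, function(a, b) dnorm((a - b)/s))
  as.vector(K %*% y) / rowSums(K)
}

# If locfit's "gauss" kernel really has sd = h/2.5, then rescaling
# the hand-rolled fit should reproduce the locfit fit at the data:
mhat.rescaled <- nw(y, x, h/2.5)
# fit <- locfit(y ~ x, alpha = c(0, h), kern = "gauss",
#               ev = "data", deg = 0)
# max(abs(fitted(fit) - mhat.rescaled))  # small if the scaling holds
```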
On Thursday 14 April 2005 10:47, Jacho-Chavez,DT (pgr) wrote: