MSE increased by increasing the sample size for Nadaraya-Watson kernel regression

3 messages · Khulood Aljehani, Bert Gunter, Peter Dalgaard

#
Hello
I hope you can help me with a problem with the Nadaraya-Watson (NW) kernel regression estimator. I used simulated data and wrote a loop to calculate the NW estimator for the regression model
Y = 1 - X + exp(-200*(X-0.5)^2) + E
where
Y: the response variable
X: the explanatory variable, from uniform(0,1)
E: the error term, i.i.d. from normal(0, 0.1)
Then I calculate the MSE. But the MSE increases as the sample size increases. This is the program I wrote:
n1=25
set.seed(4455)
E<-rnorm(n1,mean=0,sd=0.1)
X<-runif(n1, min = 0, max = 1)
mx=1-X+exp(-200*(X-0.5)^2)
Y <- mx+E
nrep <- 1000

#----------------------------------------Fixed NW
mse_rep1 <- c()
for (i in 1:1500) {
  set.seed(i + 236)
  E <- rnorm(n1, mean = 0, sd = 0.1)
  X <- runif(n1, min = 0, max = 1)
  mx <- 1 - X + exp(-200 * (X - 0.5)^2)
  Y <- mx + E
  hmax <- 2 * sqrt(var(X)) * n1^(-1/5)
  lower <- 0.01 * hmax
  h <- bw.ucv(X, nb = 1000, lower = lower, upper = hmax, tol = 0.1 * lower)
  est1 <- ksmooth(X, Y, kernel = "normal", bandwidth = h)$y
  mse1 <- (n1^-1) * sum((Y - est1)^2)

  mse_rep1 <- cbind(mse_rep1, mse1)
  dimnames(mse_rep1) <- list(c("MSE1"), paste("rep", 1:i))
}
library(functional)
MSE_rep1<-mse_rep1[,apply(mse_rep1, 2, Compose(is.finite, any))]

MSE_fixedNW<- apply(MSE_rep1[1:1000], 1, mean)     #calculate the average of the 1000 MSE
But I got NA values at first, so I ran 1500 replications and then chose 1000 of them without NA values.
When I change the sample size to 50 or 100 the MSE decreases, but beyond 100 the MSE increases. This is the main problem.
I hope I was able to explain the problem clearly.
Regards
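
One further point that may be worth checking in the program above, offered as a hedged observation rather than a full diagnosis: when ksmooth() is called without x.points, it evaluates the fit on an evenly spaced grid of max(100, length(x)) points and returns the values in increasing order of x. In that case est1 is not aligned with Y, and Y - est1 either recycles (n1 = 25 or 50) or compares unrelated points. Grid points with no observations inside the kernel window come back as NA, which may be where the NA values originate. The sketch below evaluates the fit at the observed X instead; the bandwidth h = 0.1 is an arbitrary illustrative value, not a recommended choice.

```r
# Sketch: align ksmooth() fitted values with the observed Y before computing MSE.
set.seed(1)
n <- 25
X <- runif(n)
Y <- 1 - X + exp(-200 * (X - 0.5)^2) + rnorm(n, sd = 0.1)
h <- 0.1  # assumption: fixed illustrative bandwidth, not data-driven

# Default call: the fit is evaluated on an evenly spaced grid, not at X.
fit_grid <- ksmooth(X, Y, kernel = "normal", bandwidth = h)
length(fit_grid$y)  # number of grid points: max(100, n)

# Evaluate at the observed X instead. ksmooth() returns x in increasing
# order, so Y has to be reordered the same way before subtracting.
o <- order(X)
fit_obs <- ksmooth(X, Y, kernel = "normal", bandwidth = h, x.points = X[o])
mse_aligned <- mean((Y[o] - fit_obs$y)^2)
mse_aligned
```

With the fit evaluated at the data points, every kernel window contains at least the observation itself, so no replications need to be discarded for NA values.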
#
1. I am unfamiliar with the functional package.

2. I think the proper question is: Why do you expect the mse to
decrease with increasing sample size?
Example: the precision of an average (as an estimator of the
population mean) improves (its standard error gets smaller) as sample
size increases, but the mse is essentially constant, since it is an
estimator of the population variance.
Note: for nonparametric smoothers, mse is related to bandwidth choice
also. This might change by default with different sample sizes.

3. In future, please post in plain text, not html, as the posting
guide requests.
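
A small simulation can illustrate point 2. This is only a sketch under stated assumptions: the helper sim_once() and the bandwidth rule 0.2 * n^(-1/5) are hypothetical choices made for the illustration, not something from the original program. It contrasts the residual mse, mean((Y - fit)^2), which should settle near the error variance 0.1^2 = 0.01, with the error against the true curve, mean((m - fit)^2), which is the quantity expected to shrink as n grows.

```r
# Sketch: residual mse vs. error against the true regression curve.
sim_once <- function(n, sigma = 0.1) {
  X <- sort(runif(n))                    # sorted, so ksmooth output lines up with Y
  m <- 1 - X + exp(-200 * (X - 0.5)^2)   # true regression function from the thread
  Y <- m + rnorm(n, sd = sigma)
  h <- 0.2 * n^(-1/5)                    # assumption: ad hoc bandwidth rule
  fit <- ksmooth(X, Y, kernel = "normal", bandwidth = h, x.points = X)$y
  c(residual_mse = mean((Y - fit)^2),    # estimates roughly sigma^2 plus fit error
    curve_mse    = mean((m - fit)^2))    # estimation error of the smoother itself
}

set.seed(42)
sizes <- c(100, 1000, 5000)
res <- sapply(sizes, sim_once)
colnames(res) <- paste0("n=", sizes)
res
```

If the sketch behaves as expected, the curve_mse row decreases across the columns while the residual_mse row stays in the neighbourhood of 0.01.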

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
On Sat, Nov 1, 2014 at 5:03 AM, Khulood Aljehani <aljehani-k at hotmail.com> wrote:
#
You seem to be using bw.ucv to set the bandwidth for ksmooth. However, bw.ucv selects the bandwidth for estimating the _density_ of x. I see no reason to believe that the same bandwidth selection should be optimal or even consistent for a kernel smoother like ksmooth. 

Check out the KernSmooth package, in particular the dpik() and dpill() functions and the book that the package supports.

-pd
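
As a minimal sketch of the suggestion above, assuming the KernSmooth package is installed: dpill() selects a plug-in bandwidth for local linear regression with a Gaussian kernel, and locpoly() computes the fit on an equally spaced grid. Note that this is a local linear smoother rather than the Nadaraya-Watson estimator (which would correspond to degree = 0 in locpoly()), so it illustrates regression-specific bandwidth selection, not a drop-in replacement for the original program. The sample size and seed are arbitrary.

```r
library(KernSmooth)

set.seed(123)
n <- 1000
X <- runif(n)
Y <- 1 - X + exp(-200 * (X - 0.5)^2) + rnorm(n, sd = 0.1)

# Plug-in bandwidth chosen for regression, not for the density of X.
h <- dpill(X, Y)

# Local linear fit on an equally spaced grid (default gridsize is 401).
fit <- locpoly(X, Y, degree = 1, bandwidth = h)

# Compare against the true curve at the grid points returned by locpoly().
m_grid <- 1 - fit$x + exp(-200 * (fit$x - 0.5)^2)
curve_mse <- mean((fit$y - m_grid)^2)
c(bandwidth = h, curve_mse = curve_mse)
```

Because the true curve is known in a simulation, comparing the fit with m_grid measures the estimation error directly, instead of the residual mse, which also contains the noise variance.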