strange behavior of loess() & predict()

_______________________________________________________________________________________

The problem appears to be in how your original data has several tied values:
table(x)
x
1.8   2 2.2 2.4 2.6 2.8   3 3.2 3.4 3.6   4 
  1   2   2   2   5   7   2   3   1   2   1

IIRC the maths and programming behind loess assume unique values for the predictor.
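
A quick way to see how heavily the predictor is tied is sketched below. The vector is rebuilt from the counts in the table above, so the original ordering of the observations is not preserved; the names vals and counts are only illustrative.

```r
# Rebuild x from the tabulated counts shown above (original order is unknown)
vals   <- c(1.8, 2, 2.2, 2.4, 2.6, 2.8, 3, 3.2, 3.4, 3.6, 4)
counts <- c(1, 2, 2, 2, 5, 7, 2, 3, 1, 2, 1)
x <- rep(vals, times = counts)

length(x)           # number of observations
length(unique(x))   # number of distinct predictor values
any(duplicated(x))  # TRUE when the predictor has ties
max(table(x))       # size of the largest group of tied values
```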

One way to get around this is to jitter your data:
x2 <- jitter(x)
modj <- loess(y ~ x2, span=.5, degree=1)
predict(modj, data.frame(x2=X))
[1] 3.156192 3.141705 3.126918 3.112996 3.101108 3.087696 3.063471 3.038609 3.024639 3.032585 3.059480 3.091774
[13] 3.115763 3.117743 3.092979 3.040798 2.988283 2.957976 2.950648 3.008358 3.070065 3.127379 3.193501 3.149428
[25] 3.082843 3.010998 2.939407 2.888213 2.841487 2.812815 2.801583 2.807181 2.837887 2.899130 2.978165 3.062088
[37] 3.137995 3.204628 3.271813 3.339450 3.407396 3.475510 3.543843 3.612450 3.681267 3.750227 3.819267 3.888321
[49] 3.957324 4.026212
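
For anyone wanting to try the jitter approach end to end, here is a minimal self-contained sketch. The predictor is rebuilt from the table above, but the response y is synthetic (an assumption, since the original y is not shown in this message), so the numbers will not match the output above. Note that the newdata column is named x2, the name used in the model formula.

```r
set.seed(1)  # jitter() is random; fix the seed so the sketch is repeatable

# Predictor rebuilt from the table of tied values; y is SYNTHETIC (assumed)
x <- rep(c(1.8, 2, 2.2, 2.4, 2.6, 2.8, 3, 3.2, 3.4, 3.6, 4),
         times = c(1, 2, 2, 2, 5, 7, 2, 3, 1, 2, 1))
y <- 3 + 0.5 * (x - 2.8)^2 + rnorm(length(x), sd = 0.1)

x2   <- jitter(x)  # break the ties with a small amount of random noise
modj <- loess(y ~ x2, span = 0.5, degree = 1)

# Prediction grid inside the range of x2; by default loess does not extrapolate
X <- seq(min(x2), max(x2), length.out = 50)

# The newdata column must be called x2, matching the formula
pred <- predict(modj, newdata = data.frame(x2 = X))
length(pred)
```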

Another way is to summarise your data using table() and aggregate(), and fit a weighted model where the weights are the counts for each unique x-value:
dtab <-  aggregate(data.frame(y=y), by=list(x=x), FUN=mean)
dtab$x <- as.numeric(as.character(dtab$x))
dtab$w <- as.vector(table(x))
modt <- loess(y ~ x, span=.5, degree=1, weights=w, data=dtab)
predict(modt, data.frame(x=X))
[1] 3.186959 3.163133 3.136244 3.110822 3.091396 3.076705 3.047705 3.018362 3.007143 3.032246 3.069599 3.092369
[13] 3.098049 3.084134 3.053633 3.027429 3.012429 3.013908 3.036517 3.060372 3.076116 3.086870 3.095758 3.097287
[25] 3.073824 3.031238 2.976659 2.917402 2.863489 2.821469 2.796398 2.793336 2.823850 2.892363 2.980322 3.068725
[37] 3.140843 3.208920 3.279124 3.351965 3.427952 3.504330 3.577149 3.647119 3.714984 3.781486 3.847369 3.913375
[49] 3.980249 4.048733

There's probably a way to make the aggregate and table calls neater.
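
One possibly neater variant is sketched below, building the summary table in a single data.frame() call with tapply() and table(). Again the response y is synthetic (an assumption), so the fitted values are for illustration only.

```r
set.seed(1)

# Predictor rebuilt from the table of tied values; y is SYNTHETIC (assumed)
x <- rep(c(1.8, 2, 2.2, 2.4, 2.6, 2.8, 3, 3.2, 3.4, 3.6, 4),
         times = c(1, 2, 2, 2, 5, 7, 2, 3, 1, 2, 1))
y <- 3 + 0.5 * (x - 2.8)^2 + rnorm(length(x), sd = 0.1)

# One row per distinct x: mean response and number of tied observations.
# tapply() and table() both order their results by the sorted unique x values.
dtab <- data.frame(
  x = sort(unique(x)),
  y = as.vector(tapply(y, x, mean)),
  w = as.vector(table(x))
)

# Weighted fit on the unique x values, weights = counts per value
modt <- loess(y ~ x, span = 0.5, degree = 1, weights = w, data = dtab)

X    <- seq(min(x), max(x), length.out = 50)
pred <- predict(modt, newdata = data.frame(x = X))
length(pred)
```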

-- 
Hong Ooi
Senior Research Analyst, IAG Limited
388 George St, Sydney NSW 2000
+61 (2) 9292 1566

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Leo Gürtler
Sent: Wednesday, 7 December 2005 8:10 AM
To: r-help at stat.math.ethz.ch
Cc: gavin.simpson at ucl.ac.uk
Subject: Re: [R] strange behavior of loess() & predict()

Dear list,

I am very sorry for being inaccurate in my question. But re-reading the 
predict.loess help page does not provide a solution. As long as predict 
is used on new data derived from this dataset, the strange values 
remain and can be reproduced.
Adding a new element to both vectors (at the beginning, e.g. "1" for 
each vector) results in plausible values - but not in every case.
Even switching x and y is sufficient (i.e. x as predictor and y as 
dependent variable). So my question is:

Is it normal - or under which conditions does it take place - that 
predict.loess predicts values that are almost 20000/max(y) ~ 5000 times 
higher than expected?

best,

leo gürtler

On Tue, 2005-12-06 at 18:09 +0100, Leo Gürtler wrote:

Dear all,

<snip>

# here is the difference!!
predict(mod, data.frame(x=X), se=TRUE)
predict(mod, x=X, se=TRUE)

<--- end of snip --->

I assume there is a reason for this, but I do not understand it.
Thanks,

Not sure if this is the reason, but there is no argument x in
predict.loess, and:

a <- predict(mod, se = TRUE)

gives you the same results as:

b <- predict(mod, x=X, se=TRUE)

so the x argument appears to be absorbed by the ... arguments and
ignored. As such, you have supplied no newdata, so mod$x is used.

Now, when you do:

c <- predict(mod, data.frame(x=X), se=TRUE)

You have used an unnamed argument in position 2. R takes this to be
what you want to use for newdata and so works with this data rather than
the data in mod$x as in the first case. Compare:

# now named second argument - gets ignored as in a and b
d <- predict(mod, x = data.frame(x=X), se=TRUE)

all.equal(a, b) # TRUE
all.equal(a, c) # FALSE
all.equal(a, d) # TRUE

# this time (x=X) assigns X to x and returns its value, which is used as newdata
e <-  predict(mod, (x=X), se=TRUE)

all.equal(c, e) # TRUE

If in doubt, name your arguments and check the help! ?predict.loess
would have quickly shown you where the problem lay.
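
The argument-matching behaviour described above can be reproduced with the built-in cars data, as in this minimal sketch. The model, the data frame new, and the object names are illustrative only and are not the poster's data.

```r
# Illustrative model on the built-in cars data (not the original poster's data)
mod <- loess(dist ~ speed, data = cars)
new <- data.frame(speed = c(10, 15, 20))

a  <- predict(mod, se = TRUE)                 # no newdata: the fitting data are used
b  <- predict(mod, x = new, se = TRUE)        # 'x' falls into '...'; newdata stays NULL
c1 <- predict(mod, new, se = TRUE)            # unnamed second argument is matched to newdata
d  <- predict(mod, newdata = new, se = TRUE)  # explicitly named newdata

isTRUE(all.equal(a$fit, b$fit))   # same predictions: 'x' was ignored
isTRUE(all.equal(c1$fit, d$fit))  # same predictions: both use 'new'
length(a$fit)                     # one value per row of cars
length(d$fit)                     # one value per row of 'new'
```

Naming newdata explicitly, as in d, avoids relying on argument position.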

HTH

G

best regards

leo gürtler

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

email: leog at anicca-vijja.de
www: http://www.anicca-vijja.de/

_______________________________________________________________________________________
