strange behavior of loess() & predict()
The problem appears to be in how your original data has several tied values:
table(x)
x
1.8   2 2.2 2.4 2.6 2.8   3 3.2 3.4 3.6   4
  1   2   2   2   5   7   2   3   1   2   1

IIRC the maths and programming behind loess assume unique values for the predictor. One way to get around this is to jitter your data:
x2 <- jitter(x)
modj <- loess(y ~ x2, span=.5, degree=1)
# the newdata column must carry the model's predictor name (x2)
predict(modj, data.frame(x2=X))
 [1] 3.156192 3.141705 3.126918 3.112996 3.101108 3.087696 3.063471 3.038609 3.024639 3.032585 3.059480 3.091774
[13] 3.115763 3.117743 3.092979 3.040798 2.988283 2.957976 2.950648 3.008358 3.070065 3.127379 3.193501 3.149428
[25] 3.082843 3.010998 2.939407 2.888213 2.841487 2.812815 2.801583 2.807181 2.837887 2.899130 2.978165 3.062088
[37] 3.137995 3.204628 3.271813 3.339450 3.407396 3.475510 3.543843 3.612450 3.681267 3.750227 3.819267 3.888321
[49] 3.957324 4.026212

Another way is to summarise your data using table() and aggregate(), and fit a weighted model where the weights are the counts for each unique x-value:
dtab <- aggregate(data.frame(y=y), by=list(x=x), FUN=mean)
dtab$x <- as.numeric(as.character(dtab$x))
dtab$w <- table(x)
modt <- loess(y ~ x, span=.5, degree=1, weights=w, data=dtab)
predict(modt, data.frame(x=X))
 [1] 3.186959 3.163133 3.136244 3.110822 3.091396 3.076705 3.047705 3.018362 3.007143 3.032246 3.069599 3.092369
[13] 3.098049 3.084134 3.053633 3.027429 3.012429 3.013908 3.036517 3.060372 3.076116 3.086870 3.095758 3.097287
[25] 3.073824 3.031238 2.976659 2.917402 2.863489 2.821469 2.796398 2.793336 2.823850 2.892363 2.980322 3.068725
[37] 3.140843 3.208920 3.279124 3.351965 3.427952 3.504330 3.577149 3.647119 3.714984 3.781486 3.847369 3.913375
[49] 3.980249 4.048733

There's probably a way to make the aggregate and table calls neater.

--
Hong Ooi
Senior Research Analyst, IAG Limited
388 George St, Sydney NSW 2000
+61 (2) 9292 1566

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Leo Gürtler
Sent: Wednesday, 7 December 2005 8:10 AM
To: r-help at stat.math.ethz.ch
Cc: gavin.simpson at ucl.ac.uk
Subject: Re: [R] strange behavior of loess() & predict()
Gavin Simpson wrote:
Dear list,

I am very sorry for being inaccurate in my question. But re-reading the predict.loess help page does not provide a solution. As long as predict is used on a new dataset based on this dataset, the strange values remain and can be reproduced. Adding a new element to both vectors (at the beginning, e.g. "1" for each vector) results in plausible values, but not in every case. Even switching x and y is sufficient (i.e. x as predictor and y as dependent variable).

So my question is: Is it normal, or under which conditions does it happen, that predict.loess predicts values that are almost 20000/max(y) ~ 5000 times higher than expected?

best,
leo gürtler
On Tue, 2005-12-06 at 18:09 +0100, Leo Gürtler wrote:
Dear all,
<snip>
# here is the difference!!
predict(mod, data.frame(x=X), se=TRUE)
predict(mod, x=X, se=TRUE)
<--- end of snip --->

I assume there is a reason for this, but I do not understand it. Merci,
Not sure if this is the reason, but there is no argument x in predict.loess, and:

a <- predict(mod, se = TRUE)

gives you the same results as:

b <- predict(mod, x=X, se=TRUE)

so the x argument appears to be being passed on/in the ... arguments and ignored? As such, you have no newdata, so mod$x is used.

Now, when you do:

c <- predict(mod, data.frame(x=X), se=TRUE)

You have used an un-named argument in position 2. R takes this to be what you want to use for newdata and so works with this data rather than the one in mod$x as in the first case:

# now named second argument - gets ignored as in a and b
d <- predict(mod, x = data.frame(x=X), se=TRUE)
all.equal(a, b) # TRUE
all.equal(a, c) # FALSE
all.equal(a, d) # TRUE
# this time we assign X to x by using (), the result is used as newdata
e <- predict(mod, (x=X), se=TRUE)
all.equal(c, e) # TRUE

If in doubt, name your arguments and check the help! ?predict.loess would have quickly shown you where the problem lay.

HTH

G
best regards
leo gürtler
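For anyone who wants to reproduce the argument-matching behaviour Gavin describes, here is a minimal sketch. The data are simulated and purely hypothetical (the original x, y and X are not shown in full in this thread), and the name c2 is used only to avoid masking c(). It checks that predict.loess has no formal argument called x, so a named x= is swallowed by "..." and the fit is evaluated at the original predictor values, while an explicitly named newdata gives predictions at X.

```r
# Hypothetical data standing in for the thread's x, y and X
set.seed(1)
x <- seq(1.8, 4, length.out = 30)
y <- 3 + 0.3 * (x - 3)^2 + rnorm(30, sd = 0.1)
X <- seq(1.8, 4, length.out = 50)

mod <- loess(y ~ x, span = 0.5, degree = 1)

# Formal arguments of the method: there is no 'x' among them
arg_names <- names(formals(getS3method("predict", "loess")))
has_x <- "x" %in% arg_names

a  <- predict(mod, se = TRUE)                               # no newdata: uses the original x
b  <- predict(mod, x = X, se = TRUE)                        # 'x' ends up in '...' and is ignored
c2 <- predict(mod, newdata = data.frame(x = X), se = TRUE)  # newdata named explicitly

same_ab <- isTRUE(all.equal(a, b))
n_a  <- length(a$fit)    # one fitted value per original observation
n_c2 <- length(c2$fit)   # one prediction per element of X
```

Naming newdata explicitly, as in the last call, avoids relying on argument position at all.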
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
email: leog at anicca-vijja.de
www: http://www.anicca-vijja.de/
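On Hong Ooi's remark that the aggregate and table calls could probably be made neater: one possible tidier version, offered only as a sketch, builds the summary table in a single data.frame() call with tapply() and table(). The x-values and their counts are taken from the table(x) output at the top of the thread; the y values are simulated and hypothetical, since the original y is not shown.

```r
# x reconstructed from the table(x) counts in the thread; y is hypothetical
set.seed(1)
x <- rep(c(1.8, 2, 2.2, 2.4, 2.6, 2.8, 3, 3.2, 3.4, 3.6, 4),
         times = c(1, 2, 2, 2, 5, 7, 2, 3, 1, 2, 1))
y <- 3 + 0.3 * (x - 3)^2 + rnorm(length(x), sd = 0.1)
X <- seq(1.8, 4, length.out = 50)

# One row per unique x: mean response and number of tied observations.
# tapply() and table() both order their groups by the sorted unique x-values.
dtab <- data.frame(
  x = sort(unique(x)),
  y = as.vector(tapply(y, x, mean)),
  w = as.vector(table(x))
)

# Weighted fit on the collapsed data, weights = counts per unique x
modt <- loess(y ~ x, span = 0.5, degree = 1, weights = w, data = dtab)
pred <- predict(modt, data.frame(x = X))
```

Because x stays numeric throughout, the as.numeric(as.character()) round trip is not needed, and as.vector() keeps w a plain vector of counts rather than a table object.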