anomalies with the loess() function

6 messages · Federico Bonofiglio, Jonathan P Daily, Peter Ehlers +1 more

#
On 2010-10-26 11:48, Jonathan P Daily wrote:
I don't think that will work when there are incomplete cases,
in which case 'a' and predict(fit) may not correspond.

I think that it's always best to define a set of predictor
values and use predict() to get the corresponding fits and
plot according to taste:

  fm <- loess(b ~ a)
  aa <- seq(0, 1000, length.out=101)
  bb <- predict(fm, aa)
  lines(aa, bb, col="blue", lwd=2)

@Federico: see further comments below.
Yes, the 'mess' is due to the unordered nature of your data.
lines() will plot in the order in which the points occur in
your data. You could order before calling loess:

  ord <- order(a)
  a1 <- a[ord]
  b1 <- b[ord]
  fm <- loess(b1 ~ a1)

Sorting 'a' and 'b' separately is a bad idea: the values of 'a' and 'b'
will no longer be paired. Another reason to prefer data frames.

There's nothing wrong with loess; it just needs more than a
single intercept and slope to plot its predictions.
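
To illustrate the pairing point, here is a minimal sketch with made-up data. The names d, d_ord, a_sorted, b_sorted and cor_sorted are hypothetical and the seed is arbitrary. It sorts the two columns independently, which scrambles the pairs, and then reorders whole rows of a data frame, which keeps them together:

```r
set.seed(42)                      # arbitrary seed, for reproducibility only
d <- data.frame(a = sample(1:1000, 100),
                b = sample(1:1000, 100))

# Sorting each column on its own scrambles the pairs: the smallest 'a'
# is now matched with the smallest 'b', whatever the original pairing was.
a_sorted <- sort(d$a)
b_sorted <- sort(d$b)
cor_sorted <- cor(a_sorted, b_sorted)   # artificially close to 1

# Reordering whole rows keeps each (a, b) pair intact.
d_ord <- d[order(d$a), ]
fm <- loess(b ~ a, data = d_ord)

plot(b ~ a, data = d_ord)
lines(d_ord$a, fitted(fm), col = "blue", lwd = 2)
```

Because the rows were ordered before fitting, fitted(fm) is already in increasing order of 'a' and can be passed straight to lines().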

   -Peter Ehlers
#
On Tue, 2010-10-26 at 13:29 -0700, Peter Ehlers wrote:
If you change the loess() call to be this:

fit <- loess(b~a, na.action = na.exclude)

then plotting predict(fit) against 'a' will work.

G
#
On Tue, 2010-10-26 at 20:04 +0200, Federico Bonofiglio wrote:
<snip />
<snip />
No, something is wrong with your assumptions. lowess() and loess() do
not return anything like the same object. lowess() returns ordered $x
and $y components. The key here is "ordered".

loess() returns a complex list:
List of 18
 $ n        : int 66
 $ fitted   : num [1:66] 390 447 617 494 283 ...
 $ residuals: Named num [1:66] 270 188 161 -288 104 ...
  ..- attr(*, "names")= chr [1:66] "1" "2" "4" "5" ...
 $ enp      : num 7.87
.... etc.

The fitted values etc. are in the order of the data you supplied in a and
b. Hence when you plot these values you will get a cat's cradle of lines
because you are asking R to join points in sequence that are *not*
ordered across the plot.
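
As a rough check of the ordering point, the following sketch uses complete made-up data (to keep it simple) and compares what the two functions give you. The seed and the names lw, mod, fit and ord are only for illustration:

```r
set.seed(1)                # arbitrary seed
a <- sample(1:1000, 100)   # complete data, to keep the sketch simple
b <- sample(1:1000, 100)

lw <- lowess(a, b)
names(lw)                  # "x" "y"
is.unsorted(lw$x)          # FALSE: lowess() returns x in increasing order

mod <- loess(b ~ a)
fit <- fitted(mod)         # one value per observation, in data order
ord <- order(a)

plot(a, b)
lines(lw, col = "blue")                 # works: lines() finds ordered $x and $y
lines(a[ord], fit[ord], col = "red")    # loess fit, reordered by hand
```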

Not sure what you hoped that to do. You are assuming far too much here.
You need to extract the information you want from a loess() object.
lowess() works here because it returns components $x and $y, and the
underlying plot infrastructure in R will look for these in the object it
is supplied. loess() doesn't supply an ordered smooth in $x and $y, so
what you are seeing is just gibberish because you supplied garbage.

Two alternatives:

a<-sample(c(sample(1:1000,100),rep(NA,50)))
b<-sample(c(sample(1:1000,100),rep(NA,50)))

mod <- loess(b~a, na.action = na.exclude)
fit <- fitted(mod)
ord <- order(a)
plot(a, b)
lines(a[ord], fit[ord], col = "red")

The above respects your fitted values and the locations of missing
values. The next solution will plot the fitted smooth for a set of
regularly spaced values over the range of a:

pdata <- data.frame(a = seq(min(a, na.rm = TRUE),
                            max(a, na.rm = TRUE), length.out = 100))
pred <- predict(mod, pdata)
plot(a, b)
lines(pdata$a, pred, col = "red")

HTH

G
#
On Wed, 2010-10-27 at 08:26 +0100, Gavin Simpson wrote:
I meant "will work in general". In this particular case, the data are
not ordered in 'a' so it still gives a cradle of lines. The above was
meant to address the "incomplete cases" issue.
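
For what it's worth, the effect of na.exclude on the "incomplete cases" issue can be checked directly. This is only a sketch: the seed and the names fit_omit, fit_excl, n_complete, pred, keep and ord are made up for illustration. With na.omit the fitted values cover only the complete cases, whereas with na.exclude they are padded with NA back to the length of 'a':

```r
set.seed(123)              # arbitrary seed
a <- sample(c(sample(1:1000, 100), rep(NA, 50)))
b <- sample(c(sample(1:1000, 100), rep(NA, 50)))
n_complete <- sum(complete.cases(a, b))

fit_omit <- loess(b ~ a, na.action = na.omit)
fit_excl <- loess(b ~ a, na.action = na.exclude)

length(fitted(fit_omit)) == n_complete   # TRUE: incomplete cases dropped
length(fitted(fit_excl)) == length(a)    # TRUE: padded with NA

# The padded predictions line up with 'a', but still need ordering
# (and the NA entries dropped) before joining them with lines().
pred <- predict(fit_excl)
keep <- !is.na(pred)
ord  <- order(a[keep])
plot(a, b)
lines(a[keep][ord], pred[keep][ord], col = "red")
```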

G