Skip to content

Smoothing z-values according to their x, y positions

7 messages · Bert Gunter, Emmanuel Levy, David Winsemius

#
Dear All,

I'm sure this is not the first time this question comes up but I
couldn't find the keywords that would point me out to it - so
apologies if this is a re-post.

Basically I've got thousands of points, each depending on three variables:
x, y, and z.

if I do a plot(x,y, col=z), I get something very messy.

So I would like to smooth the values of z according to the values of
their neighbors (given by x and y).

Could you please point me out to the function(s) I should look into
for the smothing, and
possibly for the plotting?

Many thanks for your help,

Emmanuel
#
There are dozens of functions within R and contributed packages that do
this. The spatial statistics packages (see the "spatial" task view on CRAN)
is certainly where most are concentrated, but thin plate splines (in splines
packages, I believe), tensor product splines (in mgcv package), local
likelihood (in locfit package) etc. are also available for spatial
smoothing.  Perhaps the most basic place to look is the ?loess function in
the base distribution, which will do exactly what you requested.


Bert Gunter
Genentech Nonclinical Statistics


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Emmanuel Levy
Sent: Wednesday, March 19, 2008 12:42 PM
To: r-help at r-project.org
Subject: [R] Smoothing z-values according to their x, y positions

Dear All,

I'm sure this is not the first time this question comes up but I
couldn't find the keywords that would point me out to it - so
apologies if this is a re-post.

Basically I've got thousands of points, each depending on three variables:
x, y, and z.

if I do a plot(x,y, col=z), I get something very messy.

So I would like to smooth the values of z according to the values of
their neighbors (given by x and y).

Could you please point me out to the function(s) I should look into
for the smothing, and
possibly for the plotting?

Many thanks for your help,

Emmanuel

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Dear Bert,

Thanks for your reply - I indeed saw a lot of functions using:
help.search("smooth")

The problem is that most seem to not be very appropriate to what I'd like, or
they seem extremely complicated (e.g. gma). I am probably missing
something as I don't
see how to use Loess. From my poor understanding, it seems to be for 2D data.
Here I want to smooth the third "z" component.

In the meantime I adapted a function from Greg Warnes, which sort of
does the job (although the smoothing is not very nice).
So I'd be very happy to learn how do do something similar with loess!

x = runif(1000)
y = runif(1000)
z = x+y
hist2d.el(x,y,z, nbins=10)

## -- This function was adapted from Greg Warnes
hist2d.el <- function( x,y=NULL, z=NULL, nbins=200, my.cuts = NULL,
same.scale=FALSE, na.rm=TRUE, show=TRUE, col=c("black",
heat.colors(12)), ... )
  {
    if(is.null(y))
      {
        if(ncol(x) != 2) stop("If y is ommitted, x must be a 2 column matirx")
        y <- x[,2]
        x <- x[,1]
      }

    if(length(nbins)==1)
      nbins <- rep(nbins,2)

    nas <- is.na(x) | is.na(y)

    if(na.rm)
      {
        x <- x[!nas]
        y <- y[!nas]
      }
    else
      stop("missinig values not permitted if na.rm=FALSE")

    if(!is.null(my.cuts)){

      nbins = rep(length(my.cuts)-1,2)
      x.cuts=my.cuts
      y.cuts=my.cuts

    } else {

      if(same.scale)
        {
          x.cuts <- seq( from=min(x,y), to=max(x,y), length=nbins[1]+1)
          y.cuts <- seq( from=min(x,y), to=max(x,y), length=nbins[2]+1)
        }
      else
        {
          x.cuts <- seq( from=min(x), to=max(x), length=nbins[1]+1)
          y.cuts <- seq( from=min(y), to=max(y), length=nbins[2]+1)
        }
    }

    print(nbins)

    index.x <- cut( x, x.cuts, include.lowest=TRUE)
    index.y <- cut( y, y.cuts, include.lowest=TRUE)

    m <- matrix( 0, nrow=nbins[1], ncol=nbins[2],
                dimnames=list( levels(index.x),
                               levels(index.y) ) )

    m.sum <- matrix( 0, nrow=nbins[1], ncol=nbins[2],
                dimnames=list( levels(index.x),
                               levels(index.y) ) )

    for( i in 1:length(index.x) )
      m[ index.x[i], index.y[i] ] <-  m[ index.x[i], index.y[i] ] + 1

    for( i in 1:length(index.x) )
      m.sum[ index.x[i], index.y[i] ] <-  m.sum[ index.x[i], index.y[i] ] + z[i]

    m[which(m==0)]=1
    m = m.sum/(m)

    xvals <- x.cuts[1:nbins[1]]
    yvals <- y.cuts[1:nbins[2]]

    if(show)
      image( xvals,yvals, m, col=col,...)

                                        #
invisible(list(counts=m,x=xvals,y=yvals))
    invisible(m)
  }
On 19/03/2008, Bert Gunter <gunter.berton at gene.com> wrote:
#
"Emmanuel Levy" <emmanuel.levy at gmail.com> wrote in
news:e4654710803191554k4dfc2728g70bde35f41627754 at mail.gmail.com:
You could consider at combining the kde2d() function in the MASS package 
within the VR bundle in combination with contour(). The example I 
followed came from MASS 2ed in chapter 5's section on density estimation. 
It is probably easiest to just copy the help example:

library(MASS)
attach(geyser)
f1 <- kde2d(duration[-272], duration[-1],
             h = rep(1.5, 2), n = 50, lims = c(0.5, 6, 0.5, 6))
contour(f1, xlab = "previous duration",
        ylab = "duration", levels  =  c(0.05, 0.1, 0.2, 0.4) )

I have also had success with Loader's locfit package.
#
"Emmanuel Levy" <emmanuel.levy at gmail.com> wrote in
news:e4654710803191554k4dfc2728g70bde35f41627754 at mail.gmail.com:
I am realizing that my memories about the kde2d function were faulty. My 
suggestion would only let you look at the 2-d "marginals". You probably 
want a true 3d plotting function. The densities would then be closed 
surfaces.

Take a look at Feng and Tierney's figure 10 in this poster at:
<http://user2007.org/program/posters/feng.pdf>

The code cannot be cut-and-pasted, but if it does what you want, it would 
be worth it.
#
Dear David,

Thanks a lot for pointing out kde2d, just tried it out but the problem is that
it indeed takes the density of points into account, which I dont want.

For example, if in an region of surface S I've got 10,000 points, and that their
average height is 0.5, and in an other region I've got only ten points and their
average value if also 0.5, I'd like all these points to be transformed
to the same
0.5 value. At the moment, it seems that it's not the case.

For example, the range of the values I give is:  0.2 - 3.7, but the
range of the values
that are outputed is 0 - 0.17.

I haven't looked yet at the locfit package as it is not installed, but
I will check it out!

Thanks for helping!

Emmanuel
On 20/03/2008, David Winsemius <dwinsemius at comcast.net> wrote:
#
"Emmanuel Levy" <emmanuel.levy at gmail.com> wrote in
news:e4654710803191929j554c211bla7088bc8bc69c445 at mail.gmail.com:
Your description of what is desired leads me to believe you were
misreading the "loess" documentation. For proof I suggest you visit
Deepayan Sarkar's webpage and in particular see Figure 6.8 where 3d
plots of loess fits for more complex data arrangements are
exemplified:
Choose Figure 6.8 (code on right side of window):  
<http://lmdvr.r-forge.r-project.org/figures/figures.html>

Code:
<http://dsarkar.fhcrc.org/lattice/book/Chapter06-Trivariate/all.R>

PNG image:
<http://dsarkar.fhcrc.org/lattice/book/images/Figure_06_08_stdClassic.png>