How to do a probability density based filtering in 2D?
Hello Roger, Thanks for the suggestions. I finally managed to do it using the output of kde2d - The code is pasted below. Actually this made me realize that the outcome of kde2d can be quite influenced by outliers if a boundary box is not given (try running the code without the boundary box, e.g.lims, and you will see). library(MASS) x1 = rnorm(10000) x2 = rnorm(10000) nBins=200 z1 = kde2d(x1,x2,n=nBins, lims=c(-4,4,-4,4)) x1.cut = cut(x1, seq(z1$x[1], z1$x[nBins],len=(nBins+1) )) x2.cut = cut(x2, seq(z1$y[1], z1$y[nBins],len=(nBins+1) )) xy.cuts = data.frame(x1.cut,x2.cut, ord=1:(length(x1.cut)) ) density = data.frame( X=rep(factor(levels(x1.cut)),rep(nBins,nBins) ), Y=rep(factor(levels(x2.cut)),nBins) , Z= as.vector(z1$z)) xy.density = merge( xy.cuts, density, by=c(1,2), sort=FALSE, all.x=TRUE) xy.density = xy.density[order(x=xy.density$ord),] xy.density$Z[is.na(xy.density$Z)]=0 Z.lim = quantile(xy.density$Z, prob=c(0.1)) to.plot = xy.density$Z > Z.lim[1] par(mfrow=c(1,2)) plot(x1,x2, xlim=c(-3,3) ,ylim=c(-3,3)) plot(x1[to.plot], x2[to.plot], xlim=c(-3,3),ylim=c(-3,3))
On 19 November 2010 21:52, Roger Koenker <rkoenker at illinois.edu> wrote:
Also I wouldn't be surprised if there is a trick with quantile that escapes my mind.
I would be surprised... ?this is all very dependent on how you do the density estimation, but you might look at contourLines and then try to use in.convex.hull() from tripack, but beware that things get more complicated if the density is multi-modal. You might also look at one of the packages that do "data depth" sorts of things. Roger Koenker rkoenker at illinois.edu