Skip to content
Prev 164087 / 398506 Next

Multivariate kernel density estimation

Possible, yes (see fortune('Yoda')), but doing it can be a bit difficult.

Here is one approximation using an independent normal kernel in 2 dimensions:

kde.contour <- function(x,y=NULL, conf=0.95, xdiv=100, ydiv=100, kernel.sd ) {
        xy <- xy.coords(x,y)
        xr <- range(xy$x)
        yr <- range(xy$y)

        xr <- xr + c(-1,1)*0.1*diff(xr)
        yr <- yr + c(-1,1)*0.1*diff(yr)

        if(missing(kernel.sd)) {
                kernel.sd <- c( diff(xr)/6, diff(yr)/6 )
        } else if (length(kernel.sd)==1) {
                kernel.sd <- rep(kernel.sd, 2)
        }


        xs <- seq(xr[1], xr[2], length.out=xdiv)
        ys <- seq(yr[1], yr[2], length.out=ydiv)
        mydf <- expand.grid( xx=xs, yy=ys )

        tmpfun <- function(xx,yy) {
                sum( dnorm(xx, xy$x, kernel.sd[1]) * dnorm(yy, xy$y, kernel.sd[2]) )
        }

        z <- mapply(tmpfun, xx=mydf$xx, yy=mydf$yy)

        sz <- sort(z, decreasing=TRUE)
        cz <- cumsum(sz)
        cz <- cz/cz[length(cz)]

        cutoff <- sz[ which( cz > conf )[1] ]

        plot(xy, xlab='x', ylab='y', xlim=xr, ylim=yr)
        #contour( xs, ys, matrix(z, nrow=xdiv), add=TRUE, col='blue')
        contour( xs, ys, matrix(z, nrow=xdiv), add=TRUE, col='red',
                levels=cutoff, labels='')

        invisible(NULL)
}

# test
kde.contour( rnorm(100), rnorm(100) )

# correlated data
my.xy <- MASS::mvrnorm(100, c(3,10), matrix( c(1,.8,.8,1), 2) )

kde.contour( my.xy, kernel.sd=.5 )

# compare to theoritical
lines(ellipse::ellipse( 0.8, scale=c(1,1), centre=c(3,10)), col='green')


# bimodal

new.xy <- rbind( MASS::mvrnorm(65, c(3,10), matrix( c(1,.6,.6,1),2) ),
                        MASS::mvrnorm(35, c(6, 7), matrix( c(1,.6,.6,1), 2) ) )

kde.contour( new.xy, kernel.sd=.75 )


For more than 2 dimensions it becomes more difficult (both to visualize and to find the region, contour only works in 2 dimensions).  You can also see that the approximations are not great compared to the true theory, but possibly a better kernel would improve that.

Hope this helps,


--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111