
How to calculate the distance between two density functions

4 messages · Chang Jia-Ming, David Winsemius, Lucke, Joseph F +1 more

#
A similar question was posed and answered:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/119793.html

Two aspects need to be addressed: specifying the same domain for both
estimates, and getting the x-values to "line up" prior to the subtraction
(or whatever function is desired).

What are you going to do when the two functions cross?

  x <- seq(-2, 2, by = 0.1)      # common grid for both densities
  d1 <- dnorm(x)                 # N(0, 1)
  d2 <- dnorm(x, mean = 2)       # N(2, 1): crosses d1 at x = 1
  plot(x, d1)
  lines(x, d2)

---- or----

  x <- seq(-4, 4, by = 0.1)      # common grid for both densities
  d4 <- dnorm(x)                 # N(0, 1)
  d5 <- dnorm(x, sd = 5)         # N(0, 5^2): flatter, crosses d4 twice
  plot(x, d4)
  lines(x, d5)
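Both aspects can be handled in base R by giving density() the same from, to and n for the two samples, which puts the estimates on an identical x-grid. Taking the absolute difference then keeps the regions on either side of a crossing from cancelling. This is a minimal sketch: the simulated samples a and b are hypothetical stand-ins for real data, and trapz() is a hand-written helper, not a base R function. The result is the L1 distance between the two estimated densities, which lies between 0 and 2.

```r
# Hypothetical samples standing in for the two data sets.
set.seed(1)
a <- rnorm(200)
b <- rnorm(200, mean = 1)

# Common domain: the default cut = 3 bandwidths beyond the data, using the
# larger of the two default (nrd0) bandwidths so both estimates fit.
pad <- 3 * max(bw.nrd0(a), bw.nrd0(b))
lo <- min(a, b) - pad
hi <- max(a, b) + pad

# Identical from/to/n gives identical x-grids, so the y-values "line up".
da <- density(a, from = lo, to = hi, n = 1024)
db <- density(b, from = lo, to = hi, n = 1024)
stopifnot(isTRUE(all.equal(da$x, db$x)))

# Hypothetical helper: trapezoidal rule on a grid.
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# abs() handles crossing curves: without it, the parts where one density
# exceeds the other would cancel against the parts where it is smaller.
l1 <- trapz(da$x, abs(da$y - db$y))
l1
```

Halving l1 gives the total variation distance. The signed version, trapz(da$x, da$y - db$y), is close to 0, since each estimate integrates to roughly 1.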
#
In general, comparing two continuous densities is difficult because they
can differ on a set of measure 0 (e.g., at a single point) and yet have
the same distribution function.
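One way to act on this remark is to compare distribution functions rather than densities. A minimal sketch, assuming two hypothetical simulated samples a and b: compute the largest vertical gap between their empirical CDFs with ecdf(), which is the two-sample Kolmogorov-Smirnov statistic that ks.test() reports. The KS statistic is swapped in here as one possible CDF-based measure, not as the method proposed in this thread.

```r
# Hypothetical samples standing in for the two data sets.
set.seed(1)
a <- rnorm(200)
b <- rnorm(200, mean = 1)

# Empirical distribution functions (right-continuous step functions).
Fa <- ecdf(a)
Fb <- ecdf(b)

# Both ECDFs only jump at sample points, so the supremum of |Fa - Fb|
# is attained at one of the pooled sample points.
pts <- sort(c(a, b))
d_sup <- max(abs(Fa(pts) - Fb(pts)))

# The same quantity as computed by the two-sample Kolmogorov-Smirnov test.
ks <- unname(ks.test(a, b)$statistic)
c(d_sup = d_sup, ks = ks)
```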

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Chang Jia-Ming
Sent: Friday, December 05, 2008 8:00 AM
To: r-help at r-project.org
Subject: [R] How to calculate the distance between two density functions

Dear all,

  I wrote the following code to estimate the density functions of two
data sets:

  den_str <- density(str_data$Similarity)
  den_non_str <- density(nonstr_data$Similarity)

  However, I would like to know the difference between den_str and
den_non_str, that is, the difference between the region under the
den_str curve and the region under the den_non_str curve.

 How can I do this?

 Thank you for your help.

Jia-Ming


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
On Fri, Dec 5, 2008 at 3:59 PM, Chang Jia-Ming <chang.jiaming at crg.es> wrote:
One way of calculating the difference between two density functions
(or, more generally, histograms) is the Earth Mover's Distance
(e.g. http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/RUBNER/emd.htm
or http://en.wikipedia.org/wiki/Earth_Mover's_Distance ).
Dirk Eddelbuettel and I are finalizing an implementation of it,
and it will be available as soon as some licensing issues are sorted
out, which will hopefully be rather soon. If you don't want to wait
until the release, please drop Dirk or me an email and we
can mail you the package. As I said, the implementation is working
(I am using it in a research project at the moment); it is just that
the license is currently restricted to nonprofit research.

Rainer
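In one dimension the Earth Mover's Distance (Wasserstein-1 distance) has a simple closed form: it equals the area between the two cumulative distribution functions, and for two samples of equal size it equals the mean absolute difference between the sorted samples. The sketch below illustrates only this 1-D special case in base R; it is not the package described above, and the samples a and b are hypothetical.

```r
# Hypothetical equal-sized samples.
set.seed(1)
a <- rnorm(500)
b <- rnorm(500, mean = 1)

# 1-D EMD for equal sample sizes: pair up the order statistics.
emd_samples <- mean(abs(sort(a) - sort(b)))

# Same idea starting from density() objects on a shared grid:
# accumulate each density into a CDF, then integrate |F - G|.
lo <- min(a, b) - 1
hi <- max(a, b) + 1
da <- density(a, from = lo, to = hi, n = 2048)
db <- density(b, from = lo, to = hi, n = 2048)
dx <- da$x[2] - da$x[1]          # constant grid spacing
Fa <- cumsum(da$y) * dx          # Riemann-sum approximation of each CDF
Fb <- cumsum(db$y) * dx
emd_density <- sum(abs(Fa - Fb)) * dx

c(samples = emd_samples, density = emd_density)
```

Both values should be close to the shift of 1 between the two simulated means; the density-based figure differs slightly because of the kernel smoothing and the discretisation of the grid.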