Skip to content

Density function: Area under density plot is not equal to 1. Why?

5 messages · Gonçalo Ferraz, Jean-Christophe BOUËTTÉ, Greg Snow +2 more

#
Hi, I have a vector 'data' of 58 probability values (bounded between 0 and 1) and want to draw a probability density function of these values. For this, I used the commands:

data <- runif(58)

a <- density(data, from=0, to=1)
plot(a, type="l",lwd=3)

But then, when I try to approximate the area under the plotted curve with the command:

area <- sum(a$y)*(a$x[1]-a$y[2])

I get an area that is clearly smaller than 1.

Strangely, if I don't bound the density function with 'to=0,from=1' (which is against my purpose because it extends the pdf beyond the limits of a probability value), I get an area of 1.000. This suggests that I am computing the area well, but using the density function improperly.

Why is this happening? Does anyone know how to constrain the density function while still getting a true pdf (summing to 1 under the curve) at the end? Should I use a different function? I read through the density function notes but could not figure out a solution.

Thank you!

Gon?alo
#
Is your "data" supposed to be observations, or values of the density
of the underlying law?
Also, could you explain the rationale behind :
sum(a$y)*(a$x[1]-a$y[2])
because it is not immediately clear to the reader.


2011/9/8 Gon?alo Ferraz <gferraz29 at gmail.com>:
#
For bounded density estimation look at the logspline package instead of the regular density function.
#
Look at

    area <- sum(a$y)*(a$x[1]-a$y[2])

The problem appears to be "a$x[1]-a$y[2]"; that is not the length of
the base of an approximating rectangle, whatever it is :-)

albyn
On Thu, Sep 08, 2011 at 11:36:23AM -0400, Gon?alo Ferraz wrote:

  
    
#
On Sep 8, 2011, at 19:03 , Albyn Jones wrote:

            
I would assume that that is just a typo for a$x[1]-a$x[2], which makes sense if the x grid is regular. The real issue is that from= and to= only restrict the range over which the estimated density is _calculated_ not the one over which it is defined.

I.e. there are bits of the density "sticking out" of the interval from 0 to 1. So the integral is naturally less than 1. To see the point, try (with "a" from the original question).

plot(a)
lines(density(data, from=.4, to=.6), lwd=5)

Greg Snow had the advice how really to estimate a density on a restricted region.