Hi, I have a vector 'data' of 58 probability values (bounded between 0 and 1) and want to draw a probability density function of these values. For this, I used the commands: data <- runif(58) a <- density(data, from=0, to=1) plot(a, type="l",lwd=3) But then, when I try to approximate the area under the plotted curve with the command: area <- sum(a$y)*(a$x[1]-a$y[2]) I get an area that is clearly smaller than 1. Strangely, if I don't bound the density function with 'to=0,from=1' (which is against my purpose because it extends the pdf beyond the limits of a probability value), I get an area of 1.000. This suggests that I am computing the area well, but using the density function improperly. Why is this happening? Does anyone know how to constrain the density function while still getting a true pdf (summing to 1 under the curve) at the end? Should I use a different function? I read through the density function notes but could not figure out a solution. Thank you! Gon?alo
Density function: Area under density plot is not equal to 1. Why?
5 messages · Gonçalo Ferraz, Jean-Christophe BOUËTTÉ, Greg Snow +2 more
Is your "data" supposed to be observations, or values of the density of the underlying law? Also, could you explain the rationale behind : sum(a$y)*(a$x[1]-a$y[2]) because it is not immediately clear to the reader. 2011/9/8 Gon?alo Ferraz <gferraz29 at gmail.com>:
Hi, I have a vector 'data' of 58 probability values (bounded between 0 and 1) and want to draw a probability density function of these values. For this, I used the commands: data <- runif(58) a <- density(data, from=0, to=1) plot(a, type="l",lwd=3) But then, when I try to approximate the area under the plotted curve with the command: area <- sum(a$y)*(a$x[1]-a$y[2]) I get an area that is clearly smaller than 1. Strangely, if I don't bound the density function with 'to=0,from=1' (which is against my purpose because it extends the pdf beyond the limits of a probability value), I get an area of 1.000. This suggests that I am computing the area well, but using the density function improperly. Why is this happening? Does anyone know how to constrain the density function while still getting a true pdf (summing to 1 under the curve) at the end? Should I use a different function? I read through the density function notes but could not figure out a solution. Thank you! Gon?alo
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
For bounded density estimation look at the logspline package instead of the regular density function.
Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.snow at imail.org 801.408.8111 > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > project.org] On Behalf Of Gon?alo Ferraz > Sent: Thursday, September 08, 2011 9:36 AM > To: r-help at r-project.org > Subject: [R] Density function: Area under density plot is not equal to > 1. Why? > > Hi, I have a vector 'data' of 58 probability values (bounded between 0 > and 1) and want to draw a probability density function of these values. > For this, I used the commands: > > data <- runif(58) > > a <- density(data, from=0, to=1) > plot(a, type="l",lwd=3) > > But then, when I try to approximate the area under the plotted curve > with the command: > > area <- sum(a$y)*(a$x[1]-a$y[2]) > > I get an area that is clearly smaller than 1. > > Strangely, if I don't bound the density function with 'to=0,from=1' > (which is against my purpose because it extends the pdf beyond the > limits of a probability value), I get an area of 1.000. This suggests > that I am computing the area well, but using the density function > improperly. > > Why is this happening? Does anyone know how to constrain the density > function while still getting a true pdf (summing to 1 under the curve) > at the end? Should I use a different function? I read through the > density function notes but could not figure out a solution. > > Thank you! > > Gon?alo > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code.
Look at
area <- sum(a$y)*(a$x[1]-a$y[2])
The problem appears to be "a$x[1]-a$y[2]"; that is not the length of
the base of an approximating rectangle, whatever it is :-)
albyn
On Thu, Sep 08, 2011 at 11:36:23AM -0400, Gon?alo Ferraz wrote:
Hi, I have a vector 'data' of 58 probability values (bounded between 0 and 1) and want to draw a probability density function of these values. For this, I used the commands: data <- runif(58) a <- density(data, from=0, to=1) plot(a, type="l",lwd=3) But then, when I try to approximate the area under the plotted curve with the command: area <- sum(a$y)*(a$x[1]-a$y[2]) I get an area that is clearly smaller than 1. Strangely, if I don't bound the density function with 'to=0,from=1' (which is against my purpose because it extends the pdf beyond the limits of a probability value), I get an area of 1.000. This suggests that I am computing the area well, but using the density function improperly. Why is this happening? Does anyone know how to constrain the density function while still getting a true pdf (summing to 1 under the curve) at the end? Should I use a different function? I read through the density function notes but could not figure out a solution. Thank you! Gon?alo
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Albyn Jones Reed College jones at reed.edu
On Sep 8, 2011, at 19:03 , Albyn Jones wrote:
Look at area <- sum(a$y)*(a$x[1]-a$y[2]) The problem appears to be "a$x[1]-a$y[2]"; that is not the length of the base of an approximating rectangle, whatever it is :-)
I would assume that that is just a typo for a$x[1]-a$x[2], which makes sense if the x grid is regular. The real issue is that from= and to= only restrict the range over which the estimated density is _calculated_ not the one over which it is defined. I.e. there are bits of the density "sticking out" of the interval from 0 to 1. So the integral is naturally less than 1. To see the point, try (with "a" from the original question). plot(a) lines(density(data, from=.4, to=.6), lwd=5) Greg Snow had the advice how really to estimate a density on a restricted region.
albyn On Thu, Sep 08, 2011 at 11:36:23AM -0400, Gon?alo Ferraz wrote:
Hi, I have a vector 'data' of 58 probability values (bounded between 0 and 1) and want to draw a probability density function of these values. For this, I used the commands: data <- runif(58) a <- density(data, from=0, to=1) plot(a, type="l",lwd=3) But then, when I try to approximate the area under the plotted curve with the command: area <- sum(a$y)*(a$x[1]-a$y[2]) I get an area that is clearly smaller than 1. Strangely, if I don't bound the density function with 'to=0,from=1' (which is against my purpose because it extends the pdf beyond the limits of a probability value), I get an area of 1.000. This suggests that I am computing the area well, but using the density function improperly. Why is this happening? Does anyone know how to constrain the density function while still getting a true pdf (summing to 1 under the curve) at the end? Should I use a different function? I read through the density function notes but could not figure out a solution. Thank you! Gon?alo
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Albyn Jones Reed College jones at reed.edu
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com "D?den skal tape!" --- Nordahl Grieg