Skip to content

density plot of simulated exponential distributed data

5 messages · Juanjuan Chai, Dennis Murphy, Greg Snow +1 more

#
Hi all,

I tried to plot the density curve using the data from simulation. I am  
sure that the data should be exponentially distributed, but the plot  
of density curve always starts from (0,0) which is not the case for  
exponential distribution. Is there any way around this, to keep the  
curve dropping at 0?

Thanks.

The following are the codes I tested:

data <- vector()
rate <- 3
i <- 1
for(i in 1:1000){
	r <- runif(1)
	data[i] <- log(1-r)/(-rate)
	i <- i+1
}
plot(density(data))

-JJ
#
Hi:

Try this (and note the use of vectorization rather than a loop):

rate <- 3
dta <- -log(1 - runif(1000))/rate
hist(dta, nclass = 30, probability = TRUE)
x <- c(0.001, seq(0, 3, by = 0.01))
lines(x, dexp(x, rate = 3))

This is the difference in timings between the vectorized and iterative
methods of generating the samples:
user  system elapsed
   0.10    0.00    0.09
+ i <- 1
+ for(i in 1:1000){
+         r <- runif(1)
+         dt[i] <- log(1-r)/(-rate)
+         i <- i+1
+  } }))
   user  system elapsed
   9.35    0.00    9.40

Vectorization is usually your friend in R, and it pays to use it when
available. All of the d*, p*, q* and r* functions, where * denotes the
suffix for a distribution, are vectorized, as are most of the
functions in base R. A happy by-product is that it also makes for more
easily readable code.

HTH,
Dennis
On Tue, Apr 26, 2011 at 3:19 PM, Juanjuan Chai <chaij at umail.iu.edu> wrote:
#
You might want to use the logspline package instead of the density function, it allows you to specify bounds on a distribution.
#
There is an extensive statistical literature on how to correct for boundary bias in kernel density estimates.  See, for example, 

An Improved Estimator of the Density Function at the Boundary
S. Zhang, R. J. Karunamuni and M. C. Jones
Journal of the American Statistical Association
Vol. 94, No. 448 (Dec., 1999), pp. 1231-1241

And the references therein.  However, a very simple -- certainly not optimal -- fix is to reflect the data about the origin before fitting the density:

y <- rexp(100)
yy <- c(y, -y)
dens <- density(yy)
dens$y <- 2*dens$y[dens$x >=0]
dens$x <- dens$x[dens$x >= 0]
plot(dens,col="red")
lines(density(y),col="gray")

_______________________
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Greg Snow
Sent: Wednesday, April 27, 2011 1:55 PM
To: Juanjuan Chai; r-help at r-project.org
Subject: Re: [R] density plot of simulated exponential distributed data

You might want to use the logspline package instead of the density function, it allows you to specify bounds on a distribution.
#
I tried logspline function using a lower bound 0 for my data, it works 
like a charm. When the I changed the xlim only positive part, the 
vertical line was also gone. That's exactly what I expected.

Thanks.
-JJ
Greg Snow wrote: