Skip to content

How to find maximum values on the density function of a random variable

7 messages · Michael Lawrence, Bert Gunter, guox at ucalgary.ca +1 more

#
I would like to find the maximum values on the density function of a
random variable. For example, I have a random variable

rv <- rbinom(10000,1,0.1) + rnorm(10000)

Its density function is given by density(rv) and can be displayed by
plot(density(rv)). How to calculate its maximum values?
A density function may have a few (global and local) maximum values.
Please help. Thanks,
-james
#
rv <- rbinom(10000,1,0.1) + rnorm(10000)

d.rv = density(rv)
d.x = d.rv$x
d.y = d.rv$y

d.rv.max = d.rv$x[which.max(d.rv$y)]

plot(d.rv)
abline(v=d.rv.max)

#that what you want?
On Thu, Mar 12, 2009 at 6:28 PM, <guox at ucalgary.ca> wrote:

  
    
#
There is some considerable confusion in both the question and the reply.

rv is **not**  a random variable. It is an (iid) sample from (i.e. a
"realization" of) a random variable. It has *no* "density function" and the
density() function is simply a procedure to **estimate** the density of the
underlying random variable from which rv was sampled at a finite number of
points. The result of density()and the max given in the reply will depend on
the particular parameters given to density()(see ?density for details), as
well as the data. In other words, both the question and answer posted are
nonsense.

Now let me contradict what I just said. **If** you consider rv a finite,
discrete distribution (i.e. the whole population), then, in fact, it does
have a discrete density, with point mass j(i)/n at each unique sample value
i, where n is the total sample size (= 10000 in the example) and j(i) is the
number of samples values == i, which would probably be 1 for all i. Then, of
course, one can talk about the density of this finite distribution in the
obvious way and its maximum or maxima, occur at those i for which n(i) is
largest.

But of course that's not what the poster really meant, so that brings us
back to the nonsense question and answer. What James probably meant to ask
was: "How can the maximum of the underlying population density function be
estimated?" Well, that's a complicated issue. One could, of course, use some
sort of density estimate -- there are tons -- and find its max; that was the
approach taken in the answer, but it's not so simple as it appears because
of the need to choose the **appropriate** estimate (including the parameters
of the statistical algorithm doing the estimating ). This is the sort of
thing that actually requires some careful thought and statistical expertise.
You will find, I believe, that the prescription for finding the max
suggested below can give quite different answers depending on the parameters
chosen for this estimate, and on the estimate used. So if you need to do
this right, may I suggest consulting the literature on density estimation or
perhaps talking with your local statistician?

-- Bert Gunter
Genentech Nonclinical Statistics 

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Mike Lawrence
Sent: Thursday, March 12, 2009 5:40 PM
To: guox at ucalgary.ca
Cc: r-help at r-project.org
Subject: Re: [R] How to find maximum values on the density function of
arandom variable

rv <- rbinom(10000,1,0.1) + rnorm(10000)

d.rv = density(rv)
d.x = d.rv$x
d.y = d.rv$y

d.rv.max = d.rv$x[which.max(d.rv$y)]

plot(d.rv)
abline(v=d.rv.max)

#that what you want?
On Thu, Mar 12, 2009 at 6:28 PM, <guox at ucalgary.ca> wrote:
http://www.R-project.org/posting-guide.html

  
    
#
Yes, a random variable, discrete or continuous one, should associate with
a probability space and a measurable space.
I thought that graph of density(rv) below could give us an example of a
density function. I am very sorry for confusing you.

My question is how to find/estimate maximum values of a given density
function (even any given function within a given domain).
The number of these maximum values might be > 1 but the global one is unique.
Any ideas and references?
Thanks,
-james
#
If you are trying to build your own function then presumably you do  
not want the global maximum, since that is trivially returned by max.  
So what do you really want? Is this a programming question or just a  
general statistics question?

If you want to search along a (specific) sequence for local maxima,  
then you could try programming with "second differences" looking for  
shifts in sign from positive to negative.

vec <- c(1:5, 6, 5:1)

 > diff(diff(vec))
[1]  0  0  0  0 -2  0  0  0  0

It is a bit ambiguous what you would do with this vector:

 > vec2 <- c(1:5,rep(6,3),5:1)
 > diff(diff(vec2))
  [1]  0  0  0  0 -1  0 -1  0  0  0  0
#
Thanks for your ideas.
I would like to find all possible maximums - "mountains" on a graph of a
given density function but I have no ideas. In calculus, there is a
general approach for a given function f(x): Find derivative of f(x) and
estimate all zeros of f'(x). These zeros give us locations of mountains if
f(x) is differentiable. If f(x) is not differentiable (f(x) is not
continuous in this case), we cannot use this approach. Bur in discrete
case, we can still talk about local maximums, I guess. Your ideas are very
helpful.
-james
#
In discrete math, first differences are the analogues of first  
derivatives and second differences are the analogues of second  
derivatives. (I thought I learned this in Knuth, vol 1 25 years ago,  
but I cannot find it, so maybe it was in some the time series stuff I  
read 20 years ago). So a negative second difference may signal a local  
maximum ( and a positive second difference would signal a local  
minimum). You may need to also check for plateaus that are not really  
local maxima. They would result in a signal like (-1 0 0 1)
 > vec3 <- c(1:5,5,5:10)
 > diff(diff(vec3))
  [1]  0  0  0 -1  0   1  0  0  0  0  # a discrete inflection

I think the -2's would not need to be checked further, but the -1 's  
would need to be further verified. Since the differenced sequences get  
shortened with each application you also need to adjust before for  
using the results as indexes.
 > vec
  [1] 1 2 3 4 5 6 5 4 3 2 1
 >which(diff(diff(vec))==-2)
[1] 5
 > vec[which(diff(diff(vec))==-2)]
[1] 5        # "off" by one
 > vec[which(diff(diff(vec))==-2)+1]
[1] 6        # correct maximum

There must be worked out algorithms for peak finding in the R package  
universe.