"Re-creating" distributions - R-help

Fri, Jun 8, 2012 7:30 AM

I wouldn't go quite so far as to say there's absolutely nothing else
-- one could, e.g., also fit a lognormal, gamma, beta or almost any other
two-parameter distribution to the supplied data [assuming the
support matches].
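Since beta is named above, here is a minimal R sketch of closed-form moment matching for it. It assumes, as the "[assuming the support matches]" caveat requires, that the quantity lies strictly between 0 and 1, which the thread never confirms. The standard result is shape1 + shape2 = m(1 - m)/v - 1, with shape1 = m times that sum and shape2 = (1 - m) times it. The helper name `beta_from_moments` is hypothetical, and the inputs are the mean 0.007 and SD 0.011 that Andras reports further down.

```r
# Hypothetical helper: beta(shape1, shape2) matched to a mean m and
# variance v. Only valid for data on (0, 1) with v < m * (1 - m).
beta_from_moments <- function(m, v) {
  stopifnot(m > 0, m < 1, v > 0, v < m * (1 - m))
  common <- m * (1 - m) / v - 1   # equals shape1 + shape2
  c(shape1 = m * common, shape2 = (1 - m) * common)
}

p <- beta_from_moments(m = 0.007, v = 0.011^2)
p

# Check: mean and variance implied by the fitted shapes
a <- p[["shape1"]]
b <- p[["shape2"]]
a / (a + b)                         # should equal 0.007
a * b / ((a + b)^2 * (a + b + 1))   # should equal 0.011^2
```

The stopifnot() guard matters: if the variance is at least m(1 - m), no beta distribution has those moments.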

What I did say is that you need domain-specific knowledge to pick a
distribution to which to fit: then, if the moments are known in closed
form from the parameters, moment matching comes down to simultaneous
non-linear equations. I'm not aware of a unified infrastructure for
this in R [so I'm cc'ing the list in case someone else is], but it's
not a terribly difficult problem for the low dimensions we're talking
about.
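For a family whose moment equations have no closed-form inverse, the simultaneous non-linear equations can be handed to a root finder. A minimal R sketch for a Weibull (shape k, scale lambda), chosen only for illustration, uses the standard results Mean = lambda * Gamma(1 + 1/k) and Variance = lambda^2 * (Gamma(1 + 2/k) - Gamma(1 + 1/k)^2). The ratio Variance / Mean^2 depends on k alone, so the two equations collapse into a single uniroot() call, and lambda then follows from the mean. The helper `match_weibull` and its search interval are assumptions, not an existing R function; the targets (mean 10, variance 20) are the same as in the gamma example that follows.

```r
# Hypothetical helper: match a Weibull(shape = k, scale = lambda) to a
# target mean m and variance v by solving the moment equations numerically.
match_weibull <- function(m, v, lower = 0.05, upper = 100) {
  target_cv2 <- v / m^2  # squared coefficient of variation; depends on k only

  # Squared CV of a Weibull as a function of shape k, computed on the
  # log scale with lgamma() to avoid overflow for small k
  weibull_cv2 <- function(k) {
    exp(lgamma(1 + 2 / k) - 2 * lgamma(1 + 1 / k)) - 1
  }

  # Solve weibull_cv2(k) = target_cv2 for k (the CV decreases in k)
  k <- uniroot(function(k) weibull_cv2(k) - target_cv2,
               lower = lower, upper = upper, tol = 1e-10)$root

  # The mean equation then gives the scale directly
  lambda <- m / gamma(1 + 1 / k)
  c(shape = k, scale = lambda)
}

p <- match_weibull(10, 20)
p
```

Reducing the system to one equation in k is what makes a bracketing root finder usable here; for families where no parameter can be isolated this way, a general solver such as optim() on the squared moment errors is the usual fallback.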

E.g.,

If you know your data has a gamma distribution with mean 10 and
variance 20, you look at the Wikipedia gamma distribution page to find

Mean = k * theta
Variance = k * theta * theta

So Variance / Mean = theta --> theta = 20 / 10 = 2 for your problem. Then
k = Mean / theta = 10 / 2 = 5.
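The same arithmetic in R, as a minimal sketch: theta = Variance / Mean and k = Mean / theta, which map directly onto the shape and scale arguments of R's rgamma(). The simulation (seed and sample size are arbitrary choices) is only a sanity check that the matched distribution reproduces the target moments.

```r
# Target moments from the example above
m <- 10
v <- 20

# Gamma moments: Mean = k * theta, Variance = k * theta^2
theta <- v / m      # scale: 20 / 10 = 2
k     <- m / theta  # shape: 10 / 2 = 5

# R's gamma functions accept shape and scale directly
set.seed(1)
x <- rgamma(1e5, shape = k, scale = theta)
c(mean = mean(x), var = var(x))  # should be close to 10 and 20
```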
Similarly, the all-great Wikipedians provide closed-form solutions to
get the lognormal parameters back from observed sample moments:
http://en.wikipedia.org/wiki/Lognormal_distribution#Arithmetic_moments
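Applied to the figures Andras reports further down (mean 0.007, SD 0.011), the standard closed-form results sigma^2 = ln(1 + SD^2 / Mean^2) and mu = ln(Mean^2 / sqrt(SD^2 + Mean^2)) give the following minimal R sketch; the helper name `lnorm_from_moments` is hypothetical. Because a lognormal's median is exp(mu), the fit also yields an implied median that can be compared with the reported 0.003, a cheap check on whether the lognormal is plausible for these data.

```r
# Hypothetical helper: lognormal (meanlog, sdlog) from arithmetic mean and SD
lnorm_from_moments <- function(m, s) {
  sigma2 <- log(1 + s^2 / m^2)          # variance of log(X)
  mu     <- log(m^2 / sqrt(s^2 + m^2))  # mean of log(X)
  c(meanlog = mu, sdlog = sqrt(sigma2))
}

p <- lnorm_from_moments(m = 0.007, s = 0.011)
p

# Round trip: recover the mean and SD from the fitted parameters
mu    <- p[["meanlog"]]
sigma <- p[["sdlog"]]
exp(mu + sigma^2 / 2)                             # mean, should be 0.007
sqrt((exp(sigma^2) - 1) * exp(2 * mu + sigma^2))  # SD, should be 0.011

# Implied median vs. the reported median of 0.003
qlnorm(0.5, meanlog = mu, sdlog = sigma)  # about 0.0038
```

The implied median is in the same ballpark as the reported 0.003 but not equal to it; from three summary numbers alone there is no way to tell whether the gap reflects the wrong family, rounding, or sampling noise.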

As Bert rightly cautions, this is far outside the realm of good
practice and your energies would be better served if you could get a
better picture of the underlying data.

Best,
Michael

On Fri, Jun 8, 2012 at 9:13 AM, Bert Gunter <gunter.berton at gene.com> wrote:

Andras:
I realize my comment was rather cryptic, but which part of Michael's "You can't" did you not understand? Other than

?dnorm

which, as Michael said, is probably not a good thing, you can do nothing. You need to refocus your efforts on changing the system to get useful data, not trying to make a silk purse out of a sow's ear. Or, as John Tukey said many years ago:

"The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data."
-- John Tukey

-- Bert




On Fri, Jun 8, 2012 at 5:14 AM, Andras Farkas <motyocska at yahoo.com> wrote:


Dear Bert and Michael,

Thank you for your note below. Based on Michael's input and the lack of a covariance matrix available to me (for the most part), moment matching sounds like the best option. I have searched the internet for discussions on this using R but did not find much useful information. I also have to apologize, but I am somewhat new to the software and to this level of statistics. I am usually pretty good at figuring things out, but this one is probably way over my head. I was wondering if you could point me in the right direction using R to "re-build" the distribution that has the following parameters:

mean: 0.007, median: 0.003, SD: 0.011.

I greatly appreciate your help,

Sincerely,

Andras



From: Bert Gunter <gunter.berton at gene.com>
Subject: Re: [R] "Re-creating" distributions
To: "R. Michael Weylandt" <michael.weylandt at gmail.com>
Cc: "Andras Farkas" <motyocska at yahoo.com>, r-help at r-project.org
Date: Friday, June 8, 2012, 12:29 AM

Related comment:

"Even the data aren't sufficient." -- Brian Joiner (some years ago).

Explanation: See W.E. Deming on "analytic" vs "enumerative" statistics.

--- Bert

On Thu, Jun 7, 2012 at 8:06 PM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:

Short answer: no, those are (in general) insufficient parameters to
characterize a distribution.

Long answer: unfortunately, it's not uncommon that those "summary
statistics" are the only ones reported based on someone or other's
limited experience with the Gaussian. There are a few things you could
try, but each of them has problems:

i) Pretend that your data are in fact normal and use those parameters,
because they do uniquely characterize a normal distribution. MASS
(among others) provides a multivariate normal random number generator
[mvrnorm] if you have a covariance matrix available.

ii) If you have reason to imagine another distribution [guided by
domain knowledge], try to get its parameters, insofar as possible, by
moment matching. Covariance structures are much harder for the general
case though.

iii) If you can get something that resembles the original data, simply
work by bootstrapping / imputation.
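As a rough illustration of why option i) can mislead, here is a minimal R sketch that takes Andras's reported mean and SD (0.007 and 0.011, from later in the thread) at face value as a normal distribution. A normal's median equals its mean, so the reported median of 0.003 is already inconsistent with it, and the fitted normal puts a sizeable share of its probability below zero, which may or may not make sense for the quantity in question. The mvrnorm() call at the end uses a made-up 2 x 2 covariance matrix and a made-up second mean purely to show the interface; no such values were available in the thread.

```r
library(MASS)  # for mvrnorm(); MASS ships with standard R installations

m <- 0.007
s <- 0.011

# Under normality the median equals the mean (0.007), not the reported 0.003
qnorm(0.5, mean = m, sd = s)

# Probability mass the normal places below zero (roughly a quarter)
pnorm(0, mean = m, sd = s)

# Multivariate case: mvrnorm(n, mu, Sigma) needs a full covariance matrix.
# The second mean (0.05), its SD (0.02) and the correlation (0.5) below are
# HYPOTHETICAL, for illustration only.
Sigma <- matrix(c(s^2,            0.5 * s * 0.02,
                  0.5 * s * 0.02, 0.02^2), nrow = 2)
set.seed(1)
draws <- mvrnorm(n = 1000, mu = c(m, 0.05), Sigma = Sigma)
dim(draws)  # 1000 rows, 2 columns
```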

Hope this helps,
Michael

On Thu, Jun 7, 2012 at 3:34 PM, Andras Farkas <motyocska at yahoo.com> wrote:

Dear All,

I often have to work with certain models in which I try to "reproduce" a distribution as best I can with very little information available. Is there a package or function in R that could best reproduce a probability distribution using only the mean, median and SD, without knowing the actual distribution type to begin with and/or the covariance matrix (for more than one data set)? All I usually have reported is the mean, median and SD. I hope I made my question clear enough...

thanks,

Andras