I wouldn't go quite so far as to say there's absolutely nothing else -- one could, e.g., also fit a lognormal, gamma, beta, or almost any other two-parameter distribution to the supplied data [assuming the support matches].

What I did say is that you need domain-specific knowledge to pick a distribution to fit: then, if the moments are known in closed form as functions of the parameters, moment matching comes down to solving simultaneous non-linear equations. I'm not aware of a unified infrastructure for this in R [so I'm cc'ing the list in case someone else is], but it's not a terribly difficult problem for the low dimensions we're talking about.

E.g., if you know your data has a gamma distribution with mean 10 and variance 20, you look at the Wikipedia gamma distribution page to find

Mean = k * theta
Variance = k * theta * theta

So Variance / Mean = theta --> theta = 2 for your problem. Then k = Mean / theta = 5.

Similarly, the all-great Wikipedians provide closed-form solutions to get the lognormal parameters back from observed sample moments:

http://en.wikipedia.org/wiki/Lognormal_distribution#Arithmetic_moments

As Bert rightly cautions, this is far outside the realm of good practice, and your energies would be better spent getting a better picture of the underlying data.

Best,
Michael
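A minimal sketch of the moment matching described above, in R. The helper names gamma_from_moments and lnorm_from_moments are made up for illustration and do not come from any package. The lognormal formulas are the standard closed forms for the arithmetic mean and variance. The last call uses the mean 0.007 and SD 0.011 quoted further down the thread, and it assumes (without any evidence from the data) that a lognormal is a sensible choice there.

```r
# Hypothetical helper: gamma (shape k, scale theta) from mean m and variance v.
#   mean = k * theta,  variance = k * theta^2
gamma_from_moments <- function(m, v) {
  theta <- v / m          # Variance / Mean = theta
  k <- m / theta          # Mean / theta = k
  list(shape = k, scale = theta)
}

# Hypothetical helper: lognormal (meanlog, sdlog) from mean m and variance v.
#   mean = exp(mu + sigma^2 / 2),  variance = (exp(sigma^2) - 1) * mean^2
lnorm_from_moments <- function(m, v) {
  sigma2 <- log(1 + v / m^2)
  list(meanlog = log(m) - sigma2 / 2, sdlog = sqrt(sigma2))
}

# The worked example from the message: mean 10, variance 20.
g <- gamma_from_moments(10, 20)   # shape = 5, scale = 2
set.seed(1)
x <- rgamma(1e5, shape = g$shape, scale = g$scale)  # draws from the fitted gamma

# Assumed lognormal fit to mean 0.007 and SD 0.011.
ln <- lnorm_from_moments(0.007, 0.011^2)
implied_median <- exp(ln$meanlog)  # about 0.0038
```

The implied median gives one cheap sanity check: under the lognormal assumption it comes out near 0.0038, against a reported median of 0.003. That does not show the lognormal is right, only that it is not grossly inconsistent with the three reported numbers.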
On Fri, Jun 8, 2012 at 9:13 AM, Bert Gunter <gunter.berton at gene.com> wrote:
Andras:

I realize my comment was rather cryptic, but which part of Michael's "You can't" did you not understand? Other than ?dnorm, which, as Michael said, is probably not a good thing, you can do nothing.

You need to refocus your efforts on changing the system to get useful data, not trying to make a silk purse out of a sow's ear. Or, as John Tukey said many years ago:

"The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data."
-- John Tukey

-- Bert

On Fri, Jun 8, 2012 at 5:14 AM, Andras Farkas <motyocska at yahoo.com> wrote:
Dear Bert and Michael,

thank you for your note below. Based on Michael's input and the lack of a covariance matrix available to me (for the most part), moment matching sounds like the best option. I have searched the internet for discussions on this using R but did not find much useful information. I also have to apologize, but I am somewhat new to the software and this level of statistics. I am usually pretty good at figuring things out, but this one is probably way over my head. I was wondering if you could point me in the right direction using R to "re-build" the distribution that has the following parameters: mean: 0.007, median: 0.003, SD: 0.011.

I greatly appreciate your help,

Sincerely,
Andras

gunter.berton at gene.com> wrote:

From: Bert Gunter <gunter.berton at gene.com>
Subject: Re: [R] "Re-creating" distributions
To: "R. Michael Weylandt" <michael.weylandt at gmail.com>
Cc: "Andras Farkas" <motyocska at yahoo.com>, r-help at r-project.org
Date: Friday, June 8, 2012, 12:29 AM

Related comment: "Even the data aren't sufficient." -- Brian Joiner (some years ago).

Explanation: See W.E. Deming on "analytic" vs "enumerative" statistics.

--- Bert

On Thu, Jun 7, 2012 at 8:06 PM, R. Michael Weylandt <michael.weylandt at gmail.com> wrote:
Short answer: no, those are (in general) insufficient parameters to characterize a distribution.

Long answer: unfortunately, it's not uncommon that those "summary statistics" are the only ones reported based on someone or other's limited experience with the Gaussian.

There are a few things you could try, but each of them has problems:

i) Pretend that your data is in fact normal and use those parameters because they do uniquely characterize a normal distribution. MASS (among others) provides a multivariate normal distribution [mvrnorm] if you have a covariance matrix available.

ii) If you have reason to imagine another distribution [guided by domain knowledge], try to get its parameters in so far as possible by moment matching. Covariance structures are much harder for the general case though.

iii) If you can get something that resembles original data, simply work by bootstrapping / imputation.

Hope this helps,
Michael

On Thu, Jun 7, 2012 at 3:34 PM, Andras Farkas <motyocska at yahoo.com> wrote:
Dear All,

I often have to work with certain models in which I try to "reproduce" a distribution the best I can with very little known information available. Is there a package or function in R that could best reproduce a probability distribution using only the mean, median and SD values available, without knowing the actual distribution type to begin with and/or the covariance matrix (for more than 1 data set)? All I usually have reported is mean, median and SD.

I hope I made my question clear enough...

thanks,
Andras