Hi, Also the Cauchy's distribution could be good:

rcauchy(n, location = 0, scale = 1)

Best
Vito

I would be very grateful for any help from members of this list for what might be a simple problem...

We are trying to simulate the behaviour of a clinical measurement in a series of computer experiments. This is simple enough to do in R if we assume the measurements to be Gaussian, but their empirical distribution has a much higher peak at the mean and the distribution has much longer tails. (The distribution is quite symmetrical) Can anyone suggest any distributions I could fit to this data, and better still how I can then generate random data from this 'distribution' using R?

-----------------------------------------------
Dr. David Crabb
School of Science, The Nottingham Trent University,
Clifton Campus, Nottingham. NG11 8NS
Tel: 0115 848 3275 Fax: 0115 848 6690

=====
Becoming builders of solutions
Visit the portal http://www.modugno.it/ and in particular the section on Palese http://www.modugno.it/archivio/cat_palese.shtml
Help with generating data from a 'not quite' Normal distribution
3 messages · Vito Ricci, Martin Maechler, Brian Ripley
"Vito" == Vito Ricci <vito_ricci at yahoo.com>
on Thu, 12 Aug 2004 10:59:23 +0200 (CEST) writes:
Vito> Hi, Also the Cauchy's distribution could be good:
Vito> rcauchy(n, location = 0, scale = 1)
"also" is an exaggeration, after you already told him to use the
t-distribution family:
Cauchy = t-Dist(*, df = 1) !
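The identity above can be checked numerically. This is only a minimal sketch: the grid `x` and the probabilities `p` are arbitrary illustrative values, not anything from the original data.

```r
# Compare the standard Cauchy with the t distribution on 1 degree of freedom.
x <- seq(-10, 10, by = 0.5)            # arbitrary evaluation grid
p <- c(0.01, 0.25, 0.5, 0.75, 0.99)    # arbitrary probabilities

same_density  <- isTRUE(all.equal(dcauchy(x), dt(x, df = 1)))
same_cdf      <- isTRUE(all.equal(pcauchy(x), pt(x, df = 1)))
same_quantile <- isTRUE(all.equal(qcauchy(p), qt(p, df = 1)))

# Each entry reports whether the two families agree up to numerical tolerance.
c(density = same_density, cdf = same_cdf, quantile = same_quantile)
```

So rcauchy() adds nothing that rt(n, df = 1), shifted and scaled, does not already provide.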
DCrabb> I would be very grateful for any help from members of
DCrabb> this list for what might be a simple problem...
DCrabb> We are trying to simulate the behaviour of a clinical
DCrabb> measurement in a series of computer experiments. This
DCrabb> is simple enough to do in R if we assume the
DCrabb> measurements to be Gaussian, but their empirical
DCrabb> distribution has a much higher peak at the mean and
DCrabb> the distribution has much longer tails. (The
DCrabb> distribution is quite symmetrical) Can anyone suggest
DCrabb> any distributions I could fit to this data, and better
DCrabb> still how I can then generate random data from this
DCrabb> 'distribution' using R?
I'd first try with the t distribution, using fitdistr() from
package MASS, e.g.,
> x <- rt(1000, df = 1.5)
> library(MASS)
> fx <- fitdistr(x, densfun = "t")
> fx
        m              s              df
  -0.01396785    1.04338151    1.57749052
 ( 0.04426267) ( 0.04766543) ( 0.10809543)
>
(so it *does* estimate location and scale in addition to the df).
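To answer the second half of the question (generating random data from the fitted distribution), one possible sketch is to shift and scale standard t variates by the estimates. The seed, the sample size `n_sim` and the names `est`, `x_sim` and `probs` are illustrative assumptions, and the simulated `x` merely stands in for the real measurements.

```r
library(MASS)

set.seed(1)                          # arbitrary seed, for reproducibility only
x  <- rt(1000, df = 1.5)             # stand-in for the real measurements
fx <- fitdistr(x, densfun = "t")     # may warn about NaNs during optimisation
est <- fx$estimate                   # named vector with elements m, s, df

# A location-scale t variate is m + s * T, where T ~ t(df).
n_sim <- 5000
x_sim <- est[["m"]] + est[["s"]] * rt(n_sim, df = est[["df"]])

# Compare central quantiles of data and simulation
# (extreme quantiles are very noisy for such heavy tails).
probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
rbind(data = quantile(x, probs), simulated = quantile(x_sim, probs))
```

A qqplot(x, x_sim) would give a visual version of the same comparison.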
If you read the help page
> ?fitdistr
you'll see in the example that estimating 'df' is said to be
problematic.
AFAIK it can be better to reparametrize, possibly using 1/df or
log(df) as new parameter.
{but then you can't use fitdistr() but rather mle() and the
log likelihood or optim() directly}.
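A minimal sketch of the optim() route, assuming a log(df) (and, for convenience, log(s)) parametrisation. The function name `negloglik`, the starting values and the seed are hypothetical choices made for illustration, not something taken from fitdistr() itself.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rt(1000, df = 1.5)          # stand-in for the real measurements

# Negative log-likelihood of the location-scale t family,
# parametrised as theta = (m, log(s), log(df)) so that s > 0 and df > 0
# hold automatically and the optimiser works on an unconstrained space.
negloglik <- function(theta, x) {
  m  <- theta[1]
  s  <- exp(theta[2])
  df <- exp(theta[3])
  -sum(dt((x - m) / s, df = df, log = TRUE) - log(s))
}

start <- c(median(x), log(IQR(x) / 2), log(10))   # rough starting values
fit <- optim(start, negloglik, x = x, method = "BFGS", hessian = TRUE)

# Back-transform the estimates to the natural scale.
est <- c(m = fit$par[1], s = exp(fit$par[2]), df = exp(fit$par[3]))
est

# Approximate standard errors on the transformed scale, from the Hessian.
se_theta <- sqrt(diag(solve(fit$hessian)))
se_theta
```

The standard errors refer to m, log(s) and log(df); intervals computed on that scale can be back-transformed with exp().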
Martin Maechler
On Thu, 12 Aug 2004, Martin Maechler wrote:
[...]
If you read the help page
> ?fitdistr
you'll see in the example that estimating 'df' is said to be
problematic.
AFAIK it can be better to reparametrize, possibly using 1/df or
log(df) as new parameter.
{but then you can't use fitdistr() but rather mle() and the
log likelihood or optim() directly}.
It is the use of ML for the df that is *in theory* problematic, not the optimization per se. See the reference, p.110, for some of the literature.
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595