Help with generating data from a 'not quite' Normal distribution - R-help

Thu, Aug 12, 2004 1:59 AM

Hi,

The Cauchy distribution could also be a good choice:

rcauchy(n, location = 0, scale = 1)


Best
Vito


I would be very grateful for any help from members of this list for what
might be a simple problem...

We are trying to simulate the behaviour of a clinical measurement in a
series of computer experiments. This is simple enough to do in R if we
assume the measurements to be Gaussian, but their empirical distribution
has a much higher peak at the mean and the distribution has much longer
tails. (The distribution is quite symmetrical.) Can anyone suggest any
distributions I could fit to this data, and better still how I can then
generate random data from this 'distribution' using R?

-----------------------------------------------
Dr. David Crabb
School of Science,
The Nottingham Trent University,
Clifton Campus, Nottingham. NG11 8NS
Tel: 0115 848 3275   Fax: 0115 848 6690


Martin Maechler

Thu, Aug 12, 2004 4:41 AM

Vito> Hi, Also the Cauchy's distribution could be good:

    Vito> rcauchy(n, location = 0, scale = 1)

"also" is an exaggeration, after you already told him to use the
t-distribution family:

Cauchy = t-Dist(*, df = 1) !
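The identity can be checked numerically. The following is only a minimal sketch using base R's density and distribution functions; the grid `x` and the variable names are arbitrary choices for illustration.

```r
# Compare the standard Cauchy with the t distribution on 1 degree of freedom
# over an arbitrary grid of points.
x <- seq(-10, 10, by = 0.5)

# all.equal() compares up to a small numerical tolerance.
same_density <- isTRUE(all.equal(dcauchy(x), dt(x, df = 1)))
same_cdf     <- isTRUE(all.equal(pcauchy(x), pt(x, df = 1)))

print(c(density = same_density, cdf = same_cdf))
```

So rcauchy(n) and rt(n, df = 1) draw from the same distribution, and the Cauchy is already covered by fitting the t family.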


    DCrabb> I would be very grateful for any help from members of
    DCrabb> this list for what might be a simple problem...

    DCrabb> We are trying to simulate the behaviour of a clinical
    DCrabb> measurement in a series of computer experiments. This
    DCrabb> is simple enough to do in R if we assume the
    DCrabb> measurements to be Gaussian, but their empirical
    DCrabb> distribution has a much higher peak at the mean and
    DCrabb> the distribution has much longer tails. (The
    DCrabb> distribution is quite symmetrical) Can anyone suggest
    DCrabb> any distributions I could fit to this data, and better
    DCrabb> still how I can then generate random data from this
    DCrabb> 'distribution' using R?

I'd first try with the t distribution, using  fitdistr() from
package MASS, e.g.,

  > x <- rt(1000, df = 1.5)
  > library(MASS)
  > fx <- fitdistr(x, densfun = "t")
  > fx
          m             s            df
    -0.01396785    1.04338151    1.57749052
   ( 0.04426267) ( 0.04766543) ( 0.10809543)
  > 

(so it *does* estimate location and scale in addition to the df).
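Once m, s and df have been estimated, random data from the fitted distribution can be generated by shifting and scaling draws from rt(). A rough sketch, assuming the fitted model is the usual location-scale t (m + s * T, with T following a t distribution on df degrees of freedom); the helper name `rt_ls` is made up for this illustration, and the parameter values are simply the estimates printed above:

```r
# Hypothetical helper (not part of MASS): draw n values from a
# location-scale t distribution with location m, scale s and df degrees
# of freedom.
rt_ls <- function(n, m, s, df) {
  m + s * rt(n, df = df)
}

set.seed(123)  # only to make the illustration reproducible

# Estimates copied from the fitdistr() output shown above.
sim <- rt_ls(5000, m = -0.01396785, s = 1.04338151, df = 1.57749052)

summary(sim)
```

With a real fit, the three values would be taken from fx$estimate rather than typed in by hand.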

If you read the help page
  > ?fitdistr

you'll see in the example that estimating 'df' is said to be
problematic.
AFAIK it can be better to reparametrize, possibly using 1/df or
log(df) as the new parameter
{but then you can't use fitdistr(); you would use mle() with the
 log likelihood, or optim() directly}.
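A minimal sketch of the optim() route, assuming a location-scale t model with the scale and the degrees of freedom both optimised on the log scale. The simulated data, the starting values, the choice of BFGS and the function name `negloglik` are all assumptions made for illustration, not something taken from the fitdistr() help page:

```r
set.seed(42)
x <- rt(1000, df = 3)  # simulated stand-in for the real measurements

# Negative log-likelihood of a location-scale t distribution.
# par = c(m, log(s), log(df)), so s and df stay positive without constraints.
negloglik <- function(par, x) {
  m  <- par[1]
  s  <- exp(par[2])
  df <- exp(par[3])
  -sum(dt((x - m) / s, df = df, log = TRUE) - log(s))
}

# Crude starting values: median for location, half the IQR for scale,
# and df = 10.
start <- c(median(x), log(IQR(x) / 2), log(10))

opt <- optim(start, negloglik, x = x, method = "BFGS")

# Back-transform to the original parametrization.
est <- c(m = opt$par[1], s = exp(opt$par[2]), df = exp(opt$par[3]))
print(est)
```

Using 1/df instead would only change the back-transformation inside negloglik(); mle() from package stats4 could wrap the same negative log-likelihood if standard errors are wanted.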

Martin Maechler

Brian Ripley

Thu, Aug 12, 2004 5:08 AM

On Thu, 12 Aug 2004, Martin Maechler wrote:

It is the use of ML for the df that is *in theory* problematic, not the
optimization per se.  See the reference, p.110, for some of the 
literature.

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595