model fitting of randomly generated data in spatstat
On 02/04/15 03:09, Robert Leaf wrote:
I was generating some data for analysis and was curious to see if we could fit a "MatClust" model using the function spatstat::kppm to some of our observed data. As a first cut, and to see if we get values that conform to our expectations, I fit models to simulated data and was curious about the results. I am hoping that the group can help me understand the departures from expectations. Is it reasonable that the kppm function should return parameter values that are similar to those that generated the data?
Sure, provided that the values used to generate the data are not too bizarre.
We are not getting values that are anywhere close to what we would expect.
That appears to be because you are using *bizarre* parameter values to generate your data. The algorithms used by kppm() can be expected to return far-out results unless the data to which kppm() is applied have at least *some* reasonable prospect of conforming to the model that is being fitted.
library(*spatstat*)
What are those asterisks doing in that call??? That cannot have been the call that you actually used.
(point.vals <- rMatClust(kappa = 2, r = 2, mu = 2000)) # generate random points
if (point.vals$n > 0) { # some realizations of the model return .ppp variables with no data
I was initially bewildered by this: the expected number of points is
4000, so how could you possibly get zero points? I asked. Finally I saw
the light; with kappa = 2 you will get zero parent points, and hence an
empty pattern, about 13.5% of the time. I.e. kappa = 2 is just plain
silly-small.
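The two figures above can be checked with a couple of lines of base R. This is only a rough sketch: it assumes the number of parents inside the unit square is Poisson with mean kappa times the window area, and it ignores parents lying just outside the window (which the simulation also generates).

```r
kappa <- 2      # parent intensity from the original call
mu    <- 2000   # mean number of offspring per parent
area  <- 1      # area of the default window (the unit square)

# Probability of no parent points in the window (assumed Poisson count)
p_empty <- dpois(0, lambda = kappa * area)   # same as exp(-2)

# Expected number of points in the pattern
expected_n <- kappa * mu * area

round(p_empty, 3)   # 0.135
expected_n          # 4000
```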
Using "r = 2" (these days the syntax is "scale = 2") means that you
are forming clusters in discs of radius 2 .... in the unit square!!!
(You are using the default window.) This makes no sense to me.
Setting mu = 2000 means you are generating an average of 2000 points in
each such disc. I really don't think this is a realistic value for a
Matérn cluster process.
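To put numbers on the geometry, here is a small base R sketch. The variable names are just illustrative. It compares the cluster disc with the unit square and works out how densely one cluster scatters its offspring, assuming offspring are uniform on the disc as in the Matérn cluster model.

```r
r  <- 2       # cluster radius from the original call
mu <- 2000    # mean number of offspring per parent

disc_area   <- pi * r^2    # about 12.57, versus window area 1
window_diag <- sqrt(2)     # largest distance between two points of the unit square

# A disc of radius r centred anywhere in the window contains the whole
# window whenever the window's diagonal is shorter than r.
covers_window <- window_diag < r

# Mean number of offspring per unit area contributed by one such cluster
offspring_density <- mu / disc_area

covers_window             # TRUE
round(offspring_density)  # 159
```

So every parent that falls in the window spreads its offspring uniformly over the entire window, and no cluster structure is visible at that scale.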
Your simulated pattern (if it is not empty) will have the appearance of
having arisen from a very high intensity Poisson process. Fitting a
Matérn cluster process to such a pattern results in ill-determined
parameter values.
Try:
set.seed(42)
X <- rMatClust(kappa=20,scale=0.04,mu=5)
fit <- kppm(X ~ 1,"MatClust")
fit
....
Fitted cluster parameters:
kappa scale
22.37058543 0.04168089
Mean cluster size: 4.514857 points
The estimated parameters are reasonably commensurate with those used
to generate the pattern.
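A single seed is only one realization. A rough way to see how much the estimates vary is to repeat the simulate-and-fit step over a few seeds. The sketch below assumes a recent spatstat in which parameters() applied to a kppm fit returns a list with components kappa, scale and mu; the helper fit_once and the choice of five seeds are purely illustrative.

```r
library(spatstat)

# Values used to generate the patterns (same as in the example above)
true_par <- c(kappa = 20, scale = 0.04, mu = 5)

# Simulate one pattern in the default unit square and fit a Matern cluster model
fit_once <- function(seed) {
  set.seed(seed)
  X <- rMatClust(kappa = true_par[["kappa"]],
                 scale = true_par[["scale"]],
                 mu    = true_par[["mu"]])
  fit <- kppm(X ~ 1, "MatClust")
  p <- parameters(fit)   # assumed to hold kappa, scale and mu
  c(kappa = as.numeric(p$kappa),
    scale = as.numeric(p$scale),
    mu    = as.numeric(p$mu))
}

# One row of estimates per seed
estimates <- t(sapply(1:5, fit_once))

estimates
colMeans(estimates)   # compare with true_par
```

With only about 100 points per pattern the individual estimates can wander quite a bit, so look at the spread across seeds rather than at any one row.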
<SNIP>
cheers,
Rolf Turner
P.S. If your chosen parameter values (kappa = 2, mu = 2000) were
selected in imitation of parameter estimates obtained from fitting a
Matérn cluster model to real data, then I would suggest that you should
probably *not* fit such a model to those data.
In modelling it is important to try fitting *appropriate* models to data
sets. Otherwise the results you get may well be meaningless.
R. T.