
fit simple surface to 2d data?

4 messages · george young, Brian Ripley, Roger Bivand

#
I have an array of floating-point measurements on a square (5 by 5) 2d grid.
The data are nominally constant, and somewhat noisy.
I need to find any significant spatial trend, e.g. bigger on the
left, bigger in the middle, etc.  I have many thousands of these data sets
that need to be scanned for 'interesting' spatial variations, selecting the
datasets that are beyond some criterion of flatness.
 
My thought was to fit a second-order polynomial surface by least squares (or
some similar criterion), and scan for coefficients bigger than some cutoff.  I think
a parabolic surface is probably as complex a surface as the small amount of data merits.
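A minimal sketch of the proposed approach, assuming NumPy and unit-spaced grid coordinates centred on the grid (both assumptions, not from the thread). It fits z = b0 + b1 x + b2 y + b3 x^2 + b4 xy + b5 y^2 by least squares and flags a grid when any non-constant coefficient exceeds a cutoff. The helper names `quadratic_design`, `fit_quadratic` and `is_interesting` are hypothetical, and so is the flagging rule.

```python
import numpy as np


def quadratic_design(n=5):
    """Design matrix for a full second-order surface on an n-by-n grid.

    Columns are 1, x, y, x^2, x*y, y^2, with x indexing axis 0 and y axis 1.
    Coordinates are unit-spaced and centred (e.g. -2..2 for n=5), an
    assumption; on this symmetric grid centring makes the linear columns
    orthogonal to all the other columns.
    """
    coords = np.arange(n) - (n - 1) / 2.0
    x, y = np.meshgrid(coords, coords, indexing="ij")
    x, y = x.ravel(), y.ravel()
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])


def fit_quadratic(z):
    """Least-squares coefficients (b0..b5) of a quadratic surface for square array z."""
    z = np.asarray(z, dtype=float)
    X = quadratic_design(z.shape[0])
    coef, *_ = np.linalg.lstsq(X, z.ravel(), rcond=None)
    return coef


def is_interesting(z, cutoff):
    """Hypothetical rule: flag the grid if any non-constant |coefficient| > cutoff."""
    coef = fit_quadratic(z)
    return bool(np.any(np.abs(coef[1:]) > cutoff))
```

For example, `np.full((5, 5), 10.0) + 0.5 * np.arange(5)[:, None]` (a grid rising along axis 0) gives coefficients close to `[11, 0.5, 0, 0, 0, 0]`. Note that a sensible cutoff depends on the noise level and on the coordinate scaling chosen above, so raw coefficient thresholds are only a crude screen.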
 
Is there functionality in R that would be appropriate?
 
Is there some other approach anyone would suggest for the general task?
I'm not very experienced in data crunching, so any suggestion would
be appreciated.
 
I don't mind committing a lot of cpu to the task, if that helps.    

Thanks,
	George Young
	MIT Lincoln Laboratory
	Lexington, Mass, USA
#
On Fri, 6 Jul 2001, george young wrote:

            
Trend surfaces in package spatial (surf.ls) do that, though I would rather
do an anova, for which Roger Bivand has kindly contributed a method.
Fitting a quadratic surface is more or less what I would do; the anova step
is the difference.
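A rough NumPy analogue of the anova step: a nested-model F test comparing the order-0 (constant) surface with the full order-2 surface. This is a sketch of the standard F test for nested least-squares fits, not the code in package spatial; the unit-spaced, centred grid coordinates and the function name `trend_surface_f_test` are assumptions. A p-value would also need the upper tail of the F(df_num, df_den) distribution (e.g. `scipy.stats.f.sf`), left out here to keep the sketch NumPy-only.

```python
import numpy as np


def trend_surface_f_test(z):
    """F test of a constant surface against a full quadratic surface.

    Returns (F, df_num, df_den) for a square grid z with unit spacing.
    """
    z = np.asarray(z, dtype=float)
    n = z.shape[0]
    coords = np.arange(n) - (n - 1) / 2.0
    x, y = np.meshgrid(coords, coords, indexing="ij")
    x, y, v = x.ravel(), y.ravel(), z.ravel()

    # Model 1: order-0 surface, i.e. just the overall mean
    rss0 = np.sum((v - v.mean()) ** 2)
    df0 = v.size - 1

    # Model 2: order-2 surface with 6 coefficients
    X = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])
    coef, *_ = np.linalg.lstsq(X, v, rcond=None)
    rss2 = np.sum((v - X @ coef) ** 2)
    df2 = v.size - X.shape[1]

    # Extra sum of squares explained by the trend terms, per degree of freedom,
    # relative to the residual mean square of the larger model
    df_num = df0 - df2
    f = ((rss0 - rss2) / df_num) / (rss2 / df2)
    return f, df_num, df2
```

On a 5 by 5 grid this gives 5 and 19 degrees of freedom, matching the Res.Df values of 24 and 19 for the two surf.ls fits in the anova table later in the thread.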
#
On Fri, 6 Jul 2001, Prof Brian Ripley wrote:

            
Yes, this feels like a trend surface problem, but might it not rather be a
classification problem? Given that there are thousands of replications of
the 25 grid values, maybe clara() in the cluster package, or one of the
many other classifiers, could pull out a much smaller number of classes for
which the surfaces could then be calculated?

clara() would not use the distance information between grid points at all,
unfortunately. Another approach might be to compute a local Moran's I_i or
the Getis-Ord G_i, giving local measures of spatial autocorrelation for each
of the grid points, and to cluster those. This would be especially relevant
if the process generating the z values at the grid locations is known to
exhibit positive spatial dependence (values close to each other on the grid
are more alike than spatially distant values). If there is no spatial
dependence, a trend surface won't help much either!
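A minimal sketch of the local Moran's I_i idea on a square grid, assuming row-standardised rook (edge-sharing) neighbours. The neighbourhood choice and the helper names `rook_weights` and `local_moran` are hypothetical. It uses Anselin's form I_i = (z_i - zbar) / m2 * sum_j w_ij (z_j - zbar), with m2 the mean squared deviation. In R, package spdep offers localmoran() for this; the code below is an independent illustration, not that function.

```python
import numpy as np


def rook_weights(n):
    """Row-standardised rook-contiguity weights for the cells of an n-by-n grid."""
    size = n * n
    w = np.zeros((size, size))
    for i in range(n):
        for j in range(n):
            k = i * n + j  # row-major cell index
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                a, b = i + di, j + dj
                if 0 <= a < n and 0 <= b < n:
                    w[k, a * n + b] = 1.0
    # each row sums to 1, so w @ v averages the neighbours' values
    return w / w.sum(axis=1, keepdims=True)


def local_moran(z):
    """Local Moran's I_i for each cell of a square grid z."""
    z = np.asarray(z, dtype=float)
    v = z.ravel()
    dev = v - v.mean()
    m2 = np.mean(dev ** 2)       # variance with divisor n
    w = rook_weights(z.shape[0])
    lag = w @ dev                # average deviation of each cell's neighbours
    return (dev / m2 * lag).reshape(z.shape)
```

Positive I_i marks a cell resembling its neighbours (part of a smooth trend or patch); negative I_i marks a cell unlike its neighbours. The 25 values per grid could then serve as features for whatever grouping method is chosen.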

anova() on the trend surfaces could test against "some criterion of
flatness", such as the zero-order (constant) surface:
Analysis of Variance Table

Model 1: surf.ls(np = 0, x = x.g, y = y.g, z = z)
Model 2: surf.ls(np = 2, x = x.g, y = y.g, z = z)
  Res.Df Res.Sum Sq Df  Sum Sq F value  Pr(>F)
1     24    2.07349                           
2     19    1.28882  5 0.78467  2.3136 0.08421
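As a check on the table, the F value is the extra sum of squares per extra degree of freedom, divided by the residual mean square of the order-2 model. Here RSS_1, RSS_2 are the residual sums of squares and nu_1, nu_2 the residual degrees of freedom of Models 1 and 2:

```latex
F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)/(\nu_1 - \nu_2)}{\mathrm{RSS}_2/\nu_2}
  = \frac{(2.07349 - 1.28882)/(24 - 19)}{1.28882/19}
  = \frac{0.156934}{0.067833}
  \approx 2.3136
```

Pr(>F) = 0.08421 is the upper-tail probability of an F distribution with 5 and 19 degrees of freedom at that value.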

but if a classifier were trained to distinguish grids using "some
criterion of flatness", incoming data could perhaps be sorted into flat/not
flat for further exploration. One issue I would watch with trend surfaces
is the influence of outlying z values, to which a classification approach
might be less sensitive.

Roger
#
Well, I would say that the definition of 'interesting' makes this into
a supervised pattern recognition problem, and that using clustering
techniques for such problems is a classic error.  What one needs is some
way to put in the prior information, and that needs model-based clustering
techniques, at least.

A close analogy: people studying automated screening of mammograms are
trying to pick up signs of (pre-)cancer, not the many benign variations in
breast tissue.  Yet clustering techniques have been proposed frequently
(and those I have studied are not at all successful).
On Sat, 7 Jul 2001, Roger Bivand wrote:

            
One could do robust fitting (and we do on brain images).  *But* outliers
will correspond to non-flatness here.
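One standard form of robust fitting is Huber M-estimation by iteratively reweighted least squares, which is what R's MASS::rlm() does by default. It is shown here only as an illustration, not as the method used on brain images. This sketch applies it to the quadratic surface, assuming NumPy, unit-spaced centred coordinates, the conventional tuning constant k = 1.345, and a MAD residual scale; the name `huber_quadratic_fit` is hypothetical.

```python
import numpy as np


def huber_quadratic_fit(z, k=1.345, iters=50, tol=1e-8):
    """Huber M-estimate of quadratic-surface coefficients via IRLS."""
    z = np.asarray(z, dtype=float)
    n = z.shape[0]
    coords = np.arange(n) - (n - 1) / 2.0
    x, y = np.meshgrid(coords, coords, indexing="ij")
    x, y, v = x.ravel(), y.ravel(), z.ravel()
    X = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

    coef, *_ = np.linalg.lstsq(X, v, rcond=None)  # start from least squares
    for _ in range(iters):
        r = v - X @ coef
        s = np.median(np.abs(r)) / 0.6745   # MAD scale, consistent at the normal
        if s == 0:
            break                           # (near-)exact fit; nothing to downweight
        u = np.abs(r) / (k * s)
        wts = np.where(u <= 1, 1.0, 1.0 / u)  # Huber weights: 1 inside, k*s/|r| outside
        sw = np.sqrt(wts)
        new, *_ = np.linalg.lstsq(X * sw[:, None], v * sw, rcond=None)
        converged = np.max(np.abs(new - coef)) < tol
        coef = new
        if converged:
            break
    return coef
```

Given the caveat above, one possible use is to compare this fit with the ordinary least-squares one rather than replace it: grid cells that receive very small weights are exactly the localised departures from flatness that may be of interest.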

Brian