fit simple surface to 2d data? - R-help

George Young

Fri, Jul 6, 2001 2:27 PM

I have an array of floating-point measurements on a square (5 by 5) 2d grid.
The data are nominally constant, and somewhat noisy.
I need to find any significant spatial trend, e.g. bigger on the
left, bigger in the middle, etc.  I have many thousands of these data sets
that need to be scanned for 'interesting' spatial variations, selecting the
datasets that are beyond some criterion of flatness.
 
My thought was to fit a 2nd-order polynomial by least squares or some
such metric, and scan for coefficients bigger than some cutoff.  I think
a parabolic surface is probably as complex a surface as the small amount
of data merits.
 
Is there functionality in R that would be appropriate?
 
Is there some other approach anyone would suggest for the general task?
I'm not very experienced in data crunching, so any suggestion would
be appreciated.
 
I don't mind committing a lot of cpu to the task, if that helps.    

Thanks,
	George Young
	MIT Lincoln Laboratory
	Lexington, Mass, USA

I cannot think why the whole bed of the ocean is
 not one solid mass of oysters, so prolific they seem. Ah,
 I am wandering!  Strange how the brain controls the brain!
	-- Sherlock Holmes in "The Dying Detective"
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Brian Ripley

Fri, Jul 6, 2001 2:49 PM

On Fri, 6 Jul 2001, george young wrote:

Trend surfaces in package spatial do that, and I would rather do an anova,
which Roger Bivand has kindly contributed.

That's more or less what I would do, the anova bit being the difference.

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Roger Bivand

Sat, Jul 7, 2001 4:47 AM

On Fri, 6 Jul 2001, Prof Brian Ripley wrote:

Yes, this feels like trend surface, but I'm not sure that it isn't a
classification problem? Given that there are thousands of replications of
the 25 grid values, maybe clara() in the cluster package or one of the
many other classifiers could pull out a much smaller number of classes for
which the surfaces could be calculated?

clara() wouldn't be using the distance information at all, unfortunately.
Another cut might be to compute a localised Moran's I_i or the Getis-Ord
G_i, yielding local measures of spatial autocorrelation for each of the
grid points and cluster those? This would be especially relevant if the
process generating the z values at the grid locations is known to exhibit
positive spatial dependence (values close to each other on the grid
are more alike than spatially distant values). If there is no spatial
dependence, trend surface won't help much either!
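
A minimal sketch of the local Moran's I_i idea for a single grid, in base R.
The rook (edge-sharing) neighbourhood, the row-standardised weights, and the
helper name local_moran are assumptions made for illustration, not something
specified in this thread; the spdep package's localmoran() is a packaged
alternative.

```r
# Hypothetical helper: local Moran's I_i for every cell of a numeric matrix,
# using rook neighbours and row-standardised weights (assumed choices).
local_moran <- function(zmat) {
  n_r <- nrow(zmat)
  n_c <- ncol(zmat)
  dev <- zmat - mean(zmat)   # deviations from the grid mean
  m2  <- mean(dev^2)         # second moment used to scale I_i

  # Pad with NA so that edge cells simply have fewer neighbours.
  pad <- matrix(NA_real_, n_r + 2, n_c + 2)
  pad[2:(n_r + 1), 2:(n_c + 1)] <- dev

  nb <- array(c(pad[1:n_r,       2:(n_c + 1)],   # neighbour above
                pad[3:(n_r + 2), 2:(n_c + 1)],   # neighbour below
                pad[2:(n_r + 1), 1:n_c],         # neighbour to the left
                pad[2:(n_r + 1), 3:(n_c + 2)]),  # neighbour to the right
              dim = c(n_r, n_c, 4))

  # Mean deviation of the available neighbours (the spatial lag).
  lag <- apply(nb, c(1, 2), mean, na.rm = TRUE)
  dev / m2 * lag
}

# A smooth left-to-right trend versus a checkerboard pattern.
trend   <- matrix(rep(1:5, each = 5), 5, 5)
checker <- outer(1:5, 1:5, function(i, j) (i + j) %% 2)
I_trend   <- local_moran(trend)
I_checker <- local_moran(checker)
```

Positive values mark cells that resemble their neighbours and negative values
mark cells that differ from them, so the 25 values per grid could serve as the
features to cluster.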

anova() on the trend surfaces could do this testing against "some
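criterion of flatness", like the 0 order surface,

library(spatial)        # provides surf.ls() and anova() for trend surfaces
x <- 1:5
y <- 1:5
z <- runif(25)          # pure-noise example data on the 5 by 5 grid
g <- expand.grid(x, y)  # all 25 (x, y) grid locations
x.g <- g[, 1]
y.g <- g[, 2]
# compare the flat (order 0) surface with the quadratic (order 2) surface
anova(surf.ls(0, x.g, y.g, z), surf.ls(2, x.g, y.g, z))
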
Analysis of Variance Table

Model 1: surf.ls(np = 0, x = x.g, y = y.g, z = z)
Model 2: surf.ls(np = 2, x = x.g, y = y.g, z = z)
  Res.Df Res.Sum Sq Df  Sum Sq F value  Pr(>F)
1     24    2.07349                           
2     19    1.28882  5 0.78467  2.3136 0.08421

but maybe if a classifier was trained to distinguish grids using "some
criterion of flatness", incoming data could be sorted into flat/not flat
for further exploration. One of the issues I would watch with trend
surface is the influence of outlying z values, something a classification
approach might not be affected by to the same extent.
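
The anova comparison above could be applied to every dataset in turn. The
sketch below is one way to do that, under stated assumptions: it uses lm()
from base R in place of surf.ls() (the same order 0 versus order 2
least-squares comparison), the helper name flatness_p is hypothetical, the
grids are simulated, and the 0.001 cutoff is arbitrary.

```r
# Hypothetical helper: p-value of the F-test comparing a flat surface with a
# full quadratic surface for one grid, given as a numeric matrix.
flatness_p <- function(zmat) {
  d <- expand.grid(x = seq_len(nrow(zmat)), y = seq_len(ncol(zmat)))
  d$z <- as.vector(zmat)                 # column-major, matches expand.grid
  flat <- lm(z ~ 1, data = d)            # order 0: constant surface
  quad <- lm(z ~ x + y + I(x^2) + I(x * y) + I(y^2), data = d)  # order 2
  anova(flat, quad)[2, "Pr(>F)"]
}

# Simulated stand-in for the real data: 200 flat, noisy 5 by 5 grids.
set.seed(42)
grids <- replicate(200, matrix(rnorm(25, mean = 10, sd = 0.1), 5, 5),
                   simplify = FALSE)

# Give grid 7 a tilt across its columns so there is something to find.
grids[[7]] <- grids[[7]] + outer(1:5, 1:5, function(i, j) 0.2 * j)

p <- vapply(grids, flatness_p, numeric(1))
flagged <- which(p < 0.001)              # arbitrary cutoff for "not flat"
```

With many thousands of datasets, any fixed p-value cutoff will flag some flat
grids by chance, so it may be worth adjusting the p-values (for example with
p.adjust()) before choosing the cutoff.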

Roger

Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no
and: Department of Geography and Regional Development, University of
Gdansk, al. Mar. J. Pilsudskiego 46, PL-81 378 Gdynia, Poland.


Brian Ripley

Sat, Jul 7, 2001 6:50 AM

Well, I would say that the definition of 'interesting' makes this into
a supervised pattern recognition problem, and that using clustering
techniques for such problems is a classic error.  What one needs is some
way to put in the prior information, and that needs model-based clustering
techniques, at least.

A close analogy: people studying automated screening of mammograms are
trying to pick up signs of (pre)-cancer, not the many benign variations in
breast tissue.  Yet clustering techniques have been proposed frequently
(and those I have studied are not at all successful).

On Sat, 7 Jul 2001, Roger Bivand wrote:

anova() on the trend surfaces could do this testing against "some
criterion of flatness", like the 0 order surface,

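library(spatial)        # provides surf.ls() and anova() for trend surfaces
x <- 1:5
y <- 1:5
z <- runif(25)          # pure-noise example data on the 5 by 5 grid
g <- expand.grid(x, y)  # all 25 (x, y) grid locations
x.g <- g[, 1]
y.g <- g[, 2]
# compare the flat (order 0) surface with the quadratic (order 2) surface
anova(surf.ls(0, x.g, y.g, z), surf.ls(2, x.g, y.g, z))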

Analysis of Variance Table

Model 1: surf.ls(np = 0, x = x.g, y = y.g, z = z)
Model 2: surf.ls(np = 2, x = x.g, y = y.g, z = z)
  Res.Df Res.Sum Sq Df  Sum Sq F value  Pr(>F)
1     24    2.07349
2     19    1.28882  5 0.78467  2.3136 0.08421

but maybe if a classifier was trained to distinguish grids using "some
criterion of flatness", incoming data could be sorted into flat/not flat
for further exploration. One of the issues I would watch with trend
surface is the influence of outlying z values, something a classification
approach might not be affected by to the same extent.

One could do robust fitting (and we do on brain images).  *But* outliers
will correspond to non-flatness here.
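
A small sketch of what a robust fit does with such a grid, assuming Huber
M-estimation via rlm() from the MASS package as the robust method (the thread
does not say which method was meant). The data are simulated and the spike
location is arbitrary.

```r
library(MASS)  # rlm(): robust linear model fitting by M-estimation

set.seed(1)
d <- expand.grid(x = 1:5, y = 1:5)
d$z <- 10 + rnorm(25, sd = 0.1)   # simulated flat, noisy grid
spike <- 17                       # row 17 is the cell x = 2, y = 4
d$z[spike] <- d$z[spike] + 5      # one aberrant measurement

# Robust quadratic surface: same terms as the order 2 trend surface.
fit <- rlm(z ~ x + y + I(x^2) + I(x * y) + I(y^2), data = d)

# Final IWLS weights: cells far from the fitted surface get weights below 1.
downweighted <- which(fit$w < 1)
most_suspect <- which.min(fit$w)
```

The robust fit stays close to the flat surface and downweights the spike, so
the fitted coefficients alone would hide exactly the departure being looked
for; the weights or residuals would have to be screened as well.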

Brian

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
