fit simple surface to 2d data?
Well, I would say that the definition of 'interesting' makes this into a supervised pattern recognition problem, and that using clustering techniques for such problems is a classic error. What one needs is some way to put in the prior information, and that needs model-based clustering techniques, at least. A close analogy: people studying automated screening of mammograms are trying to pick up signs of (pre)-cancer, not the many benign variations in breast tissue. Yet clustering techniques have been proposed frequently (and those I have studied are not at all successful).
On Sat, 7 Jul 2001, Roger Bivand wrote:
On Fri, 6 Jul 2001, Prof Brian Ripley wrote:
On Fri, 6 Jul 2001, george young wrote:
I have an array of floating-point measurements on a square (5 by 5) 2d grid. The data are nominally constant, and somewhat noisy. I need to find any significant spatial trend, e.g. bigger on the left, bigger in the middle, etc. I have many thousands of these data sets that need to be scanned for 'interesting' spatial variations, selecting the datasets that are beyond some criterion of flatness. My thought was to fit a 2nd-order polynomial with least squares or some such metric, and scan for coefficients bigger than some cutoff. I think a parabolic surface is probably as complex a surface as the small amount of data merits. Is there functionality in R that would be appropriate?
Trend surfaces in package spatial do that, and I would rather do an anova, which Roger Bivand has kindly contributed.
Is there some other approach anyone would suggest for the general task? I'm not very experienced in data crunching, so any suggestion would be appreciated.
That's more or less what I would do, the anova bit being the difference.
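For the "many thousands of data sets" part, a minimal sketch of how the screening might be batched. It uses lm() from base R rather than surf.ls(), fitting the same full quadratic surface; the helper name flatness_p, the simulated matrix Z (one 5 by 5 grid per column) and the 0.001 cutoff are all assumptions for illustration, not anything given in the thread.

```r
set.seed(1)
grid <- expand.grid(x = 1:5, y = 1:5)   # x varies fastest, as in as.vector() of a matrix

# Hypothetical helper: p-value of the F test comparing a constant (flat)
# surface with a full quadratic surface for one grid of 25 values.
flatness_p <- function(z, grid) {
  d <- data.frame(grid, z = as.vector(z))
  fit0 <- lm(z ~ 1, data = d)
  fit2 <- lm(z ~ x + y + I(x^2) + I(x * y) + I(y^2), data = d)
  anova(fit0, fit2)[2, "Pr(>F)"]
}

# Simulated stand-in for the real data: 200 flat, noisy grids
Z <- matrix(rnorm(25 * 200), nrow = 25)
Z[, 1] <- Z[, 1] + 2 * grid$x           # plant a strong left-to-right trend

p <- apply(Z, 2, flatness_p, grid = grid)
flagged <- which(p < 0.001)             # assumed cutoff, to be tuned
```

With thousands of grids, a cutoff of 0.05 would flag roughly 5% of perfectly flat ones by chance alone, so the threshold probably wants to be much stricter or adjusted, e.g. with p.adjust().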
Yes, this feels like trend surface, but I'm not sure that it isn't a classification problem? Given that there are thousands of replications of the 25 grid values, maybe clara() in the cluster package or one of the many other classifiers could pull out a much smaller number of classes for which the surfaces could be calculated? Clara wouldn't be using the distance information at all, unfortunately.

Another cut might be to compute a localised Moran's I_i or the Getis-Ord G_i, yielding local measures of spatial autocorrelation for each of the grid points and cluster those? This would be especially relevant if the process generating the z values at the grid locations is known to exhibit positive spatial dependence (values close to each other on the grid are more alike than spatially distant values). If there is no spatial dependence, trend surface won't help much either!

anova() on the trend surfaces could do this testing against "some criterion of flatness", like the 0-order surface,
library(spatial)   # provides surf.ls() and anova() for trend surfaces
x <- 1:5
y <- 1:5
z <- runif(25)
x.g <- expand.grid(x, y)[, 1]
y.g <- expand.grid(x, y)[, 2]
anova(surf.ls(0, x.g, y.g, z), surf.ls(2, x.g, y.g, z))
Analysis of Variance Table

Model 1: surf.ls(np = 0, x = x.g, y = y.g, z = z)
Model 2: surf.ls(np = 2, x = x.g, y = y.g, z = z)
  Res.Df Res.Sum Sq Df  Sum Sq F value  Pr(>F)
1     24    2.07349
2     19    1.28882  5 0.78467  2.3136 0.08421

but maybe if a classifier was trained to distinguish grids using "some criterion of flatness", incoming data could be sorted into flat/not flat for further exploration. One of the issues I would watch with trend surface is the influence of outlying z values, something a classification approach might not be affected by to the same extent.
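To make the outlier concern concrete, a small sketch on simulated data: one cell of an otherwise flat, noisy grid is pushed up by 10, and the quadratic surface is fitted both by least squares and robustly. MASS::rlm() with its default Huber M-estimator is used here only as one possible robust fitter; the thread does not name a function, and the spike position and size are made up.

```r
library(MASS)   # rlm(); MASS is a recommended package shipped with R

set.seed(2)
d <- expand.grid(x = 1:5, y = 1:5)
d$z <- rnorm(25)                       # flat surface plus unit noise
spike <- which(d$x == 4 & d$y == 2)    # hypothetical outlying cell
d$z[spike] <- d$z[spike] + 10

quad <- z ~ x + y + I(x^2) + I(x * y) + I(y^2)
fit_ls  <- lm(quad, data = d)          # least squares: spike pulls the surface
fit_rob <- rlm(quad, data = d)         # Huber M-estimation by default

# Residual at the spiked cell under each fit
c(ls = unname(resid(fit_ls)[spike]), robust = unname(resid(fit_rob)[spike]))

# Final IWLS weights: a value well below 1 marks a downweighted cell
fit_rob$w[spike]
range(fit_rob$w[-spike])
```

The robust fit largely ignores the spiked cell, which protects the surface estimate but also means a grid whose only departure from flatness is such a cell would tend to look flat.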
One could do robust fitting (and we do on brain images). *But* outliers will correspond to non-flatness here.

Brian
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595