combining values taken at nearly identical locations

?zerodist

from sp. You can set its zero argument to a value > 0, so that points closer than that distance are treated as coincident.

?dnearneigh

from spdep helps you find neighbours within a specified distance. Surely
rgeos has something ...
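
Once the nearby points are grouped, the merging itself needs only base R. What follows is a minimal sketch on the toy example from the question, not a definitive implementation. It assumes single-linkage clustering (hclust plus cutree, cut at the radius) is an acceptable way to form the clusters, which means points joined by a chain of steps shorter than the radius end up together. The names merge_cluster, value_cols and cl are made up for illustration.

```r
# Toy example from the question: x, y are coordinates, d1..d3 daily values.
dat <- data.frame(
  x  = c(1, 1, 1.3, 15),
  y  = c(1, 1, 1.5, 17),
  d1 = c(NA, 14, 8, 11),
  d2 = c(12, NA, NA, 21),
  d3 = c(3, 5, NA, 7)
)
radius     <- 1                    # neighbourhood radius from the toy example
value_cols <- c("d1", "d2", "d3")  # hypothetical name for the measurement columns

# Assumption: single-linkage clustering cut at `radius`; points linked by a
# chain of steps <= radius land in the same cluster. Needs at least 2 rows.
cl <- cutree(hclust(dist(dat[, c("x", "y")]), method = "single"), h = radius)

merge_cluster <- function(g) {
  # "Majority rules": pick the most frequent coordinate pair in the cluster.
  # Ties (including "no repeated location") fall back to the first row.
  key    <- paste(g$x, g$y)
  counts <- tabulate(match(key, unique(key)))
  i      <- match(unique(key)[which.max(counts)], key)
  # Per-day mean over the monitors that reported; NaN if none reported.
  vals <- colMeans(g[, value_cols, drop = FALSE], na.rm = TRUE)
  data.frame(x = g$x[i], y = g$y[i], as.list(vals))
}

merged <- do.call(rbind, lapply(split(dat, cl), merge_cluster))
rownames(merged) <- NULL
print(merged)
```

On the toy data this gives the two rows asked for in the question: (1, 1) with 11, 12, 4 and (15, 17) with 11, 21, 7. A day on which no monitor in a cluster reported comes out as NaN and may need converting back to NA.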

HTH,
Tom

On 03.04.2012 22:40, Molly Davies wrote:
Hello,

I apologize if my question has already been answered and I failed to find it. There must be a formal word for what I'm trying to do and I just don't know it ...

My data: PM2.5 measurements, taken from 144 different monitors over a period of 78 days. Some monitors have daily values, others report at regular intervals (once every m days, m in {3, 6, 15, etc}, varies by type of monitor).

The wrinkle: A number of these monitors have one or more monitors right next to them (little clusters of identical locations) or very nearby (within 100 meters).

My objective: I would like to combine the values of these clusters of closely spaced monitors. On days when more than one monitor within the cluster reports a PM2.5 measurement, I would like to take the average. On days when only one monitor reports a measurement within the cluster, I'd like to use that one measurement. I do NOT want to average the location of the monitors, though! Rather, I want to use a "majority rules" voting system: if I have a cluster of 4 closely spaced monitors and 2 of them have the same coordinates, I'd like to assign the combined vector of PM2.5 measurements to the coordinates those 2 points have in common. If there are no repeated locations in a cluster, I'd still like to be able to assign the vector of measurements to an existing set of locations in my original data and not an average location. (Note: I am aware that I can use ddply{plyr} to take care of exactly duplicated points, but I want to do more than that.)

A toy example:

Original data: Suppose x and y are location and d1, d2 and d3 are measurements taken on different days.
x   y   d1  d2  d3
1   1   NA  12  3
1   1   14  NA  5
1.3 1.5 8   NA  NA
15  17  11  21  7

I would like to average the rows that are within a radius of 1 from each other and use the coordinates associated with the majority in the combination.
x   y   d1  d2  d3
1   1   11  12  4
15  17  11  21  7

Does what I've described have a name? Are there any built-in functions in R that will do it? If not, I would very much appreciate suggestions about how best to implement such a task.

Some toy data to set me straight with:
Each row is a unique monitor.
My neighborhood radius of interest is 0.1.
###### BEGIN SNIP ############
toyDat <- data.frame(x = runif(90), y = runif(90)) # SETTING UP A DATA FRAME
toyDat[91:100, ] <- toyDat[sample(90, 10), ] # CREATING SOME EXACTLY DUPLICATED LOCATIONS
toyDat[101:105, ] <- toyDat[96:100, ] + 0.02 # NOW I'LL HAVE AT LEAST 5 TRIADS INCLUDING ONE DUPLICATED LOCATION.
toyDat$d1 <- rnorm(105) # GIVING ALL THE MONITORS DATA
toyDat$d2 <- rnorm(105)
toyDat$d1[sample(105, 15)] <- NA # SPRINKLING IN MISSING VALUES TO KEEP IT REALISTIC
toyDat$d2[sample(105, 17)] <- NA
###### END SNIP ############

Thanks so much,

Molly Davies


_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Technische Universität München
Department für Pflanzenwissenschaften
Lehrstuhl für Grünlandlehre
Alte Akademie 12
85350 Freising / Germany
Phone: ++49 (0)8161 715324
Fax:   ++49 (0)8161 713243
email: tom.gottfried at wzw.tum.de
http://www.wzw.tum.de/gruenland