combining values taken at nearly identical locations
2 messages · Molly Davies, Tom Gottfried
See ?zerodist in sp: you can set its zero argument to a value > 0 there. ?dnearneigh in spdep helps you find neighbours within a specified distance. Surely rgeos has something ... HTH, Tom

On 03.04.2012 22:40, Molly Davies wrote:
Hello,
I apologize if my question has already been answered and I failed to find it. There must be a formal word for what I'm trying to do and I just don't know it ...
My data: PM2.5 measurements, taken from 144 different monitors over a period of 78 days. Some monitors have daily values; others report at regular intervals (once every m days, with m in {3, 6, 15, etc.} depending on the type of monitor).
The wrinkle: A number of these monitors have one or more monitors right next to them (little clusters of identical locations) or very nearby (within 100 meters).
My objective: I would like to combine the values of these clusters of closely spaced monitors. On days when more than one monitor within a cluster reports a PM2.5 measurement, I would like to take the average. On days when only one monitor in the cluster reports a measurement, I'd like to use that one measurement.

I do NOT want to average the locations of the monitors, though! Rather, I want to use a "majority rules" voting system: if I have a cluster of 4 closely spaced monitors and 2 of them have the same coordinates, I'd like to assign the combined vector of PM2.5 measurements to the coordinates those 2 points have in common. If there are no repeated locations in a cluster, I'd still like to assign the vector of measurements to one of the existing locations in my original data, not to an average location. (Note: I am aware that I can use ddply{plyr} to take care of exactly duplicated points, but I want to do more than that.)
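The per-cluster rule described above (average whatever was reported on each day, but keep a real monitor location chosen by majority) could be sketched as follows. This assumes cluster membership is already known; combine_cluster() is a hypothetical name, and breaking ties in favour of the location seen first is an assumption, since the question leaves ties open.

```r
# Hypothetical helper: collapse the rows of ONE cluster into a single row.
# - Each measurement column becomes the mean of its non-missing values
#   (NA when no monitor in the cluster reported on that day).
# - The location is the (x, y) pair shared by the most monitors; ties go to
#   the pair seen first, so the result is always an existing monitor location.
combine_cluster <- function(rows, coord_cols = c("x", "y")) {
  day_cols <- setdiff(names(rows), coord_cols)
  # Coordinates are compared via their character form (exact duplicates).
  key <- paste(rows[[coord_cols[1]]], rows[[coord_cols[2]]])
  tab <- table(factor(key, levels = unique(key)))  # counts, first-seen order
  pick <- match(names(tab)[which.max(tab)], key)   # first row of the winner
  out <- rows[pick, coord_cols, drop = FALSE]
  for (col in day_cols) {
    v <- rows[[col]]
    out[[col]] <- if (all(is.na(v))) NA_real_ else mean(v, na.rm = TRUE)
  }
  rownames(out) <- NULL
  out
}

# The three nearby monitors from the toy example.
cluster <- data.frame(x = c(1, 1, 1.3), y = c(1, 1, 1.5),
                      d1 = c(NA, 14, 8), d2 = c(12, NA, NA), d3 = c(3, 5, NA))
print(combine_cluster(cluster))
```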
A toy example:
Original data: suppose x and y are the location and d1, d2 and d3 are measurements taken on different days.
x    y    d1   d2   d3
1    1    NA   12   3
1    1    14   NA   5
1.3  1.5  8    NA   NA
15   17   11   21   7
I would like to average the rows that are within a distance of 1 of each other and use the coordinates shared by the majority of the combined rows.
x    y    d1   d2   d3
1    1    11   12   4
15   17   11   21   7
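A minimal end-to-end sketch that reproduces the table above, assuming a "cluster" means monitors linked by chains of distances <= 1, i.e. single-linkage clustering built with hclust() and cut at height r with cutree(). The object names (orig, combine, result) are illustrative only.

```r
# Toy data from the example (x, y = location; d1..d3 = daily values).
orig <- data.frame(x = c(1, 1, 1.3, 15), y = c(1, 1, 1.5, 17),
                   d1 = c(NA, 14, 8, 11), d2 = c(12, NA, NA, 21),
                   d3 = c(3, 5, NA, 7))

# Assumption: clusters = single-linkage groups cut at radius r.
r <- 1
cl <- cutree(hclust(dist(orig[, c("x", "y")]), method = "single"), h = r)

# Majority location + per-day mean of non-missing values for one cluster.
combine <- function(rows) {
  key <- paste(rows$x, rows$y)
  tab <- table(factor(key, levels = unique(key)))  # first-seen order
  pick <- match(names(tab)[which.max(tab)], key)   # majority location
  vals <- lapply(rows[c("d1", "d2", "d3")], function(v)
    if (all(is.na(v))) NA_real_ else mean(v, na.rm = TRUE))
  data.frame(rows[pick, c("x", "y")], vals, row.names = NULL)
}

result <- do.call(rbind, lapply(split(orig, cl), combine))
rownames(result) <- NULL
print(result)
```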
Does what I've described have a name? Are there any built-in functions in R that will do it? If not, I would very much appreciate suggestions about how best to implement such a task.
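On the naming question, one way to frame the grouping (an interpretation, not something established in this thread) is as the connected components of a fixed-radius neighbour graph, in which two monitors are joined if they are within r of each other. That grouping matches single-linkage hierarchical clustering cut at height r (apart from possible differences for distances exactly equal to r), which base R provides through hclust() and cutree(). One consequence worth checking is chaining: monitors can land in the same cluster even when they are more than r apart. The sketch compares a hand-written breadth-first search (components() is a hypothetical helper) with hclust() on points spaced 0.08 apart.

```r
# Points on a line, 0.08 apart: each is within r = 0.1 of its neighbour,
# but the two ends of the chain are 0.16 apart.
xy <- cbind(x = c(0, 0.08, 0.16, 0.9), y = 0)
r <- 0.1

# Connected components of the graph "i -- j if dist(i, j) <= r",
# found by breadth-first search over the adjacency matrix.
components <- function(xy, r) {
  adj <- as.matrix(dist(xy)) <= r
  lab <- integer(nrow(xy))  # 0 = not yet visited
  cur <- 0L
  for (s in seq_len(nrow(xy))) {
    if (lab[s] != 0L) next
    cur <- cur + 1L
    lab[s] <- cur
    queue <- s
    while (length(queue) > 0) {
      i <- queue[1]
      queue <- queue[-1]
      nb <- which(adj[i, ] & lab == 0L)  # unvisited neighbours of i
      lab[nb] <- cur
      queue <- c(queue, nb)
    }
  }
  lab
}

by_graph <- components(xy, r)
by_hclust <- cutree(hclust(dist(xy), method = "single"), h = r)
print(by_graph)           # first three points share a cluster
print(unname(by_hclust))  # same grouping from single linkage
```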
Some toy data to work with:
Each row is a unique monitor.
My neighborhood radius of interest is 0.1.
###### BEGIN SNIP ############
toyDat<- data.frame(x=runif(90), y=runif(90)) # SETTING UP A DATA FRAME
toyDat[91:100,]<- toyDat[sample(90, 10),] # CREATING SOME EXACTLY DUPLICATED LOCATIONS
toyDat[101:105,]<- toyDat[96:100,] + 0.02 # NOW I'LL HAVE AT LEAST 5 TRIADS INCLUDING ONE DUPLICATED LOCATION.
toyDat$d1<- rnorm(105) # GIVING ALL THE MONITORS DATA
toyDat$d2<- rnorm(105)
toyDat$d1[sample(105, 15)]<- NA # SPRINKLING IN MISSING VALUES TO KEEP IT REALISTIC
toyDat$d2[sample(105, 17)]<- NA
###### END SNIP ############
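Applied to the toy data with a radius of 0.1, the approach might look like the sketch below. Assumptions: set.seed(1) is added only so the run is reproducible; clusters are single-linkage groups cut at r (hclust() and cutree()); and n_monitors is an illustrative extra column. With 90 random points in the unit square and r = 0.1, chaining can merge monitors that are well over 0.1 apart, so inspecting cluster sizes is a sensible first check.

```r
set.seed(1)  # assumption: fixed seed so the toy data are reproducible
toyDat <- data.frame(x = runif(90), y = runif(90))
toyDat[91:100, ] <- toyDat[sample(90, 10), ]      # exact duplicates
toyDat[101:105, ] <- toyDat[96:100, ] + 0.02      # near neighbours
toyDat$d1 <- rnorm(105)
toyDat$d2 <- rnorm(105)
toyDat$d1[sample(105, 15)] <- NA
toyDat$d2[sample(105, 17)] <- NA

r <- 0.1  # neighbourhood radius from the question
day_cols <- c("d1", "d2")
cl <- cutree(hclust(dist(toyDat[, c("x", "y")]), method = "single"), h = r)

# Majority location + per-day mean of non-missing values for one cluster.
combine <- function(rows) {
  key <- paste(rows$x, rows$y)
  tab <- table(factor(key, levels = unique(key)))
  pick <- match(names(tab)[which.max(tab)], key)
  vals <- lapply(rows[day_cols], function(v)
    if (all(is.na(v))) NA_real_ else mean(v, na.rm = TRUE))
  data.frame(rows[pick, c("x", "y")], vals, n_monitors = nrow(rows),
             row.names = NULL)
}

combined <- do.call(rbind, lapply(split(toyDat, cl), combine))
rownames(combined) <- NULL
head(combined[order(-combined$n_monitors), ])  # largest clusters first
```

If every monitor in a cluster should be within r of every other monitor, using method = "complete" in hclust() instead caps each cluster's diameter at r, at the cost of splitting some chains.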
Thanks so much,
Molly Davies
R-sig-Geo mailing list, R-sig-Geo at r-project.org, https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Technische Universität München, Department für Pflanzenwissenschaften, Lehrstuhl für Grünlandlehre, Alte Akademie 12, 85350 Freising / Germany. Phone: ++49 (0)8161 715324, Fax: ++49 (0)8161 713243, email: tom.gottfried at wzw.tum.de, http://www.wzw.tum.de/gruenland
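Tom's pointers (zerodist() with zero > 0 in sp, dnearneigh() in spdep) both come down to finding monitors that lie within some distance of each other. As a rough base-R sketch of that step, not the sp or spdep API, the qualifying pairs can be read off a distance matrix. close_pairs() is a hypothetical helper name.

```r
# Hypothetical helper: index pairs (i < j) of points within distance r.
# A rough base-R analogue of what zerodist(obj, zero = r) is used for here;
# not a drop-in replacement for sp or spdep.
close_pairs <- function(x, y, r) {
  d <- as.matrix(dist(cbind(x, y)))                  # Euclidean distances
  hits <- which(d <= r & upper.tri(d), arr.ind = TRUE)
  hits <- hits[order(hits[, 1], hits[, 2]), , drop = FALSE]
  unname(hits)
}

# Toy coordinates: monitors 1 and 2 coincide; monitor 3 is about 0.58 away.
pairs <- close_pairs(x = c(1, 1, 1.3, 15), y = c(1, 1, 1.5, 17), r = 1)
print(pairs)
```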