Hi all,
I'm trying to look at correlation between two raster layers, for
different polygons. So I use raster::extract to get all the raster
values for every polygon, do the calculation and feed the output back to
a SpatialPolygonDataFrame.
I got it working, but I have a doubt regarding the order of the rows;
and it doesn't look like I can use match.ID = TRUE.
See the example below.
library(raster)
library(dplyr)
# Create brick with 2 layers
b <- brick(ncol=36, nrow=18, nl=2)
b[[1]] <- rnorm(ncell(b))
b[[2]] <- rnorm(ncell(b))
# Create sp
cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20))
cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
cds3 <- rbind(c(-10,20), c(50,60), c(70,-10))
polys <- spPolygons(cds1, cds2, cds3)
# Extract all values
df0 <- extract(b, polys, df = TRUE)
# Compute correlation betwen the two layers for every polygons (sorry
for the pipe)
df1 <- group_by(df0, ID) %>%
summarise(cor = cor(layer.1, layer.2)) %>%
data.frame()
# Attach to df to spdf
spdf <- SpatialPointsDataFrame(polys, df1)
How do I know for sure that the order of the rows in the dataframe did
not get mixed up? Can I just assume that they will remain in the same order?
The dataframe returned by extract has an ID column, but the IDs do not
correspond to the polygons IDs. For instance if I remove the second
polygon (which has the ID "2"), the IDs in the df extracted are still 1
and 2 (instead of 1 and 3).
# Quick function to get polygons IDs
getPolyID <- function(x) {
sapply(x at polygons, function(x) {x at ID} )
}
getPolyID(polys)
# [1] "1" "2" "3"
getPolyID(polys[-2])
# [1] "1" "3"
unique(extract(b, polys[-2], df = TRUE)$ID)
# [1] 1 2
Any suggestions?
Thanks,
Lo?c
Match polygon and dataframe IDs after raster::extract
4 messages · Loïc Dutrieux, Edzer Pebesma
On 19/11/15 14:37, Lo?c Dutrieux wrote:
Hi all, I'm trying to look at correlation between two raster layers, for different polygons. So I use raster::extract to get all the raster values for every polygon, do the calculation and feed the output back to a SpatialPolygonDataFrame. I got it working, but I have a doubt regarding the order of the rows; and it doesn't look like I can use match.ID = TRUE. See the example below. library(raster) library(dplyr) # Create brick with 2 layers b <- brick(ncol=36, nrow=18, nl=2) b[[1]] <- rnorm(ncell(b)) b[[2]] <- rnorm(ncell(b)) # Create sp cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20)) cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0)) cds3 <- rbind(c(-10,20), c(50,60), c(70,-10)) polys <- spPolygons(cds1, cds2, cds3) # Extract all values df0 <- extract(b, polys, df = TRUE) # Compute correlation betwen the two layers for every polygons (sorry for the pipe) df1 <- group_by(df0, ID) %>% summarise(cor = cor(layer.1, layer.2)) %>% data.frame() # Attach to df to spdf spdf <- SpatialPointsDataFrame(polys, df1) How do I know for sure that the order of the rows in the dataframe did not get mixed up? Can I just assume that they will remain in the same order?
If it does reshuffle and you didn't specify match.ID, it will warn you:
pts = matrix(1:4,2,2,dimnames=list(c("a", "b"), NULL))
pts
[,1] [,2] a 1 3 b 2 4
df = data.frame(a = 2:3, row.names = c("b", "a"))
df
a b 2 a 3
SpatialPointsDataFrame(pts, df)
coordinates a a (1, 3) 3 b (2, 4) 2 Warning message: In SpatialPointsDataFrame(pts, df) : forming a SpatialPointsDataFrame based on maching IDs, not on record order. Use match.ID = FALSE to match on record order
SpatialPointsDataFrame(pts, df, match.ID = FALSE)
coordinates a b (1, 3) 2 a (2, 4) 3 Using match.ID = FALSE ensures the input order of coordinates and points is kept.
The dataframe returned by extract has an ID column, but the IDs do not
correspond to the polygons IDs. For instance if I remove the second
polygon (which has the ID "2"), the IDs in the df extracted are still 1
and 2 (instead of 1 and 3).
# Quick function to get polygons IDs
getPolyID <- function(x) {
sapply(x at polygons, function(x) {x at ID} )
}
getPolyID(polys)
# [1] "1" "2" "3"
getPolyID(polys[-2])
# [1] "1" "3"
unique(extract(b, polys[-2], df = TRUE)$ID)
# [1] 1 2
Any suggestions?
Thanks,
Lo?c
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Edzer Pebesma Institute for Geoinformatics (ifgi), University of M?nster, Heisenbergstra?e 2, 48149 M?nster, Germany; +49 251 83 33081 Journal of Statistical Software: http://www.jstatsoft.org/ Computers & Geosciences: http://elsevier.com/locate/cageo/ Spatial Statistics Society http://www.spatialstatistics.info -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 490 bytes Desc: OpenPGP digital signature URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20151119/8b95be7f/attachment.bin>
Thanks Edzer, But then, how safe is it to match attributes and geometries based on the record order? Is there any chance that records get shuffled during the dataframe manipulation steps, or the raster::extract? Also one note, SpatialPointsDataFrame() gives a warning, but SpatialPolygonsDataFrame() doesn't. (I made a mistake in my initial example, I want to create a SpatialPolygonsDataFrame, not a SpatialPointsDataFrame). Cheers, Lo?c
On 11/19/2015 03:08 PM, Edzer Pebesma wrote:
On 19/11/15 14:37, Lo?c Dutrieux wrote:
Hi all, I'm trying to look at correlation between two raster layers, for different polygons. So I use raster::extract to get all the raster values for every polygon, do the calculation and feed the output back to a SpatialPolygonDataFrame. I got it working, but I have a doubt regarding the order of the rows; and it doesn't look like I can use match.ID = TRUE. See the example below. library(raster) library(dplyr) # Create brick with 2 layers b <- brick(ncol=36, nrow=18, nl=2) b[[1]] <- rnorm(ncell(b)) b[[2]] <- rnorm(ncell(b)) # Create sp cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20)) cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0)) cds3 <- rbind(c(-10,20), c(50,60), c(70,-10)) polys <- spPolygons(cds1, cds2, cds3) # Extract all values df0 <- extract(b, polys, df = TRUE) # Compute correlation betwen the two layers for every polygons (sorry for the pipe) df1 <- group_by(df0, ID) %>% summarise(cor = cor(layer.1, layer.2)) %>% data.frame() # Attach to df to spdf spdf <- SpatialPointsDataFrame(polys, df1) How do I know for sure that the order of the rows in the dataframe did not get mixed up? Can I just assume that they will remain in the same order?
If it does reshuffle and you didn't specify match.ID, it will warn you:
pts = matrix(1:4,2,2,dimnames=list(c("a", "b"), NULL))
pts
[,1] [,2] a 1 3 b 2 4
df = data.frame(a = 2:3, row.names = c("b", "a"))
df
a b 2 a 3
SpatialPointsDataFrame(pts, df)
coordinates a a (1, 3) 3 b (2, 4) 2 Warning message: In SpatialPointsDataFrame(pts, df) : forming a SpatialPointsDataFrame based on maching IDs, not on record order. Use match.ID = FALSE to match on record order
SpatialPointsDataFrame(pts, df, match.ID = FALSE)
coordinates a b (1, 3) 2 a (2, 4) 3 Using match.ID = FALSE ensures the input order of coordinates and points is kept.
The dataframe returned by extract has an ID column, but the IDs do not
correspond to the polygons IDs. For instance if I remove the second
polygon (which has the ID "2"), the IDs in the df extracted are still 1
and 2 (instead of 1 and 3).
# Quick function to get polygons IDs
getPolyID <- function(x) {
sapply(x at polygons, function(x) {x at ID} )
}
getPolyID(polys)
# [1] "1" "2" "3"
getPolyID(polys[-2])
# [1] "1" "3"
unique(extract(b, polys[-2], df = TRUE)$ID)
# [1] 1 2
Any suggestions?
Thanks,
Lo?c
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
On 19/11/15 16:36, Lo?c Dutrieux wrote:
Thanks Edzer, But then, how safe is it to match attributes and geometries based on the record order? Is there any chance that records get shuffled during the dataframe manipulation steps, or the raster::extract?
Can you think of a reason why the author of the code would do that?
To be sure I would read the docs and then try to understand
library(raster)
getMethod("extract", c("Raster", "SpatialPolygons"))
Also one note, SpatialPointsDataFrame() gives a warning, but SpatialPolygonsDataFrame() doesn't. (I made a mistake in my initial example, I want to create a SpatialPolygonsDataFrame, not a SpatialPointsDataFrame).
as documented, SpatialPolygonsDataFrame has argument match.ID default to TRUE.
Cheers, Lo?c On 11/19/2015 03:08 PM, Edzer Pebesma wrote:
On 19/11/15 14:37, Lo?c Dutrieux wrote:
Hi all, I'm trying to look at correlation between two raster layers, for different polygons. So I use raster::extract to get all the raster values for every polygon, do the calculation and feed the output back to a SpatialPolygonDataFrame. I got it working, but I have a doubt regarding the order of the rows; and it doesn't look like I can use match.ID = TRUE. See the example below. library(raster) library(dplyr) # Create brick with 2 layers b <- brick(ncol=36, nrow=18, nl=2) b[[1]] <- rnorm(ncell(b)) b[[2]] <- rnorm(ncell(b)) # Create sp cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20)) cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0)) cds3 <- rbind(c(-10,20), c(50,60), c(70,-10)) polys <- spPolygons(cds1, cds2, cds3) # Extract all values df0 <- extract(b, polys, df = TRUE) # Compute correlation betwen the two layers for every polygons (sorry for the pipe) df1 <- group_by(df0, ID) %>% summarise(cor = cor(layer.1, layer.2)) %>% data.frame() # Attach to df to spdf spdf <- SpatialPointsDataFrame(polys, df1) How do I know for sure that the order of the rows in the dataframe did not get mixed up? Can I just assume that they will remain in the same order?
If it does reshuffle and you didn't specify match.ID, it will warn you:
pts = matrix(1:4,2,2,dimnames=list(c("a", "b"), NULL))
pts
[,1] [,2] a 1 3 b 2 4
df = data.frame(a = 2:3, row.names = c("b", "a"))
df
a b 2 a 3
SpatialPointsDataFrame(pts, df)
coordinates a a (1, 3) 3 b (2, 4) 2 Warning message: In SpatialPointsDataFrame(pts, df) : forming a SpatialPointsDataFrame based on maching IDs, not on record order. Use match.ID = FALSE to match on record order
SpatialPointsDataFrame(pts, df, match.ID = FALSE)
coordinates a b (1, 3) 2 a (2, 4) 3 Using match.ID = FALSE ensures the input order of coordinates and points is kept.
The dataframe returned by extract has an ID column, but the IDs do not
correspond to the polygons IDs. For instance if I remove the second
polygon (which has the ID "2"), the IDs in the df extracted are still 1
and 2 (instead of 1 and 3).
# Quick function to get polygons IDs
getPolyID <- function(x) {
sapply(x at polygons, function(x) {x at ID} )
}
getPolyID(polys)
# [1] "1" "2" "3"
getPolyID(polys[-2])
# [1] "1" "3"
unique(extract(b, polys[-2], df = TRUE)$ID)
# [1] 1 2
Any suggestions?
Thanks,
Lo?c
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Edzer Pebesma Institute for Geoinformatics (ifgi), University of M?nster, Heisenbergstra?e 2, 48149 M?nster, Germany; +49 251 83 33081 Journal of Statistical Software: http://www.jstatsoft.org/ Computers & Geosciences: http://elsevier.com/locate/cageo/ Spatial Statistics Society http://www.spatialstatistics.info -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 490 bytes Desc: OpenPGP digital signature URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20151119/b0f31fe4/attachment.bin>