Skip to content

dissolve internal borders of polygons using st_union and group_by

11 messages · Roger Bivand, Marta Rufino, Michael Sumner +2 more

#
Hi,

I am trying to dissolve the internal borders of larger polygons (sf object)
by a grouping variable or by proximity (adjacent)
and after looking in the web I found several alternatives, but none is
doing what I wanted.

## Reproducible example:
# World map
require(rnaturalearth); require(sf)
world_map <- rnaturalearth::ne_countries(scale = 'small', returnclass =
c("sf"))

# World countries:
world_map
object.size(world_map)

# 1st alternative: in this case we have the groups by continent, but the
country boundaries are still there, although without label:
(kk <- world_map %>%
  dplyr::group_by(continent) %>%
  summarise())
object.size(kk)
kk %>%
  st_transform(crs = 42310) %>%
  ggplot()+
  geom_sf()

# 2nd alternative : apparently different, but very similar...
(kk1 <- world_map %>%
  dplyr::group_by(continent) %>%
  summarise(st_union(.)))
object.size(kk1)
kk1 %>%
  st_transform(crs = 42310) %>%
  ggplot()+
  geom_sf()

# 3rd alternative
(kk2 <- world_map %>%
  dplyr::group_by(continent) %>%
  summarise(geom = st_union(geometry)))
object.size(kk2)
kk2 %>%
  st_transform(crs = 42310) %>%
  ggplot()+
  geom_sf()

# 4th alternative
(kk3 <- world_map %>%
  dplyr::group_by(continent) %>%
  summarise(geom = st_combine(geometry)))
object.size(kk3)
kk3 %>%
  st_transform(crs = 42310) %>%
  ggplot()+
  geom_sf()


# In all cases the objects produced are actually very similar at a first
glance, bur in fact they differ on the properties (as reflected by their
sizes).
object.size(world_map)
object.size(kk1)
object.size(kk2)
object.size(kk3)
object.size(kk4)

# And more importantly, in no case the borders between countries were
actually dissolved, which was my objective.


So, the question is how can I get the continents with the country borders
dissolved?
Further, could I dissolve all contiguous borders (instead of having a
grouping variable)- I saw this in a couple of posts but the answers were
very complex.

Any help will be greatly appreciated,
Best wishes,
M.
#
On Thu, 17 Oct 2019, Marta Rufino wrote:

            
Please state all versions:

sessionInfo()
sf_extSoftVersion()

With an updated system, most of your code just does not work for me. You 
are looking for sf::aggregate():

kk <- aggregate(world_map, list(world_map$continent), head, n=1)

plot(st_geometry(kk))

shows that although the country boundaries are largely removed, the 
underlying data are not properly aligned, so slivers remain, some on 
continent boundaries, some as holes in land masses.

Contributions welcome to remove the artefacts.

Using tidyverse really occludes analysis here.

Note that EPSG 42310 simply does not exist in PROJ 6, it is retrievable 
from:

kk1 <- st_transform(kk, crs = paste0("+proj=merc +lat_ts=0 +lon_0=0",
  " +k=1.000000 +x_0=0 +y_0=0 +ellps=GRS80 +datum=WGS84 +units=m"))
plot(st_geometry(kk1))

and should never be used for obvious reasons.

kk2 <- st_transform(kk, crs=3857)
plot(st_geometry(kk2))

is not much better (Web Mercator).

Hope this clarifies,

Roger

  
    
#
On Thu, 17 Oct 2019, Roger Bivand wrote:

            
But:

library(rgeos)
wm <- as(world_map, "Spatial")
cs <- gUnaryUnion(wm, id=as.character(wm$continent))
cs_sf <- st_as_sf(cs)
cs_sf$continent <- row.names(cs)
cs_sf
object.size(cs_sf)
plot(cs_sf)

gets much closer.

Maybe try st_precision() for some suitable value? The precision model in 
rgeos multiplies all coordinates by 1e+8 and rounds to integer, reducing 
boundary slivers.

Roger

  
    
#
Dear Roger,

Thank you very much for your quick and dedicated response.

Please state all versions:
See bellow. I though it was a trivial issue, thus sorry for not reporting
it earlier.
Just updated everything, r included, and it still runs in my machine -
perhaps because I failed to update to proj6 and I am still using proj4 - I
will have to dedicate more time to overcome it. Is there sites that explain
win-dummies installation?
This will be then a fourth option:
# Answer from ROGER:
kk4 <- aggregate(world_map, list(world_map$continent), head, n=1)

object.size(kk4)
kk4 %>%
  ggplot()+
  geom_sf()

The sp_transform was simply to check if it was workable the produced file
(in other cases, the code functioned, but then I could not work with the
files produced) - so ok to remove it or change it.

You example using rgeos is really nice and I am grateful for it, but I
really wanted to understand how to do it overall so I can apply in other
situations I am working with.
I don't understand the difference between all those options, that
apparently are giving the same result, but not so much in fact.

Any further ideas on the subject?

Thank you once again,
Best wishes,
M.
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

other attached packages:
 [1] rgeos_0.5-2         ggthemes_4.2.0      ggrepel_0.8.1
 [4] ggplot2_3.2.1       tidyr_1.0.0         scales_1.0.0
 [7] dplyr_0.8.3         reshape2_1.4.3      viridis_0.5.1
[10] viridisLite_0.3.0   units_0.6-5         rnaturalearth_0.1.0
[13] tmap_2.3-1          mapview_2.7.0       plotKML_0.5-9
[16] lubridate_1.7.4     zoo_1.8-6           rasterVis_0.46
[19] latticeExtra_0.6-28 RColorBrewer_1.1-2  lattice_0.20-38
[22] raster_3.0-7        maptools_0.9-8      sf_0.8-0
[25] gstat_2.0-3         rgdal_1.4-6         sp_1.3-1

loaded via a namespace (and not attached):
 [1] colorspace_1.4-1   class_7.3-15       colorRamps_2.3
 [4] leaflet_2.0.2      htmlTable_1.13.2   satellite_1.0.1
 [7] base64enc_0.1-3    dichromat_2.0-0    rstudioapi_0.10
[10] hexbin_1.27.3      fansi_0.4.0        codetools_0.2-16
[13] splines_3.6.1      knitr_1.25         zeallot_0.1.0
[16] Formula_1.2-3      tmaptools_2.0-2    cluster_2.1.0
[19] png_0.1-7          shiny_1.4.0        compiler_3.6.1
[22] backports_1.1.5    assertthat_0.2.1   Matrix_1.2-17
[25] fastmap_1.0.1      lazyeval_0.2.2     cli_1.1.0
[28] later_1.0.0        acepack_1.4.1      htmltools_0.4.0
[31] tools_3.6.1        gtable_0.3.0       glue_1.3.1
[34] Rcpp_1.0.2         vctrs_0.2.0        leafsync_0.1.0
[37] crosstalk_1.0.0    lwgeom_0.1-7       xfun_0.10
[40] stringr_1.4.0      mime_0.7           lifecycle_0.1.0
[43] XML_3.98-1.20      MASS_7.3-51.4      promises_1.1.0
[46] parallel_3.6.1     yaml_2.2.0         gridExtra_2.3
[49] aqp_1.17           rpart_4.1-15       reshape_0.8.8
[52] stringi_1.4.3      plotrix_3.7-6      e1071_1.7-2
[55] checkmate_1.9.4    intervals_0.15.1   rlang_0.4.0
[58] pkgconfig_2.0.3    pixmap_0.4-11      RSAGA_1.3.0
[61] purrr_0.3.2        htmlwidgets_1.5.1  tidyselect_0.2.5
[64] plyr_1.8.4         magrittr_1.5       R6_2.4.0
[67] Hmisc_4.2-0        DBI_1.0.0          pillar_1.4.2
[70] foreign_0.8-72     withr_2.1.2        shapefiles_0.7
[73] xts_0.11-2         survival_2.44-1.1  nnet_7.3-12
[76] tibble_2.1.3       spacetime_1.2-2    crayon_1.3.4
[79] utf8_1.1.4         KernSmooth_2.23-15 grid_3.6.1
[82] data.table_1.12.4  FNN_1.1.3          dismo_1.1-4
[85] digest_0.6.21      classInt_0.4-1     webshot_0.5.1
[88] xtable_1.8-4       httpuv_1.5.2       stats4_3.6.1
[91] munsell_0.5.0
GEOS           GDAL         proj.4 GDAL_with_GEOS
       "3.6.1"        "2.2.3"        "4.9.3"         "true"
    USE_PROJ_H
       "false"

  
  
#
On Thu, 17 Oct 2019, Marta Rufino wrote:

            
You don't need to update yet, but modern PROJ will upset many workflows.
My attempts to use st_precision() to try to find a precision level 
that removed the artefacts ended in segmentation faults, so the current 
best workaround is as shown to use rgeos, until the correct sf incantation 
is forthcoming.

Roger

  
    
#
rnaturalearth is really dirty data, there's no way forward when things are
so bad, only backing up can help.

Do you need the whole world? Where do you need, and what scale? Your
question suggests that finding the right data is the solution rather than
battling with software.

Cheers, Mike
On Fri., 18 Oct. 2019, 00:27 Roger Bivand, <Roger.Bivand at nhh.no> wrote:

            

  
  
#
Buffering the data by a teeny tiny number of fractional degrees is
sufficient to make the edges overlap enough to dissolve properly. Hacky
solution, and results in a world that is 0.00001 degrees more coastline all
round (a negative buffer can correct for this a bit).

kk <- aggregate(st_buffer(world_map,.00000001), list(world_map$continent),
head, n=1)

Barry
On Thu, Oct 17, 2019 at 4:22 PM Michael Sumner <mdsumner at gmail.com> wrote:

            

  
  
#
Thank you very much to all :)

Barry+Roger contributions sorted the problem for the moment, although in my
mind there are still things to be understood.
Mike, yep, thank you for alerting me of rnaturalearth issues- I was not
aware of it. In this case, I am really interested in the process to
apply in other polygons, not in the data *per se*, but will take this into
account in the future.

Once I have the doc produced (distribution of world
estuaries/bays/transition waters), I will post on the list :)

Best wishes,
M.
#
Cool, thanks for clarifying ?
On Fri., 18 Oct. 2019, 02:48 Marta Rufino, <marta.m.rufino at gmail.com> wrote:

            

  
  
#
Another option you have available is ?ms_dissolve()? in the ?rmapshaper' package (full disclosure - I am the author). The javascript library that powers it (maphsaper[1]) builds topology with a small tolerance to repair slivers and gaps, and thus can often fix issues like this:

require(rnaturalearth)
require(sf)
library(rmapshaper)
world_map <- ne_countries(scale = 'small', returnclass = "sf")

# World countries:
world_map
object.size(world_map)

# Dissolve borders within continents with ms_dissolve()
world_dissolved <- ms_dissolve(world_map, field = "continent?)
plot(world_dissolved)

Cheers,
Andy Teucher

[1] mapshaper: https://github.com/mbloch/mapshaper
4 days later
#
Hi,

Thank you very much to everybody.
Here it is a summary of all answers obtained (I hope), with the example
objects (one map and one squared set of polygons).

Cheers,
M.

PS: it was done to produce the world distribution of estuaries (
http://rpubs.com/MRufino/541732)

# If using a simple polygon example (
https://cran.r-project.org/web/packages/sf/vignettes/sf3.html):
b0 = st_polygon(list(rbind(c(-1,-1), c(1,-1), c(1,1), c(-1,1), c(-1,-1))))
b1 = b0 + 2
b2 = b0 + c(-0.2, 2)
x = st_sfc(b0, b1, b2)
a0 = b0 * 0.8
a1 = a0 * 0.5 + c(2, 0.7)
a2 = a0 + 1
a3 = b0 * 0.5 + c(2, -0.5)
y = st_sfc(a0,a1,a2,a3)
plot(x, border = 'red')
plot(y, border = 'green', add = TRUE)

world_map <- rbind(st_as_sf(x),st_as_sf(y)) %>% st_set_crs(4326)
world_map$continent <- c(rep("a",4), rep("b",3))
world_map
plot(world_map)

# World map (if using real data):
# world_map <- rnaturalearth::ne_countries(scale = 'small', returnclass =
c("sf"))

# World countries:
world_map
object.size(world_map)

# 1st case: in this case we have the groups by continent, but the country
boundaries are still there, although without label:
(kk <- world_map %>%
  dplyr::group_by(continent) %>%
  summarise())
object.size(kk)
kk %>%
  st_transform(crs = 3857) %>%
  ggplot()+
  geom_sf()

# 2nd case: appeantly different, but very similar...
(kk1 <- world_map %>%
  dplyr::group_by(continent) %>%
  summarise(st_union(.)))
object.size(kk1)
kk1 %>%
  st_transform(crs = 3857) %>%
  ggplot()+
  geom_sf()

# 3rd case (does not work with polygon example)
(kk2 <- world_map %>%
  dplyr::group_by(continent) %>%
  dplyr::summarize(geom = st_union(geometry)))
object.size(kk2)
kk2 %>%
  st_transform(crs = 3857) %>%
  ggplot()+
  geom_sf()

# 4th case  (does not work with polygon example)
(kk3 <- world_map %>%
  dplyr::group_by(continent) %>%
  dplyr::summarize(geom = st_combine(geometry)))
object.size(kk3)
kk3 %>%
  st_transform(crs = 3857) %>%
  ggplot()+
  geom_sf()

# Suggested from Roger Bivand and Barry Rowlingson:
kk4 <- aggregate(st_buffer(world_map, .00000001),
list(world_map$continent), head, n=1)
kk4 %>%
  st_transform(crs = 3857) %>%
  ggplot()+
  geom_sf()


# Suggested by Roger Bivand:
library(rgeos)
wm <- as(world_map, "Spatial")
cs <- gUnaryUnion(wm, id=as.character(wm$continent))
kk5 <- st_as_sf(cs)
kk5$continent <- row.names(cs)
kk5
object.size(kk5)
kk5 %>%
  st_transform(crs = 3857) %>%
  ggplot()+
  geom_sf()

# Suggested by Andy Teucher:
library(rmapshaper)

kk6 <- ms_dissolve(world_map, field = "continent")
kk6 %>%
  st_transform(crs = 3857) %>%
  ggplot()+
  geom_sf()

# In all cases the objects produced are actually very similar at a first
glance, bur in fact they differ on the properties and sizes.
# In no case, the borders between countries were actually dissolved, which
was my objective

object.size(world_map)
object.size(kk)
object.size(kk1)
object.size(kk2)
object.size(kk3)
object.size(kk4)
object.size(kk5)
object.size(kk6)

# Now, the challenge is, what is the difference between all those files
that apperantly are the same?