
algorithm to join polygons based on population properties

12 messages · James Rooney, Kuroda, Virgilio Gómez-Rubio +1 more

#
Dear all,

I have a dataset with far more polygons than cases. I wish to apply Bayesian smoothing to areal disease rates, but I have too many polygons and need a smart way to combine them so that there are fewer polygons overall.
Basically I need to combine only polygons of similar population density, and it would be best if the new polygons had a distribution of total population within a limited range, or normally distributed.

I can of course come up with some way of doing this myself, but I'm not keen to reinvent the wheel, so I am wondering: are there any smart algorithms already out there for doing this kind of thing?

Thanks,
James
#
On Tue, 7 Jan 2014, James Rooney wrote:

This is not clear. Do you mean density (count/area) or just count? If you 
have "too many polygons", then probably you haven't thought through your 
sampling design - you need polygons with the correct support for the data 
collection protocol used. Are you looking at postcode polygons and sparse 
geocoded cases, with many empty postcodes? Are postcodes the relevant 
support?

If you think through support first (Gotway & Young 2002), then ad hoc 
aggregation (that's the easy part) may be replaced by appropriate 
aggregation (postcodes by health agency, surgery, etc.). The aggregation 
can be done with rgeos::gUnaryUnion, but you first need a vector assigning 
polygons to aggregates, preferably coded so that the data can be combined 
with maptools::spCbind, using well-matched row.names of the aggregated 
SpatialPolygons and data.frame objects to key on observation IDs.
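The recipe above can be sketched in a few lines. In this sketch, `polys` is a hypothetical SpatialPolygonsDataFrame with `pop` and `cases` columns, and `agg.id` is a hypothetical character vector (one element per polygon) assigning polygons to aggregates; `sp::SpatialPolygonsDataFrame()` is used for the final join, which is equivalent to the spCbind step given well-matched row.names:

```r
library(sp)
library(rgeos)

## Dissolve polygons into aggregates; the row.names of the result
## are the unique values of agg.id
agg.sp <- gUnaryUnion(polys, id = agg.id)

## Aggregate the attribute data the same way, keyed on agg.id
agg.df <- aggregate(polys@data[, c("pop", "cases")],
                    by = list(agg.id = agg.id), FUN = sum)
row.names(agg.df) <- agg.df$agg.id

## Rejoin geometry and data on matching row.names
agg.spdf <- SpatialPolygonsDataFrame(agg.sp, agg.df)
```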

First clarity on support, then aggregate polygons to appropriate support, 
then merge. Otherwise you are ignoring the uncertainty introduced into 
your Bayesian analysis by the aggregation (different aggregations will give 
different results). There are good chapters on this in the Handbook of 
Spatial Statistics by Gelfand and Wakefield/Lyons.

Hope this clarifies,

Roger

#
Hi Roger,

Thanks for your reply. Coding the joins is not a problem; I've already done that on a smaller scale in a different project.

No postcodes in my country. I have polygon data from the census, and I have geocoded cases for every case of a rare disease. This is all pretty much fixed; there is nothing I can do about it. I have performed an analysis based on about 3,500 polygons, and that works OK, but the population data has poor mathematical properties. So I'm now working with newer data using 18,000 polygons and the same cases. This population data has better mathematical properties (i.e. population per polygon is more symmetrically distributed), but there are too many polygons: most of them have no cases. So when I do Bayesian smoothing I just end up with a uniform map of relative risk = 1 everywhere, as all the polygons with cases are surrounded by polygons with no cases.

I figure that to get around this I either fiddle with the spatial weighting (which seems unwise) or join polygons in some sensible fashion. My question was really whether there are algorithms to derive a list of polygon joins based on polygon properties. For example, I don't want to join urban and rural polygons, as I am interested in the association of population density with incidence rate. I'm also interested in the relationship with social deprivation, so I don't want to join an area of high deprivation with an area of low deprivation. Basically, I want to know: is there a package that will create a join list for me based on such rules? I can of course write some code to do it, but I was hoping not to have to spend the time on it!

James
#
On Tue, 7 Jan 2014, James Rooney wrote:

Briefly, you have a regionalisation problem in addition to MAUP, so have a 
look at spdep::skater and the underlying paper:

Assunção, R. M., Neves, M. C., Câmara, G. and Freitas, C. da C. (2006). 
Efficient regionalization techniques for socio-economic geographical units 
using minimum spanning trees. International Journal of Geographical 
Information Science, 20(7), 797-811.

However, different criteria and clustering variable subsets will give 
different output regional aggregates. You may like to check robustness by 
comparing summary statistics for the aggregates, and by comparing output 
risk values under different aggregations.

The key functions in this approach now support parallel execution; look 
carefully at the examples using the Boston dataset at the foot of the help 
page, and note the differences between Windows and Linux/OSX.
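The skater() workflow described above can be sketched as follows. Here `polys` is a hypothetical SpatialPolygonsDataFrame, and the clustering variables `popdens` and `deprivation` are illustrative names, not taken from any real dataset:

```r
library(spdep)

## Standardise the clustering variables so no one variable dominates
vars <- scale(polys@data[, c("popdens", "deprivation")])

nb     <- poly2nb(polys)             # contiguity neighbours
lcosts <- nbcosts(nb, vars)          # dissimilarity along each graph edge
lw     <- nb2listw(nb, lcosts, style = "B")
mst    <- mstree(lw)                 # minimum spanning tree

## Cut the tree into, say, 200 regions (199 cuts)
res <- skater(mst[, 1:2], vars, ncuts = 199)
polys$region <- res$groups
```

The `groups` vector can then feed directly into the aggregation step (e.g. as the `id` argument to rgeos::gUnaryUnion).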

Hope this helps,

Roger

#
Ok thanks Roger I'll read up on that!

Many thanks!
James
#
Professor Rooney:

As Professor Bivand remarked, the key words are "regionalization,"
"region building," "zoning design," or "constrained spatial clustering."

Cf. Duque, J. C., Ramos, R. and Suriñach, J. (2007). Supervised
regionalization methods: a survey. International Regional Science Review,
30(3), 195-220.

The following might help you:

AMOEBA
http://cran.r-project.org/web/packages/AMOEBA/
(It looks like a clustering method that produces non-exclusive clusters.)

REDCAP
http://www.spatialdatamining.org/software/redcap
As far as I know, REDCAP has not been implemented in R yet, unfortunately.

I hope this helps.

Sho Kuroda

--
Sho Kuroda (Mr.)
kuroda.cpa at gmail.com



#
Dear Mr Kuroda,

Many thanks for the information. I will have to take some time to digest the various links I've been sent, and learn the new lingo!

And I appreciate the honorary promotion, but it is not 'Professor' Rooney at all!
Dr Rooney if you like titles - though I personally much prefer to be called James!

James
#
Dear James,

In addition to what others have suggested, you may want to try a
different modelling approach using zero-inflated models. If you are
working on a rare disease, a zero-inflated model can accommodate the
high number of zeroes better than standard models.
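As a minimal (non-spatial) sketch of this idea, a zero-inflated Poisson model can be fitted with pscl::zeroinfl. The variable names here (`cases`, `popdens`, `pop`) are hypothetical, and a fully Bayesian spatial version would need something like WinBUGS or INLA instead:

```r
library(pscl)

## Count part: cases ~ density, with log-population as an exposure offset;
## zero part (after "|"): probability of a structural zero, also on density
fit <- zeroinfl(cases ~ popdens + offset(log(pop)) | popdens,
                data = polys@data, dist = "poisson")
summary(fit)
```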

Best wishes,

Virgilio
On Tue, 2014-01-07 at 09:57 +0000, James Rooney wrote:
#
Dear Virgilio,

Thanks very much for the suggestion. I'll read up on it too! 

James
#
Hi Roger,

I have been trying to recreate the Boston example using my data set.
I am running into an error with the nbcosts command that I don't understand.
Error in nbcosts(SAPs.nb, dpad) : nbcosts: 26 disjoint connected subgraphs

SAPs.nb is the neighbourhood matrix.

What does this error mean? I have not been able to understand it despite reading the help files.

Thanks,
James
#
On Wed, 8 Jan 2014, James Rooney wrote:

A connected graph is one in which there are no breaks, so that you could 
in principle create one region, if that is what the data suggest. You have 
26 disjoint (but internally connected) subgraphs, so you can either 
connect them by defining the neighbours in a different way, by dropping 
subgraphs with no cases, or by regionalising the subgraphs separately. See 
spdep::n.comp.nb for a tool to identify (and tally) your subgraphs. My 
guess is that many are not large, so can be discarded without prejudice. 
Use table() on the vector component returned by n.comp.nb() to tally the 
subgraphs, then see which ones have no cases.
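The check described above can be sketched as follows, where `polys` is the SpatialPolygonsDataFrame, `nb` is its neighbour list, and the size threshold of 10 is an arbitrary illustration:

```r
library(spdep)

comp <- n.comp.nb(nb)            # comp$nc subgraphs, comp$comp.id labels
tab  <- table(comp$comp.id)      # tally: size of each subgraph

## Keep only polygons belonging to subgraphs of at least 10 members
keep       <- comp$comp.id %in% as.integer(names(tab)[tab >= 10])
polys.main <- polys[keep, ]
nb.main    <- subset(nb, keep)   # spdep's subset method for nb objects
```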

Hope this helps,

Roger

#
Hi Roger,

Many thanks this was a useful suggestion.
The table to tally sub-graphs looked like this:

subgraph:     1  2  3  4  5  6  7  8  9
size:     18414  2  4  5  4  2  3  5 17

I was able to plot these subgraphs and can see they are mostly islands, peninsulas, etc. Now that I know what the problem is, I should be able to fix it.
Many thanks!
James