Hi Everyone,
I am trying to run the skater function for graph partitioning, part of the
spdep package. My goal is to create contiguous territories for the entire
USA at the ZIP Code level.
The function takes a very long time to run, even for ~15% of my total areas.
I am looking to run this for the 30,000 ZIP Codes in the USA.
The skater function documentation gives an example of parallel processing,
but it doesn't seem to speed things up. I have a Windows laptop with
2 physical cores and 4 logical cores. In the code below, I have already
tried setting nc = 1, nc = 2 and nc = 4, all with very similar run times.
Has anyone been able to run the skater function for a large number of areas
in a reasonable amount of time? I would really appreciate any guidance on
this; perhaps I am missing some steps.
Here is the example from the documentation, which I am also running.
# bh.nb, dpad and lcosts are created earlier in the same documentation example
library(parallel)
nc <- detectCores(logical=FALSE)
# set nc to 1L here
if (nc > 1L) nc <- 1L
coresOpt <- get.coresOption()
invisible(set.coresOption(nc))
if (!get.mcOption()) {
  # no-op, "snow" parallel calculation not available
  cl <- makeCluster(get.coresOption())
  set.ClusterOption(cl)
}
### calculating costs
system.time(plcosts <- nbcosts(bh.nb, dpad))
all.equal(lcosts, plcosts, check.attributes=FALSE)
### making listw
pnb.w <- nb2listw(bh.nb, plcosts, style="B")
### find a minimum spanning tree
pmst.bh <- mstree(pnb.w, 5)
### three groups with no restriction
system.time(pres1 <- skater(pmst.bh[,1:2], dpad, 2))
if (!get.mcOption()) {
  set.ClusterOption(NULL)
  stopCluster(cl)
}
Much appreciated!
skater - spdep runtime - geographic territories
3 messages · Salo V, Elias T. Krainski
Hi Salo,

I implemented it several years ago, and it is not optimal in some ways. I will update it in the near future to add a heuristic that avoids the exhaustive search it currently performs.

For now, you can get a significant runtime reduction by using an alternative function to compute the ssw, because the default way of computing it uses a lot of memory and is bad for big datasets. Please see the attached code, which illustrates this. When using ssdfun() I saw a reduction factor of around 4 for n=2k. I found an additional reduction factor of 1.6 by using two (physical) cores.

This is the result I got on my laptop:

      n t1 t2 t3 t4
15  225  1  1  1  1
20  400  1  1  1  1
25  625  4  3  3  2
30  900 10  5  6  4
35 1225 21  8 13  5
40 1600 39 12 23  8
45 2025 86 24 50 15

Best regards,
Elias
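To make the memory point concrete, here is a minimal base-R sketch. It assumes, purely for illustration, that the memory-heavy route builds a full pairwise distance matrix with dist() and then keeps only the centroid's row; the message above does not say how the default works internally, and heavy(), light() and n_pairs() are hypothetical helper names, not spdep functions. The lighter route measures each row against the centroid directly and returns the same cost.

```r
# Illustration only: base R, no spdep required.
# heavy(), light() and n_pairs() are hypothetical names for this sketch.
set.seed(1)
n <- 2000
p <- 3
d <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Route 1 (assumed memory-heavy): all pairwise distances among the
# centroid plus every row, of which only the centroid's row is used.
heavy <- function(d) {
  m <- rbind(colMeans(d), d)
  sum(as.matrix(dist(m))[1, ])
}

# Route 2: distance from each row to the centroid, nothing else.
light <- function(d) {
  sum(sqrt(rowSums(sweep(d, 2, colMeans(d))^2)))
}

# Number of pairwise distances among n rows plus one centroid.
n_pairs <- function(n) n * (n + 1) / 2

all.equal(heavy(d), light(d))  # both routes give the same cost
n_pairs(30000)                 # 450015000 distances for 30,000 areas
```

Under that assumption, a single cost evaluation over all 30,000 areas would hold about 450 million distances (roughly 3.6 GB as doubles), whereas the direct route only needs one n-by-p matrix of differences.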
On 6/11/19 5:21 PM, Salo V wrote:
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
-------------- next part -------------- A non-text attachment was scrubbed... Name: elapsed-time-ssdfun.R Type: text/x-r-source Size: 2521 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20190611/bc768363/attachment.bin>
Hi Salo,

In the file I sent attached to my previous email, the ssdfun() has to be replaced by the following in order to give the same results as the default option in skater():

ssdfun <- function(d,i)
    sum(sqrt(colSums((t(d[i,,drop=FALSE])-
                      colMeans(d[i,,drop=FALSE]))^2)))

So, the recommendation is to use skater(..., method=ssdfun)

Best regards,
Elias
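As a quick sanity check of what ssdfun() computes, the sketch below compares it with an explicit loop on a tiny made-up dataset. The data and the helper loop_ssd() are hypothetical and exist only for this illustration; ssdfun() is copied from the message above. For rows i, the result is the sum of Euclidean distances from each selected row to the centroid of those rows.

```r
# ssdfun() as given in the message above
ssdfun <- function(d, i)
  sum(sqrt(colSums((t(d[i, , drop = FALSE]) -
                    colMeans(d[i, , drop = FALSE]))^2)))

# Hypothetical toy data: four corners of a square plus one distant point
d <- rbind(c(0, 0), c(2, 0), c(0, 2), c(2, 2), c(10, 10))

# Hypothetical reference implementation: loop over the selected rows
loop_ssd <- function(d, i) {
  ctr <- colMeans(d[i, , drop = FALSE])
  total <- 0
  for (k in i) total <- total + sqrt(sum((d[k, ] - ctr)^2))
  total
}

# Rows 1:4 have centroid (1, 1); each corner is sqrt(2) away from it
all.equal(ssdfun(d, 1:4), 4 * sqrt(2))
all.equal(ssdfun(d, 1:4), loop_ssd(d, 1:4))
ssdfun(d, 5)  # a single row sits on its own centroid, so the cost is 0
```

The drop=FALSE in ssdfun() is what keeps the single-row case working: without it, d[i, ] would collapse to a plain vector and colMeans() would fail.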