
Running huge dataset with dnearneigh

On Mon, 1 Jul 2019, Jiawen Ng wrote:

I'm afraid that my retail geography is not very up to date, but also that 
your approach is most unlikely to yield constructive results.

Most retail stores are organised in large chains, which optimise costs 
between wholesale and retail. Independent retail stores depend crucially 
on access to wholesale stores, so in any case cannot locate without regard 
to supply costs. Some service activities without wholesale dependencies 
are less tied.

Most chains certainly behave strategically with regard to each other, 
sometimes locating toe-to-toe to challenge a competing chain 
(Carrefour/Tesco or their local shop variants), sometimes avoiding nearby 
competing chain locations to establish a local monopoly (think Hotelling).

Population density doesn't express demand well at all, especially unmet 
demand. Think food deserts - maybe plenty of people but little disposable 
income. Look at the food desert literature, or the US food stamp 
literature.

Finally (all bad news), retail is not only challenged by location shifting 
from high streets to malls, but critically by online shopping, which, 
once the buyer is engaged at a proposed price, shifts the cost structures 
to logistics, completing the order at the highest margin including 
returns. That only marginally relates to population density.

So you'd need more data than you have, a model that explicitly handles 
competition between chains as well as market gaps, and some way of 
handling online leakage to move forward.

If population density was a proxy for accessibility (most often it isn't), 
it might look like the beginnings of a model, but most often we don't know 
what bid-rent surfaces look like, and then, most often different 
activities sort differently across those surfaces.
The model underlying spatial regressions using neighbours tapers 
dependency as the pairwise elements of (I - \rho W)^{-1} (conditional) and 
[(I - \rho W)(I - \rho W')]^{-1} (simultaneous; see Wall 2004). These are 
NxN dense matrices. (I - \rho W) is typically sparse, and under certain 
conditions leads to (I - \rho W)^{-1} = \sum_{i=0}^{\infty} \rho^i W^i, 
the sum of a power series in \rho and W. \rho is typically bounded above 
by 1, so \rho^i declines as i increases. This dampens \rho^i W^i, so that 
higher-order neighbours exert less and less influence as the power i 
increases. So in the general case IDW is simply replicating what simple 
contiguity gives you anyway, and the sparser W is (within reason), the 
better. Unless you really know that the physics, chemistry or biology of 
your system gives you a known systematic relationship like IDW, you may 
as well stay with contiguity.
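To illustrate the damping argument (this is not code from the thread - a 
minimal NumPy sketch assuming a hypothetical 5-region chain graph with 
row-standardised W and \rho = 0.5), the truncated power series reproduces 
the dense inverse, and the influence of region 0 tapers off with graph 
distance:

```python
import numpy as np

# Hypothetical 5-region chain (each region neighbours the next),
# row-standardised spatial weights matrix W.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = 1.0
    W[i + 1, i] = 1.0
W = W / W.sum(axis=1, keepdims=True)

rho = 0.5  # |rho| < 1, so the power series converges

# Exact dense inverse vs truncated power series sum_{i=0}^{49} rho^i W^i
exact = np.linalg.inv(np.eye(n) - rho * W)
approx = sum((rho ** i) * np.linalg.matrix_power(W, i) for i in range(50))
print(np.allclose(exact, approx))  # the series matches the inverse

# Row 0 of the inverse: influence of region 0 declines with graph distance
print(np.round(exact[0], 3))
```

The first row of the inverse is strictly decreasing along the chain: each 
extra step multiplies the contribution by roughly \rho, which is the 
tapering described above.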

However, this isn't any use in solving a retail location problem at all.
When in doubt use contiguity for polygons and similar graph-based methods 
for points. Try to keep the graphs planar (as few intersecting edges as 
possible, as a rule of thumb).
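As a sketch of why contiguity graphs stay cheap (again not from the 
thread - a toy 3x3 lattice of cells is assumed), rook contiguity gives a 
planar graph whose average neighbour count stays small however large the 
lattice grows, unlike a dense distance-band graph:

```python
# Toy 3x3 lattice, cells indexed 0..8 row by row; rook contiguity
# links cells sharing an edge. The resulting graph is planar and sparse.
side = 3
neighbours = {i: [] for i in range(side * side)}
for r in range(side):
    for c in range(side):
        i = r * side + c
        if c + 1 < side:          # link to the cell to the right
            neighbours[i].append(i + 1)
            neighbours[i + 1].append(i)
        if r + 1 < side:          # link to the cell below
            neighbours[i].append(i + side)
            neighbours[i + side].append(i)

# Average degree is ~2.7 here and stays bounded (at most 4) as the
# lattice grows, so W remains sparse.
avg_degree = sum(len(v) for v in neighbours.values()) / len(neighbours)
print(avg_degree)
```

A distance-band neighbour definition such as dnearneigh with a generous 
band does the opposite: the neighbour count per region grows with the 
density of points, which is exactly what makes huge datasets painful.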
Baseline, this is not going anywhere constructive, and simply approaching 
retail location in this way is unhelpful - there is far too little 
information in your model.

If you really must, first find a fully configured retail model with the 
complete data set needed to replicate the results achieved, and use that 
to benchmark how far your approach succeeds in reaching a similar result 
for that restricted area. I think that you'll find that the retail model 
is much more successful, but if not, there is less structure in 
contemporary retail than I thought.

Best wishes,

Roger