How to efficiently generate data of neighboring points - R-SIG-Geo

Tue, Jun 2, 2020 6:14 AM #

The rtree package is very fast and memory-efficient for within-distance
calculations.
https://github.com/hunzikp/rtree

Kent Johnson
Cambridge, MA

Roger Bivand

Tue, Jun 2, 2020 6:59 AM #

On Tue, 2 Jun 2020, Kent Johnson wrote:

Thanks! Does this also apply when the input points are in geographical 
coordinates?

Roger

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

Kent Johnson

Tue, Jun 2, 2020 8:24 AM #

rtree uses Euclidean distance so the points should be in a coordinate
system where this makes sense at least as a reasonable approximation.
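
To illustrate why this matters, here is a minimal base-R sketch. The haversine helper and the test points are made up for illustration, and a spherical Earth of radius 6371 km is assumed. One degree of longitude is the same "Euclidean" distance in degrees everywhere, but not on the ground:

```r
# Great-circle distance in km on a sphere (haversine formula).
# Hypothetical helper for illustration; assumes Earth radius 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

# Two pairs of points, each one degree of longitude apart
d_equator <- haversine_km(0, 0, 1, 0)     # at latitude 0
d_60north <- haversine_km(0, 60, 1, 60)   # at latitude 60

# Naive Euclidean distance on the raw degrees is 1 in both cases
c(equator_km = d_equator, lat60_km = d_60north)
```

The ground distance at 60 degrees latitude is roughly half that at the equator, so a single radius in degrees does not correspond to a single radius in metres. Projecting the points first avoids this.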

Kent


Roger Bivand

Tue, Jun 2, 2020 9:29 AM #

On Tue, 2 Jun 2020, Kent Johnson wrote:

I tried the original example:

remotes::install_github("hunzikp/rtree")
library(spData)
library(sf)
projdata<-st_transform(nz_height, 32759)
library(rtree)
pts <- st_coordinates(projdata)
rt <- RTree(st_coordinates(projdata))
bufferR <- c(402.336, 1609.34, 3218.69, 4828.03, 6437.38)
wd1 <- withinDistance(rt, pts, bufferR[1])

but unfortunately failed (maybe newer Boost headers than yours?):

Error in UseMethod("withinDistance", rTree) :
   no applicable method for 'withinDistance' applied to an object of class "c('list', 'RTree')"

Going back to the last century:

library(spdep)
bds <- c(0, bufferR)
wd1 <- dnearneigh(pts, bds[1], bds[2])
wd2 <- dnearneigh(pts, bds[2], bds[3])
wd3 <- dnearneigh(pts, bds[3], bds[4])
wd4 <- dnearneigh(pts, bds[4], bds[5])

gives four neighbour objects. A neighbour object is an n-list of integer 
vectors (0 encodes no neighbours), which you can use to find the rows to 
copy out to your output object.
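
A minimal sketch of that structure, using a hand-built list in place of a real spdep neighbour object (the list contents are made up; a real nb object also carries attributes that are ignored here):

```r
# Toy stand-in for a neighbour list with 4 points:
# point 1 has neighbours 2 and 3, points 2 and 3 have neighbour 1,
# point 4 has none (encoded as a single 0).
nb_like <- list(c(2L, 3L), 1L, 1L, 0L)

# One row per (focal point, neighbour) pair
pairs <- data.frame(
  from = rep(seq_along(nb_like), lengths(nb_like)),
  to   = unlist(nb_like)
)

# Drop the 0 placeholders for points without neighbours
pairs <- pairs[pairs$to > 0L, ]
pairs
```

The from and to columns are row numbers in the input, so they can be used to index the original data for IDs or attributes.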

Does this get you started?

Roger


Lom Navanyo

Tue, Jun 2, 2020 3:19 PM #

I tried the rtree package and even ran the example at
https://github.com/hunzikp/rtree, but it gives me the same error message:
Error in UseMethod("withinDistance", rTree) :
   no applicable method for 'withinDistance' applied to an object of class "c('list', 'RTree')"
That is strange, given that the page shows the example running successfully.

Regarding library(spdep) and dnearneigh(), I read through the examples at
https://r-spatial.github.io/spdep/reference/dnearneigh.html, but I am
struggling to get the output from wd1, wd2, wd3 and wd4 into my output
object, as in dataout. For example, when I print wd1, I get:

Neighbour list object:
Number of regions: 101
Number of nonzero links: 38
Percentage nonzero weights: 0.3725125
Average number of links: 0.3762376
67 regions with no links:


How do I get the original point IDs and their respective neighbors' IDs?

Lom

Date: Tue, 2 Jun 2020 02:44:17 -0500
From: Lom Navanyo <lomnavasia at gmail.com>
To: r-sig-geo at r-project.org
Subject: [R-sig-Geo] How to efficiently generate data of neighboring
        points within specified radii (distances) for each point in a
        given points data set.

Hello,
I have a data set of about 3400 location points with which I am trying to
generate data of each point and its neighbors within defined radii (e.g.,
0.25, 1, and 3 miles).


Kent Johnson

Tue, Jun 2, 2020 6:02 PM #

Roger's example works for me and gives a list of length 101. I did have
some issues that were resolved by updating packages. I'm using R 3.6.3 on
macOS 10.15.4. I also use rtree successfully on Windows 10 with R 3.6.3.

Kent


Lom Navanyo

Tue, Jun 2, 2020 9:20 PM #

I had the errors with rtree using R 3.6.3. I have since changed to R 4.0.0
but I got the same error.

And yes, for Roger's example, I have the objects wd1, ..., wd4, all with
length 101. I think my difficulty is that I cannot output the list
detailing the point IDs (t50_fid).

---------
Lom


Roger Bivand

Wed, Jun 3, 2020 6:18 AM #

On Wed, 3 Jun 2020, Lom Navanyo wrote:

library(spData)
library(sf)
projdata<-st_transform(nz_height, 32759)
pts <- st_coordinates(projdata)
library(spdep)
bufferR <- c(402.336, 1609.34, 3218.69, 4828.03, 6437.38)
bds <- c(0, bufferR)
wd1 <- dnearneigh(pts, bds[1], bds[2])
wd2 <- dnearneigh(pts, bds[2], bds[3])
wd3 <- dnearneigh(pts, bds[3], bds[4])
wd4 <- dnearneigh(pts, bds[4], bds[5])
sn_band1 <- listw2sn(nb2listw(wd1, style="B", zero.policy=TRUE))
sn_band1$band <- paste(attr(wd1, "distances"), collapse="-")
sn_band2 <- listw2sn(nb2listw(wd2, style="B", zero.policy=TRUE))
sn_band2$band <- paste(attr(wd2, "distances"), collapse="-")
sn_band3 <- listw2sn(nb2listw(wd3, style="B", zero.policy=TRUE))
sn_band3$band <- paste(attr(wd3, "distances"), collapse="-")
sn_band4 <- listw2sn(nb2listw(wd4, style="B", zero.policy=TRUE))
sn_band4$band <- paste(attr(wd4, "distances"), collapse="-")
data_out <- do.call("rbind", list(sn_band1, sn_band2, sn_band3, sn_band4))
class(data_out) <- "data.frame"
table(data_out$band)
data_out$ID_from <- projdata$t50_fid[data_out$from]
data_out$ID_to <- projdata$t50_fid[data_out$to]
data_out$elev_from <- projdata$elevation[data_out$from]
data_out$elev_to <- projdata$elevation[data_out$to]
str(data_out)

The "spatial.neighbour" representation was that used in the S-Plus 
SpatialStats module, with "from" and "to" columns, and here drops 
no-neighbour cases gracefully. So listw2sn() comes in useful 
for creating the output, and from there, just look-up in the 
input data.frame. Observations here cannot be their own neighbours.
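
For a data set of a few thousand points, a similar from/to/band table can also be sketched in base R with a full distance matrix. This is a brute-force stand-in, not what dnearneigh() does internally; the coordinates and IDs are made up, and the half-open (lower, upper] band convention comes from cut(), so check the dnearneigh() documentation for how it treats the bounds:

```r
# Toy projected coordinates in metres and IDs (made up for illustration)
pts <- cbind(x = c(0, 300, 1500, 9000), y = c(0, 0, 0, 0))
ids <- c("a", "b", "c", "d")

# Band limits: 0.25, 1, 2, 3 and 4 miles converted to metres
bds <- c(0, c(0.25, 1, 2, 3, 4) * 1609.344)

d <- as.matrix(dist(pts))          # full pairwise Euclidean distances
diag(d) <- NA                      # a point is not its own neighbour
idx <- which(!is.na(d) & d <= max(bds), arr.ind = TRUE)

data_out <- data.frame(
  from = unname(idx[, "row"]),
  to   = unname(idx[, "col"]),
  dist = d[idx]
)
data_out$band    <- cut(data_out$dist, breaks = bds)  # (lower, upper] bands
data_out$ID_from <- ids[data_out$from]
data_out$ID_to   <- ids[data_out$to]
data_out <- data_out[order(data_out$from, data_out$dist), ]
data_out
```

Point "d" is further than 4 miles from everything else, so it simply does not appear. Coincident points (distance 0) would get an NA band under this convention.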

It would be relevant to know why you need these. Are you looking at
variogram clouds?

Hope this clarifies,

Roger


Lom Navanyo

Wed, Jun 3, 2020 8:44 PM #

Thank you very much for your support. This gives me what I need and I must
say listw2sn() is really great.

Why do I need the data in the format of dataout? I am trying to test for
spatial dependence (or a neighborhood effect) by running a regression model
of the form pop_size_it = beta_1 * (sum of pop_size of point i's neighbors
within a specified radius). So my plan is to get the neighbors of each
focal point within the specified bands, along with their attributes (e.g.
pop_size), so that I can sum the attribute by band.
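
A minimal sketch of that summing step, assuming a from/to/band table like the one built earlier in the thread. The table, the band labels and the pop_size values below are made up for illustration:

```r
# Hypothetical from/to/band table; from and to are row numbers of the points
data_out <- data.frame(
  from = c(1, 1, 2, 3, 3),
  to   = c(2, 3, 1, 1, 2),
  band = c("0-402", "402-1609", "0-402", "402-1609", "402-1609")
)
pop_size <- c(10, 20, 30)   # made-up attribute, one value per point

# Attach each neighbour's attribute, then sum by focal point and band
data_out$pop_to <- pop_size[data_out$to]
nb_sums <- xtabs(pop_to ~ from + band, data = data_out)
nb_sums
```

Points with no neighbours in any band have no rows in the table, so they are absent from the result unless from is first turned into a factor with a level for every point.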

I am totally new to the area of spatial econometrics, so I am taking things
one step at a time. Some readings suggest I may need distance matrix or
weight matrix but for now I think I should try the current approach.

Thank you.

-------------
Lom


Roger Bivand

Thu, Jun 4, 2020 1:48 AM #

On Thu, 4 Jun 2020, Lom Navanyo wrote:

Thanks, clarifies a good deal. Maybe look at the original localG articles 
for exploring distance relationships (Getis and Ord looked at HIV/AIDS); 
?spdep::localG or https://r-spatial.github.io/spdep/reference/localG.html.

Further note that OLS is biased, as you have y = f(y) + e, so y is on both
sides. The nearest equivalent for a single band is spatialreg::lagsarlm()
with listw=nb2listw(wd1, style="B") to get the neighbour sums through the
weights matrix. So both your OLS betas and their standard errors are
unusable, I'm afraid. You are actually very much closer to ordinary kriging,
looking at the way in which distance attenuates the correlation in value of
proximate observations.
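
As a small illustration of "neighbour sums through the weights matrix", here is a base-R sketch with a toy binary weights matrix. The matrix and the y values are made up; in practice the weights would come from nb2listw():

```r
# Toy binary (style "B") weights: 4 points on a line, 1-2, 2-3, 3-4
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)
y <- c(10, 20, 30, 40)

# Neighbour sums: row i of W picks out the y values of i's neighbours
Wy <- drop(W %*% y)
Wy

# The same sums computed by hand from the neighbour sets
by_hand <- c(y[2], y[1] + y[3], y[2] + y[4], y[3])
```

Because each y[i] enters the Wy of its neighbours, regressing y on Wy by OLS puts the response on both sides of the equation.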

Hope this clarifies,

Roger


Lom Navanyo

Thu, Jun 4, 2020 11:50 AM #

Thank you. Yes, the OLS is biased and my plan is to use a 2SLS approach. I
have a variable I intend to use as an IV for y.
I have seen a few papers use this approach. Will this approach not correct
for the endogeneity?

Actually, I am not sure whether this is the right forum, or whether it
would be appropriate or acceptable to you to take this up one-on-one:
my model actually looks like this: y = f(y, x) + e.
Aside from the endogeneity of y (which I intend to instrument with another
variable z), there is simultaneity between y and x.
I intend to use the lag of x as an instrument for x. Given that I am seeking
to test for spatial dependence, do you see any fatal flaws in my approach?

I have also seen other empirical approaches like static and dynamic spatial
panel data modelling. I will be reviewing them also to see suitability for
my objective.
But, any further directions or suggestions are highly appreciated.

Thanks,
-------------------
Lom


Roger Bivand

Thu, Jun 4, 2020 12:52 PM #

On Thu, 4 Jun 2020, Lom Navanyo wrote:

I do not offer private help. That would presuppose that one person has the
answer. It would also presuppose that all exchanges are only read by the
original poster and direct participants, while in fact others may join in,
or follow a thread, or find the thread by searching: Google supports the
list:r-sig-geo search tag. If the thread goes private, that search is
fruitless.

What is the support of your observations: are they points, or are they
aggregations? Why might proximity make a difference? Often, apparent spatial
autocorrelation is caused by observing inappropriate entities, or by
omitting covariates, or by using the wrong functional form.

If the data are spatial panel, you can look at the splm package.
Personally, I have never found instruments any use at all, because the
instruments are typically at best weak because of shared spatial processes
with the response, unless the model is really well specified from known
theory. In space, almost everything is close to endogenous unless the
opposite is demonstrated. So causal relationships are less worthwhile,
because they are at best conditional on omitted variables and
autocorrelation engendered by the choice of observational entities.

Further, because spatial processes are driven by the inverse matrix of the 
input graph of proximate neighbours (the covariance matrix of the spatial 
process), you don't need to start from more than the first order 
neighbours. Maybe your x has the same spatial pattern as y, so that the 
residuals are white noise with no spatial structure.
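
To make the point about first-order neighbours concrete, here is a minimal base-R sketch. It assumes a simultaneous autoregressive form in which the process is driven by the inverse of (I - rho W); the six-point chain graph and rho = 0.5 are hypothetical values chosen only for illustration.

```r
# First-order neighbours on a chain of six points: 1-2-3-4-5-6.
n <- 6
W <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}
W <- W / rowSums(W)          # row-standardise

rho <- 0.5                   # hypothetical spatial coefficient
A <- solve(diag(n) - rho * W)

# Points 1 and 6 are not neighbours in W ...
W[1, n]
# ... but the inverse links them through the intermediate points.
A[1, n]
round(A, 3)
```

Although W only records first-order links, every entry of the inverse is non-zero, so dependence at longer range is already implied without adding wider distance bands by hand.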

Recently, analysts prefer to start with the SLX model (Halleck Vega & 
Elhorst 2015 and others), so that might be worth exploring. If only the 
direct impacts seem important, OLS may be enough.
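
A hand-rolled sketch of what an SLX specification amounts to, using base R only: the response is regressed on x and on the neighbour average of x. The simulated coordinates, the 0.25 distance band and the coefficient values are all assumptions made for illustration. With real data, spatialreg::lmSLX() is the packaged route and takes a listw object rather than a dense matrix.

```r
set.seed(1)
n  <- 50
xy <- cbind(runif(n), runif(n))          # simulated point locations
D  <- as.matrix(dist(xy))

# Binary neighbours within a hypothetical distance band, no self-neighbours.
W  <- (D > 0 & D <= 0.25) * 1
rs <- rowSums(W)
W[rs > 0, ] <- W[rs > 0, ] / rs[rs > 0]  # row-standardise non-empty rows

x  <- rnorm(n)
Wx <- as.vector(W %*% x)                 # spatially lagged covariate

# Simulated response: direct effect 2, neighbour (local spillover) effect 0.5.
y  <- 1 + 2 * x + 0.5 * Wx + rnorm(n, sd = 0.1)

fit <- lm(y ~ x + Wx)                    # SLX fitted by ordinary least squares
coef(fit)
```

If the coefficient on Wx is negligible, only the direct effect remains and the model collapses to plain OLS on x.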

Hope this helps,

Roger

Thanks,
-------------------
Lom



On Thu, Jun 4, 2020 at 3:48 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Thu, 4 Jun 2020, Lom Navanyo wrote:

Thank you very much for your support. This gives me what I need and I must
say listw2sn() is really great.

Why do I need the data in the format as in data_out? I am trying to test
spatial dependence (or neighborhood effect) by running a regression
model that entails pop_size_it = beta_1*sum of pop_size of point i's
neighbors within a specified radius. So my plan is to get the neighbors
for each focal point as per the specified bands and their attributes (eg
pop_size) so I can add them (attribute) by the bands.

Thanks, clarifies a good deal. Maybe look at the original localG articles
for exploring distance relationships (Getis and Ord looked at HIV/AIDS);
?spdep::localG or https://r-spatial.github.io/spdep/reference/localG.html.

Further note that OLS is biased as you have y = f(y) + e, so y on both
sides. The nearest equivalent for a single band is spatialreg::lagsarlm()
with listw=nb2listw(wd1, style="B") to get the neighbour sums through the
weights matrix. So both your betas and their standard errors are unusable,
I'm afraid. You are actually very much closer to ordinary kriging, looking
at the way in which distance attenuates the correlation in value of
proximate observations.
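
As a base-R sketch of what "neighbour sums through the weights matrix" means: with binary weights, multiplying the weights matrix by an attribute vector gives, for each point, the sum of that attribute over its neighbours. The simulated coordinates, the pop values and the single 1609.34 m band are hypothetical, and the dense matrix B only stands in for the binary weights that nb2listw(..., style="B") would hold.

```r
set.seed(42)
n   <- 8
xy  <- cbind(runif(n, 0, 5000), runif(n, 0, 5000))  # projected coords, metres
pop <- round(runif(n, 10, 100))                     # attribute to be summed

D <- as.matrix(dist(xy))

# Binary weights: 1 if another point lies within the band, else 0.
# D > 0 keeps a point from being its own neighbour.
B <- (D > 0 & D <= 1609.34) * 1

# Row i of B times pop = sum of pop over the neighbours of point i.
nbr_sum <- as.vector(B %*% pop)

data.frame(id = seq_len(n), pop = pop,
           n_nbrs = rowSums(B), nbr_sum = nbr_sum)
```

Points with no neighbour in the band get a sum of zero, which is the case zero.policy=TRUE handles in the spdep workflow.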

Hope this clarifies,

Roger

I am totally new to the area of spatial econometrics, so I am taking things
one step at a time. Some readings suggest I may need distance matrix or
weight matrix but for now I think I should try the current approach.

Thank you.

-------------
Lom

On Wed, Jun 3, 2020 at 8:18 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Wed, 3 Jun 2020, Lom Navanyo wrote:

I had the errors with rtree using R 3.6.3. I have since changed to R 4.0.0
but I got the same error.

And yes, for Roger's example, I have the objects wd1, ... wd4, all with
length 101. I think my difficulty is my inability to output the list
detailing the point IDs t50_fid.

library(spData)
library(sf)
projdata<-st_transform(nz_height, 32759)
pts <- st_coordinates(projdata)
library(spdep)
bufferR <- c(402.336, 1609.34, 3218.69, 4828.03, 6437.38)
bds <- c(0, bufferR)
wd1 <- dnearneigh(pts, bds[1], bds[2])
wd2 <- dnearneigh(pts, bds[2], bds[3])
wd3 <- dnearneigh(pts, bds[3], bds[4])
wd4 <- dnearneigh(pts, bds[4], bds[5])
sn_band1 <- listw2sn(nb2listw(wd1, style="B", zero.policy=TRUE))
sn_band1$band <- paste(attr(wd1, "distances"), collapse="-")
sn_band2 <- listw2sn(nb2listw(wd2, style="B", zero.policy=TRUE))
sn_band2$band <- paste(attr(wd2, "distances"), collapse="-")
sn_band3 <- listw2sn(nb2listw(wd3, style="B", zero.policy=TRUE))
sn_band3$band <- paste(attr(wd3, "distances"), collapse="-")
sn_band4 <- listw2sn(nb2listw(wd4, style="B", zero.policy=TRUE))
sn_band4$band <- paste(attr(wd4, "distances"), collapse="-")
data_out <- do.call("rbind", list(sn_band1, sn_band2, sn_band3,
                                  sn_band4))

class(data_out) <- "data.frame"
table(data_out$band)
data_out$ID_from <- projdata$t50_fid[data_out$from]
data_out$ID_to <- projdata$t50_fid[data_out$to]
data_out$elev_from <- projdata$elevation[data_out$from]
data_out$elev_to <- projdata$elevation[data_out$to]
str(data_out)

The "spatial.neighbour" representation was that used in the S-Plus
SpatialStats module, with "from" and "to" columns, and here drops
no-neighbour cases gracefully. So listw2sn() comes in useful
for creating the output, and from there, just look-up in the
input data.frame. Observations here cannot be their own neighbours.
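
Once the output is in this long "from/to/band" layout, summing an attribute of the neighbours by focal point and band is a one-line aggregation. The sketch below uses a small made-up data_out with the same column names as the code above (the IDs and elevations are invented), so it runs without spdep.

```r
# Toy stand-in for data_out: one row per (focal point, neighbour) link.
data_out <- data.frame(
  ID_from = c(1, 1, 1, 2, 2, 3),
  ID_to   = c(2, 3, 4, 1, 4, 1),
  band    = c("0-402.336", "0-402.336", "402.336-1609.34",
              "0-402.336", "402.336-1609.34", "0-402.336"),
  elev_to = c(2700, 2800, 2900, 2750, 2900, 2750)
)

# Long format: sum of the neighbours' attribute per focal point and band.
band_sums <- aggregate(elev_to ~ ID_from + band, data = data_out, FUN = sum)
band_sums

# Wide format: one row per focal point, one column per band.
wide <- xtabs(elev_to ~ ID_from + band, data = data_out)
wide
```

Replacing FUN = sum with FUN = mean gives band means instead. Focal points with no neighbour in a band simply have no row in the long result and a zero in the wide table.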

It would be relevant to know why you need these, are you looking at
variogram clouds?

Hope this clarifies,

Roger

---------
Lom



Lom Navanyo

Fri, Jun 5, 2020 1:12 AM #

I fully agree with you and appreciate the listed benefits of not taking
things private. I was just trying to be sure the forum here is appropriate
and receptive to a beginner like me.

To be more explicit with regards to my observations, y is the amount of water
withdrawal from wells and an important variable in x is (height of) water
level in the wells. These are end-of-year figures. I am using the
aggregations (sum for y and mean for water level) by band as spatial
neighborhood variables. There will be one or two indicator variables also
in x. I hope these do not present additional hurdles.

I think proximity is relevant in testing spatial dependency/externality.

I will consider the splm package and the SLX model.

Thank you.
---------------
Lom

On Thu, Jun 4, 2020 at 2:52 PM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Thu, 4 Jun 2020, Lom Navanyo wrote:

Thank you. Yes, the OLS is biased and my plan is to use a 2SLS approach. I
have a variable I intend to use as an IV for y.
I have seen a few papers use this approach. Will this approach not correct
for the endogeneity?

Actually, I am not sure if this is a right forum or perhaps if it's
appropriate or acceptable to you to take this one-on-one with you for help:

I do not offer private help. That would presuppose that one person has the
answer. It would also presuppose that all exchanges are only read by the
original poster and direct participants, while in fact others may join in,
or follow a thread, or find the thread by searching: google supports the
list:r-sig-geo search tag. If the thread goes private, that search is
fruitless.

My model actually looks like this: y = f(y, x) + e.
Aside from the endogeneity of y (which I intend to instrument by another
variable z), there is simultaneity between y and x.
I intend to use the lag of x as instrument for x. Given that I am seeking
to test spatial dependency, do you see some fatal flaws with my approach?

What is the support of your observations, point, or are they aggregations?
Why may proximity make a difference - often, apparent spatial
autocorrelation is caused by observing inappropriate entities, or by
omitting covariates, or by using the wrong functional form.

I have also seen other empirical approaches like static and dynamic spatial
panel data modelling. I will be reviewing them also to see suitability for
my objective.
But, any further directions or suggestions are highly appreciated.

If the data are spatial panel, you can look at the splm package.
Personally, I have never found instruments any use at all, because the
instruments are typically at best weak because of shared spatial processes
with the response, unless the model is really well specified from known
theory. In space, almost everything is close to endogenous unless the
opposite is demonstrated. So causal relationships are less worthwhile,
because they are at best conditional on omitted variables and
autocorrelation engendered by the choice of observational entities.

Further, because spatial processes are driven by the inverse matrix of the
input graph of proximate neighbours (the covariance matrix of the spatial
process), you don't need to start from more than the first order
neighbours. Maybe your x has the same spatial pattern as y, so that the
residuals are white noise with no spatial structure.

Recently, analysts prefer to start with the SLX model (Halleck Vega &
Elhorst 2015 and others), so that might be worth exploring. If only the
direct impacts seem important, OLS may be enough.

Hope this helps,

Roger

Thanks,
-------------------
Lom




Roger Bivand

Fri, Jun 5, 2020 1:51 AM #

On Fri, 5 Jun 2020, Lom Navanyo wrote:

There are several further questions. If water level is measured at 
end-of-year, it is instantaneous at that point, and will depend on level a 
year earlier plus inflow from the movements of the water table 
(precipitation, soils and surface geology, maybe geology if deeper wells), 
minus evaporation (if an open well) and extraction. However, your y 
(extraction) is probably measured over an interval (1 Jan - 31 Dec?). It 
does not depend on level unless level is 0, but depends on the closeness 
of people extracting water for domestic, agricultural or other use.

All else equal, you would expect changes in the level in a well to depend
on inputs, evaporation, and extraction at that well and at other nearby
wells (which may experience falls in the ground water table level not
because the water was extracted from those wells, but at neighbouring
wells). You may also see users shifting to nearby wells if their closest
well runs dry.
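
A deliberately crude sketch of the accounting described above, written as an R function. It is not a hydrological model: the update_level() name, the single nbr_effect coefficient and the assumption that every term is already expressed in the same level-equivalent units are all hypothetical, chosen only to show how extraction at neighbouring wells can lower the level at a well that pumps nothing extra itself.

```r
# Toy end-of-year water balance for one well (illustrative only).
update_level <- function(level_prev, inflow, evaporation, own_extraction,
                         nbr_extraction, nbr_effect = 0.2) {
  level_prev + inflow - evaporation - own_extraction -
    nbr_effect * nbr_extraction
}

# Same well, same own pumping, with and without pumping at nearby wells.
level_no_nbr   <- update_level(30, inflow = 2, evaporation = 0,
                               own_extraction = 3, nbr_extraction = 0)
level_with_nbr <- update_level(30, inflow = 2, evaporation = 0,
                               own_extraction = 3, nbr_extraction = 5)

c(no_neighbour_pumping = level_no_nbr,
  with_neighbour_pumping = level_with_nbr)
```

Even in this toy version the level is an outcome of extraction rather than a free explanatory variable, which is the sense in which it is endogenous in a regression of extraction on level.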

So you probably need to start with a deterministic hydrological model, and 
you need much more information about who extracts and why. Say in India, 
you would also need price data - apparently free water has led to 
over-extraction.

So I would advise against any spatial econometric analysis of the data you
have, because so much is going on in the system as a whole that you cannot
control if all the data you have is as you describe. I also understand
better why well water level is endogenous, but am sure that IV will not
help, since the level is being driven partly by a deterministic
hydrological system which differs from well to well, and extraction varies
by demand.

Has anyone worked with this kind of data? Any ideas or contributions more 
helpful than the above?

Roger



Lom Navanyo

Fri, Jun 5, 2020 3:28 AM #

Thank you once again. To clarify, which is more suitable, end of year water
levels or yearly average measure of water levels?

Also below are a few more notes to throw more light on my variables/data:

These wells are solely for irrigation purposes and are
irrigator/farmer-owned and operated.
No farmer/irrigator moves to another well not owned by him. The only reason
to suspect any spatial externalities is because the wells share a common
aquifer.
And this is essentially what I am testing.

It is also understood that there is not much variation in the geography
and geology of the study region.

I have data on a number of well-specific features in addition to the water
level. I also have some farm data including cropping and technology use
data. No soil data though, and no recharge data either.

In fact, I agree a lot of factors can come into play here and I may not
have or observe all of them, but I was thinking I could incorporate some
fixed effects to take care of those, especially for those I suspect (or
perhaps by theory) are likely to not vary much in terms of their effect on
irrigation (pumping) decisions across farmers or their effect on water
level.

My panel is rather short: I have five years of panel data.

Given the above, is it still not advisable to use any spatial econometric
analysis? Would a simple OLS suffice?

Thanks.
----------------------
Lom

On Fri, Jun 5, 2020 at 3:51 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Fri, 5 Jun 2020, Lom Navanyo wrote:

I fully agree with you and appreciate the listed benefits of not taking
things private. I was just trying to be sure the forum here is

appropriate

and receptive of a beginner like me.

To be more explicit with regards to my observations, y is amount of
water withdrawal from wells and an important variable in x is (height
of) water level in the wells. These are end of year figures. I am using
the aggregations (sum for y and mean for water level) by band as spatial
neighborhood variables. There will be one or two indicator variables
also in x. I hope these do not present additional hurdles.

There are several further questions. If water level is measured at
end-of-year, it is instantaneous at that point, and will depend on level a
year earlier plus inflow from the movements of the water table
(precipitation, soils and surface geology, maybe geology if deeper wells),
minus evaporation (if an open well) and extraction. However, your y
(extraction) is probably measured over an interval (1 Jan - 31 Dec?). It
does not depend on level unless level is 0, but depends on the closeness
of people extracting water for domestic, agricultural or other use.

All else equal, you would expect changes in the level in a well to depend
on inputs, evaporation and extraction, and extraction at that well and
other nearby wells (which may experience falls in the ground water table
level not because the water was extracted from those wells, but at
neighbouring wells. You may also see users shifting to neerby wells if
their closest well runs dry.

So you probably need to start with a deterministic hydrological model, and
you need much more information about who extracts and why. Say in India,
you would also need price data - apparently free water has led to
over-extraction.

So I would advise against any spatial econometric analysis of the data you
have, because so much is going on in the system as a whole that you cannot
control if all the data you have is as you describe. I also understand
better why well water level is endogeneous, but am sure that IV will not
help, since the level is being driven partly by a deterministic
hydrological system which differs from well to well, and extraction varies
by demand.

Has anyone worked with this kind of data? Any ideas or contributions more
helpful than the above?

Roger

I think proximity is relevant in testing spatial dependency/externality.

I will consider the splm package and the SLX model.

Thank you.
---------------
Lom

On Thu, Jun 4, 2020 at 2:52 PM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Thu, 4 Jun 2020, Lom Navanyo wrote:

Thank you. Yes, the OLS is biased and my plan is to use a 2SLS approach.
I have a variable I intend to use as an IV for y.
I have seen a few papers use this approach. Will this approach not correct
for the endogeneity?

Actually, I am not sure if this is the right forum, or whether it would be
appropriate or acceptable to you to take this one-on-one with you for help:

I do not offer private help. That would presuppose that one person has the
answer. It would also presuppose that all exchanges are only read by the
original poster and direct participants, while in fact others may join in,
or follow a thread, or find the thread by searching: Google supports the
list:r-sig-geo search tag. If the thread goes private, that search is
fruitless.

My model actually looks like this: y = f(y, x) + e.
Aside from the endogeneity of y (which I intend to instrument by another
variable z), there is simultaneity between y and x.
I intend to use the lag of x as an instrument for x. Given that I am
seeking to test spatial dependency, do you see some fatal flaws with my
approach?

What is the support of your observations: are they points, or are they
aggregations?

Why may proximity make a difference? Often, apparent spatial
autocorrelation is caused by observing inappropriate entities, or by
omitting covariates, or by using the wrong functional form.

I have also seen other empirical approaches like static and dynamic spatial
panel data modelling. I will be reviewing them also to see suitability for
my objective.
But, any further directions or suggestions are highly appreciated.

If the data are spatial panel, you can look at the splm package.
Personally, I have never found instruments any use at all, because the
instruments are typically at best weak because of shared spatial processes
with the response, unless the model is really well specified from known
theory. In space, almost everything is close to endogenous unless the
opposite is demonstrated. So causal relationships are less worthwhile,
because they are at best conditional on omitted variables and
autocorrelation engendered by the choice of observational entities.

Further, because spatial processes are driven by the inverse matrix of the
input graph of proximate neighbours (the covariance matrix of the spatial
process), you don't need to start from more than the first order
neighbours. Maybe your x has the same spatial pattern as y, so that the
residuals are white noise with no spatial structure.

Recently, analysts prefer to start with the SLX model (Halleck Vega &
Elhorst 2015 and others), so that might be worth exploring. If only the
direct impacts seem important, OLS may be enough.
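
As a rough illustration of the SLX idea mentioned above, the sketch below
adds a spatially lagged covariate Wx to an ordinary lm() fit. Everything in
it is simulated: the coordinates, the 0.2 distance threshold and the
coefficients are assumptions chosen only for the example, and base R is
used so that it runs without extra packages. For real data,
spatialreg::lmSLX() is the packaged route.

```r
# Minimal SLX sketch on simulated data (all values are illustrative).
set.seed(1)
n <- 50
coords <- cbind(runif(n), runif(n))      # made-up point locations

# Binary distance-band neighbours: 1 if 0 < distance <= 0.2
d <- as.matrix(dist(coords))
W <- (d > 0 & d <= 0.2) * 1

# Row-standardise; rows without neighbours are left as all zeros
rs <- rowSums(W)
W_rs <- W / ifelse(rs > 0, rs, 1)

x  <- rnorm(n)
Wx <- as.vector(W_rs %*% x)              # mean of x over each point's neighbours

# Simulated response with a direct effect (2) and a neighbour effect (0.5)
y <- 1 + 2 * x + 0.5 * Wx + rnorm(n, sd = 0.1)

# SLX: ordinary least squares with the lagged covariate added
fit <- lm(y ~ x + Wx)
print(coef(fit))
```

Because only x is lagged, not y, the response does not appear on both sides
of the equation, which is why plain least squares is usable here.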

Hope this helps,

Roger

Thanks,
-------------------
Lom



On Thu, Jun 4, 2020 at 3:48 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Thu, 4 Jun 2020, Lom Navanyo wrote:

Thank you very much for your support. This gives me what I need and I must
say listw2sn() is really great.

Why do I need the data in the format as in data_out? I am trying to test
spatial dependence (or neighborhood effect) by running a regression
model that entails pop_size_it = beta_1*sum of pop_size of point i's
neighbors within a specified radius. So my plan is to get the neighbors
for each focal point as per the specified bands and their attributes (e.g.
pop_size) so I can add them (the attributes) by band.

Thanks, clarifies a good deal. Maybe look at the original localG articles
for exploring distance relationships (Getis and Ord looked at HIV/AIDS);
?spdep::localG or https://r-spatial.github.io/spdep/reference/localG.html.

Further note that OLS is biased as you have y = f(y) + e, so y on both
sides. The nearest equivalent for a single band is spatialreg::lagsarlm()
with listw=nb2listw(wd1, style="B") to get the neighbour sums through the
weights matrix. So both your betas and their standard errors are unusable,
I'm afraid. You are actually very much closer to ordinary kriging, looking
at the way in which distance attenuates the correlation in value of
proximate observations.
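
To make "neighbour sums through the weights matrix" concrete, here is a
small base-R sketch. The five points and their y values are invented, and
the band rule (0 < distance <= 402.336) is an assumption made for the
example rather than a statement about how dnearneigh() treats its bounds.

```r
# Toy data: five points on a line (projected units, metres) and an attribute y.
coords <- cbind(x = c(0, 300, 1000, 1200, 5000), y = rep(0, 5))
y <- c(10, 20, 30, 40, 50)

# Pairwise Euclidean distances between the points
d <- as.matrix(dist(coords))

# Binary ("B" style) weights for one band: 1 if 0 < distance <= 402.336 m
W <- (d > 0 & d <= 402.336) * 1

# Row i of W flags the neighbours of point i, so W %*% y is the sum of y
# over each point's neighbours; points with no neighbours get 0.
nb_sum <- as.vector(W %*% y)
print(nb_sum)
```

With an nb object such as wd1, spdep::lag.listw() applied to
nb2listw(wd1, style="B", zero.policy=TRUE) should give the same kind of
neighbour sum without building the dense matrix.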

Hope this clarifies,

Roger

I am totally new to the area of spatial econometrics, so I am taking things
one step at a time. Some readings suggest I may need a distance matrix or
weight matrix but for now I think I should try the current approach.

Thank you.

-------------
Lom

On Wed, Jun 3, 2020 at 8:18 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Wed, 3 Jun 2020, Lom Navanyo wrote:

I had the errors with rtree using R 3.6.3. I have since changed to 4.0.0
but I got the same error.

And yes, for Roger's example, I have the objects wd1, ... wd4, all with
length 101. I think my difficulty is my inability to output the list
detailing the point IDs t50_fid.

library(spData)
library(sf)
projdata<-st_transform(nz_height, 32759)
pts <- st_coordinates(projdata)
library(spdep)
bufferR <- c(402.336, 1609.34, 3218.69, 4828.03, 6437.38)
bds <- c(0, bufferR)
wd1 <- dnearneigh(pts, bds[1], bds[2])
wd2 <- dnearneigh(pts, bds[2], bds[3])
wd3 <- dnearneigh(pts, bds[3], bds[4])
wd4 <- dnearneigh(pts, bds[4], bds[5])
sn_band1 <- listw2sn(nb2listw(wd1, style="B", zero.policy=TRUE))
sn_band1$band <- paste(attr(wd1, "distances"), collapse="-")
sn_band2 <- listw2sn(nb2listw(wd2, style="B", zero.policy=TRUE))
sn_band2$band <- paste(attr(wd2, "distances"), collapse="-")
sn_band3 <- listw2sn(nb2listw(wd3, style="B", zero.policy=TRUE))
sn_band3$band <- paste(attr(wd3, "distances"), collapse="-")
sn_band4 <- listw2sn(nb2listw(wd4, style="B", zero.policy=TRUE))
sn_band4$band <- paste(attr(wd4, "distances"), collapse="-")
data_out <- do.call("rbind", list(sn_band1, sn_band2, sn_band3, sn_band4))

class(data_out) <- "data.frame"
table(data_out$band)
data_out$ID_from <- projdata$t50_fid[data_out$from]
data_out$ID_to <- projdata$t50_fid[data_out$to]
data_out$elev_from <- projdata$elevation[data_out$from]
data_out$elev_to <- projdata$elevation[data_out$to]
str(data_out)

The "spatial.neighbour" representation was that used in the S-Plus
SpatialStats module, with "from" and "to" columns, and here drops
no-neighbour cases gracefully. So listw2sn() comes in useful
for creating the output, and from there, just look-up in the
input data.frame. Observations here cannot be their own neighbours.
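
Once the from/to table exists, summing a neighbour attribute per focal
point and band is a single aggregate() call. The sketch below uses a small
hand-made stand-in for data_out (three points, invented elevations) so that
it runs on its own. The column names follow the code above, but the values
are hypothetical.

```r
# Hypothetical stand-in for data_out: one row per (from, to) neighbour pair,
# labelled with its distance band. Elevations: point 1 = 2750, 2 = 2800,
# 3 = 2900 (invented values).
toy <- data.frame(
  ID_from = c(1, 2, 1, 3, 2, 3),
  ID_to   = c(2, 1, 3, 1, 3, 2),
  band    = c("0-402.336", "0-402.336",
              "402.336-1609.34", "402.336-1609.34",
              "402.336-1609.34", "402.336-1609.34"),
  elev_to = c(2800, 2750, 2900, 2750, 2900, 2800)
)

# Sum the neighbours' attribute for each focal point within each band
band_sums <- aggregate(elev_to ~ ID_from + band, data = toy, FUN = sum)
print(band_sums)
```

Focal points with no neighbours in a band simply do not appear in the
result, so a merge() back onto the full list of IDs would be needed if
explicit zeros are wanted.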

It would be relevant to know why you need these, are you looking at
variogram clouds?

Hope this clarifies,

Roger

---------
Lom

On Tue, Jun 2, 2020 at 8:02 PM Kent Johnson <kent3737 at gmail.com> wrote:

Roger's example works for me and gives a list of length 101. I did have
some issues that were resolved by updating packages. I'm using R 3.6.3 on
macOS 10.15.4. I also use rtree successfully on Windows 10 with R 3.6.3.

Kent

On Tue, Jun 2, 2020 at 12:29 PM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Tue, 2 Jun 2020, Kent Johnson wrote:

rtree uses Euclidean distance so the points should be in a coordinate
system where this makes sense at least as a reasonable approximation.
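
A quick way to see why Euclidean distance on longitude/latitude degrees is
a poor approximation: one degree of longitude covers roughly half as much
ground at 60 degrees latitude as at the equator. The sketch below uses a
hand-written haversine helper (haversine_km is a name made up for this
example, and the 6371 km Earth radius is the usual spherical
approximation).

```r
# Great-circle distance in km on a sphere (haversine formula).
# haversine_km is a hypothetical helper written for this illustration.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

# Two pairs of points, each 1 degree of longitude apart, so their
# "Euclidean distance in degrees" is identical (1.0) ...
d_equator <- haversine_km(0, 0, 1, 0)
d_lat60   <- haversine_km(0, 60, 1, 60)

# ... but the ground distances differ by about a factor of two.
print(c(equator = d_equator, lat60 = d_lat60))
```

Projecting first, as done with st_transform() in the example that follows,
avoids the problem for distance-band searches over a limited area.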

I tried the original example:

remotes::install_github("hunzikp/rtree")
library(spData)
library(sf)
projdata<-st_transform(nz_height, 32759)
library(rtree)
pts <- st_coordinates(projdata)
rt <- RTree(st_coordinates(projdata))
bufferR <- c(402.336, 1609.34, 3218.69, 4828.03, 6437.38)
wd1 <- withinDistance(rt, pts, bufferR[1])

but unfortunately failed (maybe newer Boost headers than yours?):

Error in UseMethod("withinDistance", rTree) :
   no applicable method for 'withinDistance' applied to an object of class
"c('list', 'RTree')"

Kent

On Tue, Jun 2, 2020 at 9:59 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Tue, 2 Jun 2020, Kent Johnson wrote:

Date: Tue, 2 Jun 2020 02:44:17 -0500
From: Lom Navanyo <lomnavasia at gmail.com>
To: r-sig-geo at r-project.org
Subject: [R-sig-Geo] How to efficiently generate data of neighboring
        points within specified radii (distances) for each point in a given
        points data set.

Hello,
I have a data set of about 3400 location points with which I am trying to
generate data of each point and their neighbors within defined radii (e.g.,
0.25, 1, and 3 miles).

The rtree package is very fast and memory-efficient for within-distance
calculations.
https://github.com/hunzikp/rtree

Thanks! Does this also apply when the input points are in geographical
coordinates?

Roger

Kent Johnson
Cambridge, MA


Roger Bivand

Sun, Jun 7, 2020 3:06 AM #

On Fri, 5 Jun 2020, Lom Navanyo wrote:

Is irrigation by fixed pipe, or can the water be moved to the area of 
another well? Can irrigators extract water from multiple wells? Is then 
the irrigator the unit of observation rather than the well?

OK, the farming data may reflect the demand for water. Do the different 
crops or technologies have different seasonal patterns, leading to 
different draw-down patterns in the wells over time?

OLS probably not, but the decisions are starting to look like farmers' 
cropping decisions, leading to varied need for water. Do the farmers pay 
for the water or the irrigation technology?

I'm starting to think that maybe SUR is a possibility, but am unsure how 
your short panel would handle that.

Roger

Thanks.
----------------------
Lom





On Fri, Jun 5, 2020 at 3:51 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Fri, 5 Jun 2020, Lom Navanyo wrote:

I fully agree with you and appreciate the listed benefits of not taking
things private. I was just trying to be sure the forum here is appropriate
and receptive of a beginner like me.


Lom Navanyo

Sun, Jun 7, 2020 5:23 PM #

Some farmers own more than one well and thus can extract from their
multiple wells. Others are single well owners.

The amount of water pumped by the irrigator from their wells is the unit of
observation. I am not sure how it might sound, but I would say
"irrigator-well" is the unit of analysis?

Both crops and technology have seasonal patterns, though not pronounced,
probably due to switching costs.

I have two segments of the data: one group of neighboring irrigators pays
fees for water withdrawal. The second group (of neighbors) does not pay any
fee aside from their individual lift cost (which is not observed in the
data). I do not intend to run a difference-in-difference model with respect
to the fee as that's not what I want to study. So I intend to run separate
models/specifications for the two groups.

Thanks,
-----------------
Lom

On Sun, Jun 7, 2020 at 5:06 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Fri, 5 Jun 2020, Lom Navanyo wrote:

Thank you once again. To clarify, which is more suitable, end of year water
levels or yearly average measure of water levels?

Also below are a few more notes to throw more light on my variables/data:

These wells are solely for irrigation purposes and are
irrigator/farmer-owned and operated. No farmer/irrigator moves to
another well not owned by him. The only reason to suspect any spatial
externalities is because the wells share a common aquifer. And this is
essentially what I am testing.

Is irrigation by fixed pipe, or can the water be moved to the area of
another well? Can irrigators extract water from multiple wells? Is then
the irrigator the unit of observation rather than the well?

It is also understood that there is not much variation in the geography
and geology of the study region.

I have data on a number of well-specific features in addition to the water
level. I also have some farm data including cropping and technology use
data. No soil data though.
No recharge data either.

OK, the farming data may reflect the demand for water. Do the different
crops or technologies have different seasonal patterns, leading to
different draw-down patterns in the wells over time?

In fact, I agree a lot of factors can come into play here and I may not
have or observe all of them, but I was thinking I could incorporate some
fixed effects to take care of those, especially for those I suspect (or
perhaps by theory) are likely to not vary much in terms of their effect on
irrigation (pumping) decisions across farmers or effect on water level.

My panel is rather a short one: I have five years of panel data.

Given the above, is it still not advisable to use any spatial econometric
analysis? Will a simple OLS suffice?

OLS probably not, but the decisions are starting to look like farmers'
cropping decisions, leading to varied need for water. Do the farmers pay
for the water or the irrigation technology?

I'm starting to think that maybe SUR is a possibility, but am unsure how
your short panel would handle that.

Roger

Thanks.
----------------------
Lom





On Fri, Jun 5, 2020 at 3:51 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Fri, 5 Jun 2020, Lom Navanyo wrote:

I fully agree with you and appreciate the listed benefits of not taking
things private. I was just trying to be sure the forum here is

appropriate

and receptive of a beginner like me.

To be more explicit with regards to my observations, y is amount of
water withdrawal from wells and an important variable in x is (height
of) water level in the wells. These are end of year figures. I am using
the aggregations (sum for y and mean for water level) by band as

spatial

neighborhood variables. There will be one or two indicator variables
also in x. I hope these do not present additional hurdles.

There are several further questions. If water level is measured at
end-of-year, it is instantaneous at that point, and will depend on

level a

year earlier plus inflow from the movements of the water table
(precipitation, soils and surface geology, maybe geology if deeper

wells),

minus evaporation (if an open well) and extraction. However, your y
(extraction) is probably measured over an interval (1 Jan - 31 Dec?). It
does not depend on level unless level is 0, but depends on the closeness
of people extracting water for domestic, agricultural or other use.

All else equal, you would expect changes in the level in a well to

depend

on inputs, evaporation and extraction, and extraction at that well and
other nearby wells (which may experience falls in the ground water table
level not because the water was extracted from those wells, but at
neighbouring wells. You may also see users shifting to neerby wells if
their closest well runs dry.

So you probably need to start with a deterministic hydrological model,

and

you need much more information about who extracts and why. Say in India,
you would also need price data - apparently free water has led to
over-extraction.

So I would advise against any spatial econometric analysis of the data

you

have, because so much is going on in the system as a whole that you

cannot

control if all the data you have is as you describe. I also understand
better why well water level is endogeneous, but am sure that IV will not
help, since the level is being driven partly by a deterministic
hydrological system which differs from well to well, and extraction

varies

by demand.

Has anyone worked with this kind of data? Any ideas or contributions

more

helpful than the above?

Roger

I am thinking Proximity is relevant in testing spatial
dependency/externality.

I will consider splm package  and the SLX model.

Thank you.
---------------
Lom

On Thu, Jun 4, 2020 at 2:52 PM Roger Bivand <Roger.Bivand at nhh.no>

wrote:

On Thu, 4 Jun 2020, Lom Navanyo wrote:

Thank you. Yes, the OLS is biased and my plan is to use a 2SLS

approach.

have a variable I intend to use as an IV for y.
I have seen a few papers use this approach. Will this approach not

correct

for the endogeneity?

Actually, I am not sure if this is a right forum or perhaps if it's
appropriate or acceptable to you to take this one-on-one with you for

help:

I do not offer private help. That would presuppose that one person has

the

answer. It would also presuppose that all exchanges are only read by

the

original poster and direct participants, while in fact others may join

in,

or follow a thread, or find the thread by searching: google supports

the

list:r-sig-geo search tag. If the thread goes private, that search is
fruitless.

My model actually looks like this: y= f(y, x)  + e.
Aside the endogeneity of y (which I intend to instrument by another
variable z), there is simultaneity between y and x.
I intend to use the lag of x as instrument for x.  Given that I am

seeking

to test spatial dependency, do you see some fatal flaws with my

approach?

What is the support of your observations, point, or are they

aggregations?

Why may proximity make a difference - often, apparent spatial
autocorrelation is caused by observing inappropriate entities, or by
omitting covariates, or by using the wrong functional form.

I have also seen other empirical approaches like static and dynamic

spatial

panel data modelling. I will be reviewing them also to see

suitability

for

my objective.
But, any further directions or suggestions are highly appreciated.

If the data are spatial panel, you can look at the splm package.
Personally, I have never found instruments any use at all, because the
instruments are typically at best weak because of shared spatial

processes

with the response, unless the model is really well specified from

known

theory. In space, almost everything is close to endogeneous unless the
opposite is demonstrated. So causal relationships are less worthwhile,
because they are at best conditional on omitted variables and
autocorrelation engendered by the choice of observational entities.

Further, because spatial processes are driven by the inverse matrix of

the

input graph of proximate neighbours (the covariance matrix of the

spatial

process), you don't need to start from more than the first order
neighbours. Maybe your x has the same spatial pattern as y, so that

the

residuals are white noise with no spatial structure.

Recently, analysts prefer to start with the SLX model (Halleck Vega &
Elhorst 2015 and others), so that might be worth exploring. If only

the

direct impacts seem important, OLS may be enough.

Hope this helps,

Roger

Thanks,
-------------------
Lom



On Thu, Jun 4, 2020 at 3:48 AM Roger Bivand <Roger.Bivand at nhh.no>

wrote:

On Thu, 4 Jun 2020, Lom Navanyo wrote:

Thank you very much for your support. This gives me what I need

and I

must

say listw2sn() is really great.

Why do I need the data in the format as in dataout? I am trying to

test

spatial dependence (or neighborhood effect) by running a regression
model that entails pop_size_it = beta_1*sum of pop_size of point

i's

neighbors within a specified radius. So my plan is to get the

neighbors

for each focal point as per the specified bands and their

attributes

(eg

pop_size) so I can can add them (attribute) by the bands.

Thanks, clarifies a good deal. Maybe look at the original localG

articles

for exploring distance relationships (Getis and Ord looked at

HIV/AIDS);

?spdep::localG or

https://r-spatial.github.io/spdep/reference/localG.html.

Further note at OLS is biased as you have y = f(y) + e, so y on both
sides. The nearest equivalent for a single band is

spatialreg::lagsarlm()

with listw=nb2listw(wd1, style="B") to get the neighbour sums

through

the

weights matrix. So both your betas and their standard errors are

unusable,

I'm afraid. You are actually very much closer to ordinary kriging,

looking

at the way in which distance attenuates the correlation in value of
proximate observations.

Hope this clarifies,

Roger

I am totally new to the area of spatial econometrics, so I am

taking

things

one step at a time. Some readings suggest I may need distance

matrix

or

weight matrix but for now I think I should try the current

approach.

Thank you.

-------------
Lom

On Wed, Jun 3, 2020 at 8:18 AM Roger Bivand <Roger.Bivand at nhh.no>

wrote:

On Wed, 3 Jun 2020, Lom Navanyo wrote:

I had the errors with rtree using R 3.6.3. I have since changed

to

4.0.0

but I got the same error.

And  yes, for Roger's example, I have the objects wd1, ... wd4,

all

with

length 101. I think my difficulty is my inability to output the

list

detailing the point IDs t50_fid.

library(spData)
library(sf)
projdata<-st_transform(nz_height, 32759)
pts <- st_coordinates(projdata)
library(spdep)
bufferR <- c(402.336, 1609.34, 3218.69, 4828.03, 6437.38)
bds <- c(0, bufferR)
wd1 <- dnearneigh(pts, bds[1], bds[2])
wd2 <- dnearneigh(pts, bds[2], bds[3])
wd3 <- dnearneigh(pts, bds[3], bds[4])
wd4 <- dnearneigh(pts, bds[4], bds[5])
sn_band1 <- listw2sn(nb2listw(wd1, style="B", zero.policy=TRUE))
sn_band1$band <- paste(attr(wd1, "distances"), collapse="-")
sn_band2 <- listw2sn(nb2listw(wd2, style="B", zero.policy=TRUE))
sn_band2$band <- paste(attr(wd2, "distances"), collapse="-")
sn_band3 <- listw2sn(nb2listw(wd3, style="B", zero.policy=TRUE))
sn_band3$band <- paste(attr(wd3, "distances"), collapse="-")
sn_band4 <- listw2sn(nb2listw(wd4, style="B", zero.policy=TRUE))
sn_band4$band <- paste(attr(wd4, "distances"), collapse="-")
data_out <- do.call("rbind", list(sn_band1, sn_band2, sn_band3,
                                  sn_band4))

class(data_out) <- "data.frame"
table(data_out$band)
data_out$ID_from <- projdata$t50_fid[data_out$from]
data_out$ID_to <- projdata$t50_fid[data_out$to]
data_out$elev_from <- projdata$elevation[data_out$from]
data_out$elev_to <- projdata$elevation[data_out$to]
str(data_out)
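
To show what the distance bands amount to, here is a base-R sketch that
builds a similar from/to/band table directly from a distance matrix. It
is not how spdep does it internally, the five points are hypothetical,
and the interval closure used by cut() (lower bound excluded, upper
bound included) may differ from the dnearneigh() defaults in a given
spdep version.

```r
# Five hypothetical projected points (coordinates in metres).
pts <- cbind(x = c(0, 300, 1200, 3000, 5200),
             y = c(0,   0,    0,    0,    0))
bds <- c(0, 402.336, 1609.34, 3218.69, 4828.03)

# Full pairwise Euclidean distance matrix, without dimnames.
d <- unname(as.matrix(dist(pts)))

# All ordered pairs of distinct points.
pairs <- which(d > 0, arr.ind = TRUE)
out <- data.frame(from = pairs[, 1], to = pairs[, 2], dist = d[pairs])

# Assign each pair to a band (lower, upper]; pairs beyond the last
# bound get NA and are dropped.
out$band <- cut(out$dist, breaks = bds)
out <- out[!is.na(out$band), ]
print(out[order(out$from, out$to), ])
```

A full distance matrix is fine for a few thousand points but grows
quadratically, which is one reason to prefer the neighbour-list tools
used in the thread.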

The "spatial.neighbour" representation was that used in the S-Plus
SpatialStats module, with "from" and "to" columns, and here drops
no-neighbour cases gracefully. So listw2sn() comes in useful
for creating the output, and from there, just look-up in the
input data.frame. Observations here cannot be their own neighbours.
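
Once the long from/to table exists, per-band neighbour totals for each
focal point are a grouped sum. A minimal base-R sketch, using a small
made-up table with the same column layout as data_out (the values are
invented for illustration):

```r
# Toy stand-in for data_out: one row per (focal point, neighbour) pair.
# Values are hypothetical; only the column layout mirrors the thread.
toy <- data.frame(
  from    = c(1, 1, 1, 2, 2, 3),
  to      = c(2, 3, 4, 1, 4, 1),
  band    = c("0-402.336", "0-402.336", "402.336-1609.34",
              "0-402.336", "402.336-1609.34", "0-402.336"),
  elev_to = c(2700, 2750, 2800, 2720, 2800, 2720)
)

# Sum and count the neighbours' attribute for each focal point and band.
band_sums <- aggregate(elev_to ~ from + band, data = toy, FUN = sum)
names(band_sums)[3] <- "sum_elev_nb"
band_n <- aggregate(elev_to ~ from + band, data = toy, FUN = length)
band_sums$n_nb <- band_n$elev_to

print(band_sums)
```

Focal points with no neighbours in a band do not appear in the result;
merge back onto the full list of IDs if explicit zeros are needed.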

It would be relevant to know why you need these, are you looking at
variogram clouds?

Hope this clarifies,

Roger

---------
Lom

On Tue, Jun 2, 2020 at 8:02 PM Kent Johnson <kent3737 at gmail.com> wrote:

Roger's example works for me and gives a list of length 101. I did have
some issues that were resolved by updating packages. I'm using R 3.6.3
on macOS 10.15.4. I also use rtree successfully on Windows 10 with
3.6.3.

Kent

On Tue, Jun 2, 2020 at 12:29 PM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Tue, 2 Jun 2020, Kent Johnson wrote:

rtree uses Euclidean distance so the points should be in a coordinate
system where this makes sense at least as a reasonable approximation.
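
To illustrate why the coordinate system matters, here is a small base-R
sketch (not using rtree) comparing a naive Euclidean distance computed
on longitude/latitude degrees with a haversine great-circle distance.
The two points and the spherical Earth radius are illustrative
assumptions.

```r
# Great-circle distance in metres on a sphere (haversine formula).
# The mean Earth radius of 6371 km is an approximation.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

# Two hypothetical points one degree of longitude apart at 45 S.
p1 <- c(lon = 170, lat = -45)
p2 <- c(lon = 171, lat = -45)

# Euclidean distance in degrees: 1 here, whatever the latitude.
d_deg <- sqrt(sum((p1 - p2)^2))

# Great-circle distance in metres: shrinks towards the poles.
d_m <- unname(haversine_m(p1["lon"], p1["lat"], p2["lon"], p2["lat"]))
d_equator <- haversine_m(170, 0, 171, 0)

print(c(degrees = d_deg, metres_at_45S = d_m,
        metres_at_equator = d_equator))
```

The same one-degree separation corresponds to quite different ground
distances, so a fixed radius in degrees does not describe a fixed
radius in metres.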

I tried the original example:

remotes::install_github("hunzikp/rtree")
library(spData)
library(sf)
projdata<-st_transform(nz_height, 32759)
library(rtree)
pts <- st_coordinates(projdata)
rt <- RTree(st_coordinates(projdata))
bufferR <- c(402.336, 1609.34, 3218.69, 4828.03, 6437.38)
wd1 <- withinDistance(rt, pts, bufferR[1])

but unfortunately failed (maybe newer Boost headers than yours?):

Error in UseMethod("withinDistance", rTree) :
   no applicable method for 'withinDistance' applied to an object of class
"c('list', 'RTree')"

Kent

On Tue, Jun 2, 2020 at 9:59 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Tue, 2 Jun 2020, Kent Johnson wrote:

Date: Tue, 2 Jun 2020 02:44:17 -0500
From: Lom Navanyo <lomnavasia at gmail.com>
To: r-sig-geo at r-project.org
Subject: [R-sig-Geo] How to efficiently generate data of neighboring
        points within specified radii (distances) for each point in a
        given points data set.

Hello,
I have a data set of about 3400 location points with which I am trying
to generate data of each point and their neighbors within defined radii
(eg, 0.25, 1, and 3 miles).
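
For reference, the bufferR vector used in the replies in this thread
appears to hold mile distances expressed in metres, which is what a
projected CRS in metres needs. A minimal sketch of the conversion; the
2- and 4-mile values are inferred from bufferR and are not part of the
original request.

```r
# International mile in metres (exact by definition).
m_per_mile <- 1609.344

# Radii in miles: 0.25, 1 and 3 come from the question; 2 and 4 are
# inferred from the bufferR vector used elsewhere in this thread.
radii_miles <- c(0.25, 1, 2, 3, 4)

radii_m <- radii_miles * m_per_mile
print(round(radii_m, 2))
```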

The rtree package is very fast and memory-efficient for within-distance
calculations.
https://github.com/hunzikp/rtree

Thanks! Does this also apply when the input points are in geographical
coordinates?

Roger

Kent Johnson
Cambridge, MA


Roger Bivand

Tue, Jun 9, 2020 3:35 AM #

On Mon, 8 Jun 2020, Lom Navanyo wrote:

This feels like a linear mixed effects model with an irrigator random 
effect and a temporal random effect. A spatial random effect (ICAR?) might 
be added, but it will be hard to split the identification of the irrigator 
RE from a spatially structured RE for the wells. I think that you should 
be looking at the mgcv package, the second edition of Simon Wood's book, 
and either an MRF or a Gaussian Process ("gp") spatial RE for the wells.

It may very well be that a group RE (fee/no fee) would discriminate 
between the groups statistically, but I'm out of my depth here. Anyway, 
mgcv, using a flexible functional form on water level, and RE's for the 
other components, seems possible. Structural regression using BayesX or 
INLA are also possible. You have 5 years, how many irrigators and how many 
wells?
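
A minimal sketch of the kind of mgcv specification described above, on
simulated data. All variable names (pumped, level, irrigator, year_f,
east, north), sizes and effect values are hypothetical, and only the
Gaussian process option for the spatial term is shown, not the MRF one.
Whether this structure suits the real data is an open question.

```r
library(mgcv)
set.seed(42)

# Simulated panel: 40 wells, 20 irrigators (2 wells each), 5 years.
n_well <- 40; n_irr <- 20; n_year <- 5
wells <- data.frame(well = seq_len(n_well),
                    irrigator = rep(seq_len(n_irr), each = 2),
                    east = runif(n_well), north = runif(n_well))
d <- merge(wells, data.frame(year = seq_len(n_year)))  # cross join
d$irrigator <- factor(d$irrigator)
d$year_f    <- factor(d$year)
d$level     <- runif(nrow(d), 5, 30)

# Made-up data-generating process with an irrigator-level effect.
irr_eff  <- rnorm(n_irr, sd = 1)
d$pumped <- 10 + sin(d$level / 5) + irr_eff[as.integer(d$irrigator)] +
  2 * d$east + rnorm(nrow(d), sd = 0.5)

# Smooth of water level, random intercepts for irrigator and year, and
# a Gaussian process smooth over the well coordinates.
fit <- gam(pumped ~ s(level) + s(irrigator, bs = "re") +
             s(year_f, bs = "re") + s(east, north, bs = "gp", k = 15),
           data = d, method = "REML")
print(summary(fit)$s.table)
```

With 3400 wells the basis dimension k and the fitting function would
need more thought than in this toy setup.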

Roger

Thanks,
-----------------
Lom

On Sun, Jun 7, 2020 at 5:06 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Fri, 5 Jun 2020, Lom Navanyo wrote:

Thank you once again. To clarify, which is more suitable, end of year
water levels or yearly average measure of water levels?

Also below are a few more notes to throw more light on my variables/data:

These wells are solely for irrigation purposes and are
irrigator/farmer-owned and operated. No farmer/irrigator moves to
another well not owned by him. The only reason to suspect any spatial
externalities is because the wells share a common aquifer. And this is
essentially what I am testing.

Is irrigation by fixed pipe, or can the water be moved to the area of
another well? Can irrigators extract water from multiple wells? Is then
the irrigator the unit of observation rather than the well?

It is also understood that there is not much variation in the geography
and geology of the study region.

I have data on a number of well-specific features in addition to the
water level. I also have some farm data including cropping and
technology use data. No soil data though, and no recharge data either.

OK, the farming data may reflect the demand for water. Do the different
crops or technologies have different seasonal patterns, leading to
different draw-down patterns in the wells over time?

In fact, I agree a lot of factors can come into play here and I may not
have or observe all of them, but I was thinking I could incorporate some
fixed effects to take care of those, especially for those I suspect (or
perhaps by theory) are likely to not vary much in terms of their effect
on irrigation (pumping) decisions across farmers or effect on water
level.

My panel is rather a short one: I have five years of panel data.

Given the above, is it still not advisable to use any spatial econometric
analysis? Just a simple OLS will suffice?

OLS probably not, but the decisions are starting to look like farmers'
cropping decisions, leading to varied need for water. Do the farmers pay
for the water or the irrigation technology?

I'm starting to think that maybe SUR is a possibility, but am unsure how
your short panel would handle that.

Roger

Thanks.
----------------------
Lom





On Fri, Jun 5, 2020 at 3:51 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Fri, 5 Jun 2020, Lom Navanyo wrote:

I fully agree with you and appreciate the listed benefits of not taking
things private. I was just trying to be sure the forum here is
appropriate and receptive of a beginner like me.

To be more explicit with regards to my observations, y is amount of
water withdrawal from wells and an important variable in x is (height
of) water level in the wells. These are end of year figures. I am using
the aggregations (sum for y and mean for water level) by band as spatial
neighborhood variables. There will be one or two indicator variables
also in x. I hope these do not present additional hurdles.

There are several further questions. If water level is measured at
end-of-year, it is instantaneous at that point, and will depend on level
a year earlier plus inflow from the movements of the water table
(precipitation, soils and surface geology, maybe geology if deeper
wells), minus evaporation (if an open well) and extraction. However,
your y (extraction) is probably measured over an interval (1 Jan - 31
Dec?). It does not depend on level unless level is 0, but depends on the
closeness of people extracting water for domestic, agricultural or other
use.
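
Read as a stock-and-flow relation, the paragraph above says that the
end-of-year level is last year's level plus inflow, minus evaporation
and extraction over the year. A toy sketch for a single well, with
entirely made-up yearly figures in arbitrary units; a real hydrological
model would be far more involved.

```r
# Hypothetical yearly figures for one well (arbitrary units of level).
level0      <- 20
inflow      <- c(3.0, 2.5, 3.5, 2.0, 3.0)
evaporation <- c(0.2, 0.2, 0.3, 0.2, 0.2)
extraction  <- c(3.5, 3.0, 3.0, 4.0, 3.5)

# End-of-year level = previous level + inflow - evaporation - extraction.
net   <- inflow - evaporation - extraction
level <- level0 + cumsum(net)
print(level)
```

The level is a point-in-time stock while extraction is a flow over the
year, which is the mismatch in measurement support raised above.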

All else equal, you would expect changes in the level in a well to
depend on inputs, evaporation and extraction, and extraction at that
well and other nearby wells (which may experience falls in the ground
water table level not because the water was extracted from those wells,
but at neighbouring wells). You may also see users shifting to nearby
wells if their closest well runs dry.

So you probably need to start with a deterministic hydrological model,
and you need much more information about who extracts and why. Say in
India, you would also need price data - apparently free water has led to
over-extraction.

So I would advise against any spatial econometric analysis of the data
you have, because so much is going on in the system as a whole that you
cannot control if all the data you have is as you describe. I also
understand better why well water level is endogenous, but am sure that
IV will not help, since the level is being driven partly by a
deterministic hydrological system which differs from well to well, and
extraction varies by demand.

Has anyone worked with this kind of data? Any ideas or contributions
more helpful than the above?

Roger

I am thinking Proximity is relevant in testing spatial
dependency/externality.

I will consider splm package  and the SLX model.

Thank you.
---------------
Lom

On Thu, Jun 4, 2020 at 2:52 PM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Thu, 4 Jun 2020, Lom Navanyo wrote:

Thank you. Yes, the OLS is biased and my plan is to use a 2SLS approach.
I have a variable I intend to use as an IV for y.
I have seen a few papers use this approach. Will this approach not
correct for the endogeneity?

Actually, I am not sure if this is a right forum or perhaps if it's
appropriate or acceptable to you to take this one-on-one with you for
help:

I do not offer private help. That would presuppose that one person has
the answer. It would also presuppose that all exchanges are only read by
the original poster and direct participants, while in fact others may
join in, or follow a thread, or find the thread by searching: google
supports the list:r-sig-geo search tag. If the thread goes private, that
search is fruitless.

My model actually looks like this: y = f(y, x) + e.
Aside from the endogeneity of y (which I intend to instrument by another
variable z), there is simultaneity between y and x.
I intend to use the lag of x as instrument for x. Given that I am
seeking to test spatial dependency, do you see some fatal flaws with my
approach?

What is the support of your observations, point, or are they
aggregations?

Why may proximity make a difference - often, apparent spatial
autocorrelation is caused by observing inappropriate entities, or by
omitting covariates, or by using the wrong functional form.

I have also seen other empirical approaches like static and dynamic
spatial panel data modelling. I will be reviewing them also to see
suitability for my objective.
But, any further directions or suggestions are highly appreciated.

If the data are spatial panel, you can look at the splm package.
Personally, I have never found instruments any use at all, because the
instruments are typically at best weak because of shared spatial
processes with the response, unless the model is really well specified
from known theory. In space, almost everything is close to endogenous
unless the opposite is demonstrated. So causal relationships are less
worthwhile, because they are at best conditional on omitted variables
and autocorrelation engendered by the choice of observational entities.

Further, because spatial processes are driven by the inverse matrix of
the input graph of proximate neighbours (the covariance matrix of the
spatial process), you don't need to start from more than the first order
neighbours. Maybe your x has the same spatial pattern as y, so that the
residuals are white noise with no spatial structure.

Recently, analysts prefer to start with the SLX model (Halleck Vega &
Elhorst 2015 and others), so that might be worth exploring. If only the
direct impacts seem important, OLS may be enough.
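
As a rough illustration of what the SLX specification involves: it is
an ordinary linear model with the spatially lagged covariate WX added
as a regressor, and no lagged response on the right-hand side. The
sketch below uses simulated data and a made-up ring-shaped neighbour
matrix, in base R only; spatialreg::lmSLX() offers a packaged version
for listw objects.

```r
set.seed(1)
n <- 50

# Hypothetical row-standardised weights: each point's neighbours are the
# two adjacent points on a ring.
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  W[i, c(if (i == 1) n else i - 1, if (i == n) 1 else i + 1)] <- 0.5
}

x  <- rnorm(n)
Wx <- as.vector(W %*% x)            # spatially lagged covariate

# Simulated response: direct effect 2, neighbour (spillover) effect 1.
y <- 1 + 2 * x + 1 * Wx + rnorm(n, sd = 0.1)

# SLX fitted by OLS: y on x and its spatial lag.
fit <- lm(y ~ x + Wx)
print(coef(fit))
```

The coefficient on x is the direct impact and the coefficient on Wx the
spillover from neighbours; if the latter is negligible, plain OLS on x
is the natural fallback.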

Hope this helps,

Roger

Thanks,
-------------------
Lom




Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

Lom Navanyo

Tue, Jun 9, 2020 1:26 PM #

Thank you very much. I will try as much as I can to see which model best
fits the data.

I have about 3400 wells and about 1500 irrigators.

--------
Lom

On Tue, Jun 9, 2020 at 5:36 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Mon, 8 Jun 2020, Lom Navanyo wrote:

Some farmers own more than one well and thus can extract from their
multiple wells. Others are single well owners.

The amount of water pumped by the irrigator from their wells is the unit
of observation. And I do not know how it might sound but
I would say "irrigator-well" is the unit of analysis?

Both crops and technology have seasonal patterns, though not pronounced
probably due to switching costs.

I have two segments of the data: A section or a group of neighboring
irrigators pay fees for water withdrawal. The second group (of
neighbors) does not pay any fee aside their individual lift cost (which
is not observed in the data). I do not intend to run a
difference-in-difference model with respect to the fee as that's not
what I want to study. So I intend to run separate models/specifications
for the two groups.

This feels like a linear mixed effects model with an irrigator random
effect and a temporal random effect. A spatial random effect (ICAR?) might
be added, but it will be hard to split the identification of the irrigator
RE from a spatially structured RE for the wells. I think that you should
be looking at the mgcv package, the second edition of Simon Wood's book,
and either an MRF or a Gaussian Process ("gp") spatial RE for the wells.

It may very well be that a group RE (fee/no fee) would discriminate
between the groups statistically, but I'm out of my depth here. Anyway,
mgcv, using a flexible functional form on water level, and RE's for the
other components, seems possible. Structural regression using BayesX or
INLA are also possible. You have 5 years, how many irrigators and how many
wells?

Roger

Thanks,
-----------------
Lom

On Sun, Jun 7, 2020 at 5:06 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Fri, 5 Jun 2020, Lom Navanyo wrote:

Thank you once again. To clarify, which is more suitable, end of year

water

levels or yearly average measure of water levels?

Also below are a few more notes to throw more light on my

variables/data:

These wells are solely for irrigation purposes and are
irrigator/farmer-owned and operated. No farmer/irrigator moves to
another well not owned by him. The only reason to suspect any spatial
externalities is because the wells share a common aquifer. And this is
essentially what I am testing.

Is irrigation by fixed pipe, or can the water be moved to the area of
another well? Can irrigators extract water from multiple wells? Is then
the irrigator the unit of observation rather than the well?

It is also understood that there are not much variation in the

geography

and geology of the study region.

I have data a number of well specific features in addition to the water
level. I also have some farm data including cropping and technology use
data. No soil data though.
No recharge data too as well.

OK, the farming data may reflect the demand for water. Do the different
crops or technologies have different seasonal patterns, leading to
different draw-down patterns in the wells over time?

In fact, I agree a lot factors can come to play here and I may not have

or

observe all but I was thinking I could incorporate some fixed effects
to take care of those, especially for those I suspect (or perhaps by
theory) are likely to not vary much in terms of their effect on
irrigation(pumping) decisions across farmers
or effect on water level.

My panel is rather a short one: I have a five year panel data.

Given the above, is it still not advisable to use any spatial

econometric

analysis? Just a simple OLS will suffice?

OLS probably not, but the decisions are starting to look like farmers'
cropping decisions, leading to varied need for water. Do the farmers pay
for the water or the irrigation technology?

I'm starting to think that maybe SUR is a possibility, but am unsure how
your short panel would handle that.

Roger

Thanks.
----------------------
Lom





On Fri, Jun 5, 2020 at 3:51 AM Roger Bivand <Roger.Bivand at nhh.no>

wrote:

On Fri, 5 Jun 2020, Lom Navanyo wrote:

I fully agree with you and appreciate the listed benefits of not

taking

things private. I was just trying to be sure the forum here is

appropriate

and receptive of a beginner like me.

To be more explicit with regards to my observations, y is amount of
water withdrawal from wells and an important variable in x is (height
of) water level in the wells. These are end of year figures. I am

using

the aggregations (sum for y and mean for water level) by band as

spatial

neighborhood variables. There will be one or two indicator variables
also in x. I hope these do not present additional hurdles.

There are several further questions. If water level is measured at
end-of-year, it is instantaneous at that point, and will depend on

level a

year earlier plus inflow from the movements of the water table
(precipitation, soils and surface geology, maybe geology if deeper

wells),

minus evaporation (if an open well) and extraction. However, your y
(extraction) is probably measured over an interval (1 Jan - 31 Dec?).

It

does not depend on level unless level is 0, but depends on the

closeness

of people extracting water for domestic, agricultural or other use.

All else equal, you would expect changes in the level in a well to

depend

on inputs, evaporation and extraction, and extraction at that well and
other nearby wells (which may experience falls in the ground water

table

level not because the water was extracted from those wells, but at
neighbouring wells. You may also see users shifting to neerby wells if
their closest well runs dry.

So you probably need to start with a deterministic hydrological model,

and

you need much more information about who extracts and why. Say in

India,

you would also need price data - apparently free water has led to
over-extraction.

So I would advise against any spatial econometric analysis of the data

you

have, because so much is going on in the system as a whole that you

cannot

control if all the data you have is as you describe. I also understand
better why well water level is endogeneous, but am sure that IV will

not

help, since the level is being driven partly by a deterministic
hydrological system which differs from well to well, and extraction

varies

by demand.

Has anyone worked with this kind of data? Any ideas or contributions

more

helpful than the above?

Roger

I am thinking Proximity is relevant in testing spatial
dependency/externality.

I will consider splm package  and the SLX model.

Thank you.
---------------
Lom

On Thu, Jun 4, 2020 at 2:52 PM Roger Bivand <Roger.Bivand at nhh.no>

wrote:

On Thu, 4 Jun 2020, Lom Navanyo wrote:

Thank you. Yes, the OLS is biased and my plan is to use a 2SLS

approach.

have a variable I intend to use as an IV for y.
I have seen a few papers use this approach. Will this approach not

correct

for the endogeneity?

Actually, I am not sure if this is a right forum or perhaps if it's
appropriate or acceptable to you to take this one-on-one with you

for

help:

I do not offer private help. That would presuppose that one person

has

the

answer. It would also presuppose that all exchanges are only read by

the

original poster and direct participants, while in fact others may

join

in,

or follow a thread, or find the thread by searching: google supports

the

list:r-sig-geo search tag. If the thread goes private, that search

is

fruitless.

My model actually looks like this: y= f(y, x)  + e.
Aside the endogeneity of y (which I intend to instrument by another
variable z), there is simultaneity between y and x.
I intend to use the lag of x as instrument for x.  Given that I am

seeking

to test spatial dependency, do you see some fatal flaws with my

approach?

What is the support of your observations, point, or are they

aggregations?

Why may proximity make a difference - often, apparent spatial
autocorrelation is caused by observing inappropriate entities, or by
omitting covariates, or by using the wrong functional form.

I have also seen other empirical approaches like static and dynamic

spatial

panel data modelling. I will be reviewing them also to see

suitability

for

my objective.
But, any further directions or suggestions are highly appreciated.

If the data are spatial panel, you can look at the splm package.
Personally, I have never found instruments any use at all, because

the

instruments are typically at best weak because of shared spatial

processes

with the response, unless the model is really well specified from

known

theory. In space, almost everything is close to endogeneous unless

the

opposite is demonstrated. So causal relationships are less

worthwhile,

because they are at best conditional on omitted variables and
autocorrelation engendered by the choice of observational entities.

Further, because spatial processes are driven by the inverse matrix

of

the

input graph of proximate neighbours (the covariance matrix of the

spatial

process), you don't need to start from more than the first order
neighbours. Maybe your x has the same spatial pattern as y, so that

the

residuals are white noise with no spatial structure.

Recently, analysts prefer to start with the SLX model (Halleck Vega

Elhorst 2015 and others), so that might be worth exploring. If only

the

direct impacts seem important, OLS may be enough.

Hope this helps,

Roger

Thanks,
-------------------
Lom



On Thu, Jun 4, 2020 at 3:48 AM Roger Bivand <Roger.Bivand at nhh.no>

wrote:

On Thu, 4 Jun 2020, Lom Navanyo wrote:

Thank you very much for your support. This gives me what I need

and I

must

say listw2sn() is really great.

Why do I need the data in the format as in dataout? I am trying

to

test

spatial dependence (or neighborhood effect) by running a

regression

model that entails pop_size_it = beta_1*sum of pop_size of point

i's

neighbors within a specified radius. So my plan is to get the

neighbors

for each focal point as per the specified bands and their

attributes

(eg

pop_size) so I can can add them (attribute) by the bands.

Thanks, clarifies a good deal. Maybe look at the original localG

articles

for exploring distance relationships (Getis and Ord looked at

HIV/AIDS);

?spdep::localG or

https://r-spatial.github.io/spdep/reference/localG.html.

Further note at OLS is biased as you have y = f(y) + e, so y on

both

sides. The nearest equivalent for a single band is

spatialreg::lagsarlm()

with listw=nb2listw(wd1, style="B") to get the neighbour sums

through

the

weights matrix. So both your betas and their standard errors are

unusable,

I'm afraid. You are actually very much closer to ordinary kriging,

looking

at the way in which distance attenuates the correlation in value

of

proximate observations.

Hope this clarifies,

Roger

I am totally new to the area of spatial econometrics, so I am

taking

things

one step at a time. Some readings suggest I may need distance

matrix

or

weight matrix but for now I think I should try the current

approach.

Thank you.

-------------
Lom

On Wed, Jun 3, 2020 at 8:18 AM Roger Bivand <Roger.Bivand at nhh.no

wrote:

On Wed, 3 Jun 2020, Lom Navanyo wrote:

I had the errors with rtree using R 3.6.3. I have since changed

to

4.0.0

but I got the same error.

And  yes, for Roger's example, I have the objects wd1, ... wd4,

all

with

length 101. I think my difficulty is my inability to output the

list

detailing the point IDs t50_fid.

library(spData)
library(sf)
projdata<-st_transform(nz_height, 32759)
pts <- st_coordinates(projdata)
library(spdep)
bufferR <- c(402.336, 1609.34, 3218.69, 4828.03, 6437.38)
bds <- c(0, bufferR)
wd1 <- dnearneigh(pts, bds[1], bds[2])
wd2 <- dnearneigh(pts, bds[2], bds[3])
wd3 <- dnearneigh(pts, bds[3], bds[4])
wd4 <- dnearneigh(pts, bds[4], bds[5])
sn_band1 <- listw2sn(nb2listw(wd1, style="B", zero.policy=TRUE))
sn_band1$band <- paste(attr(wd1, "distances"), collapse="-")
sn_band2 <- listw2sn(nb2listw(wd2, style="B", zero.policy=TRUE))
sn_band2$band <- paste(attr(wd2, "distances"), collapse="-")
sn_band3 <- listw2sn(nb2listw(wd3, style="B", zero.policy=TRUE))
sn_band3$band <- paste(attr(wd3, "distances"), collapse="-")
sn_band4 <- listw2sn(nb2listw(wd4, style="B", zero.policy=TRUE))
sn_band4$band <- paste(attr(wd4, "distances"), collapse="-")
data_out <- do.call("rbind", list(sn_band1, sn_band2, sn_band3,

sn_band4))

class(data_out) <- "data.frame"
table(data_out$band)
data_out$ID_from <- projdata$t50_fid[data_out$from]
data_out$ID_to <- projdata$t50_fid[data_out$to]
data_out$elev_from <- projdata$elevation[data_out$from]
data_out$elev_to <- projdata$elevation[data_out$to]
str(data_out)

The "spatial.neighbour" representation was that used in the

S-Plus

SpatialStats module, with "from" and "to" columns, and here

drops

no-neighbour cases gracefully. So listw2sn() comes in useful
for creating the output, and from there, just look-up in the
input data.frame. Observations here cannot be their own

neighbours.

It would be relevant to know why you need these, are you looking

at

variogram clouds?

Hope this clarifies,

Roger

---------
Lom

On Tue, Jun 2, 2020 at 8:02 PM Kent Johnson <

kent3737 at gmail.com>

wrote:

Roger's example works for me and gives a list of length 101. I did have
some issues that were resolved by updating packages. I'm using R 3.6.3 on
macOS 10.15.4. I also use rtree successfully on Windows 10 with 3.6.3.

Kent

On Tue, Jun 2, 2020 at 12:29 PM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Tue, 2 Jun 2020, Kent Johnson wrote:

rtree uses Euclidean distance so the points should be in a coordinate
system where this makes sense at least as a reasonable approximation.
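A rough base-R illustration of why this matters for geographical
coordinates: two pairs of points that are both exactly one degree of
longitude apart look equally distant to a Euclidean calculation on
lon/lat, but are not equally far apart on the ground. haversine_m() is a
hypothetical helper written for this sketch, and it assumes a spherical
Earth with a mean radius of 6371008.8 m.

```r
# Hypothetical helper: great-circle distance in metres on a sphere
# (radius r is an assumed mean Earth radius, not taken from the thread)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Both pairs are 1 unit apart if lon/lat degrees are treated as planar
d_equator <- haversine_m(170, 0, 171, 0)
d_south   <- haversine_m(170, -44, 171, -44)

print(c(equator = d_equator, lat_44S = d_south))
print(d_south / d_equator)   # ratio is close to cos(44 degrees)
```

A fixed search radius in degrees therefore covers a different ground
distance east-west depending on latitude, which is why projecting first,
as with st_transform() in the example, is the safer choice.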

I tried the original example:

remotes::install_github("hunzikp/rtree")
library(spData)
library(sf)
projdata<-st_transform(nz_height, 32759)
library(rtree)
pts <- st_coordinates(projdata)
rt <- RTree(st_coordinates(projdata))
bufferR <- c(402.336, 1609.34, 3218.69, 4828.03, 6437.38)
wd1 <- withinDistance(rt, pts, bufferR[1])

but unfortunately failed (maybe newer Boost headers than yours?):

Error in UseMethod("withinDistance", rTree) :
   no applicable method for 'withinDistance' applied to an object of class
"c('list', 'RTree')"

Kent

On Tue, Jun 2, 2020 at 9:59 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:

On Tue, 2 Jun 2020, Kent Johnson wrote:

Date: Tue, 2 Jun 2020 02:44:17 -0500
From: Lom Navanyo <lomnavasia at gmail.com>
To: r-sig-geo at r-project.org
Subject: [R-sig-Geo] How to efficiently generate data of neighboring
        points within specified radii (distances) for each point in a
        given points data set.

Hello,
I have a data set of about 3400 location points, with which I am trying
to generate data on each point and its neighbors within defined radii
(e.g., 0.25, 1, and 3 miles).

The rtree package is very fast and memory-efficient for within-distance
calculations.
https://github.com/hunzikp/rtree

Thanks! Does this also apply when the input points are in geographical
coordinates?

Roger

Kent Johnson
Cambridge, MA
