How to efficiently generate data of neighboring points
Thank you once again. To clarify, which is more suitable: end-of-year water levels or the yearly average of water levels? Below are a few more notes to shed light on my variables and data:

These wells are solely for irrigation and are owned and operated by the irrigators/farmers. No farmer moves to a well he does not own. The only reason to suspect spatial externalities is that the wells share a common aquifer, and this is essentially what I am testing. It is also understood that there is not much variation in the geography and geology of the study region.

I have data on a number of well-specific features in addition to the water level. I also have some farm data, including cropping and technology use, but no soil data and no recharge data. I agree that a lot of factors can come into play here and that I may not observe all of them, but I was thinking I could incorporate fixed effects to take care of those, especially the ones that I suspect (or that theory suggests) do not vary much across farmers in their effect on irrigation (pumping) decisions or on water level. My panel is rather short: I have five years of data.

Given the above, is it still not advisable to use any spatial econometric analysis? Will a simple OLS suffice?

Thanks.
----------------------
Lom
On Fri, Jun 5, 2020 at 3:51 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:
On Fri, 5 Jun 2020, Lom Navanyo wrote:
I fully agree with you and appreciate the listed benefits of not taking things private. I was just trying to be sure the forum here is appropriate and receptive of a beginner like me. To be more explicit with regard to my observations, y is the amount of water withdrawn from wells, and an important variable in x is the (height of) water level in the wells. These are end-of-year figures. I am using the aggregations by band (sum for y and mean for water level) as spatial neighborhood variables. There will also be one or two indicator variables in x. I hope these do not present additional hurdles.
There are several further questions. If water level is measured at end-of-year, it is instantaneous at that point, and will depend on the level a year earlier plus inflow from the movements of the water table (precipitation, soils and surface geology, maybe geology if deeper wells), minus evaporation (if an open well) and extraction. However, your y (extraction) is probably measured over an interval (1 Jan - 31 Dec?). It does not depend on level unless level is 0, but depends on the closeness of people extracting water for domestic, agricultural or other use.

All else equal, you would expect changes in the level in a well to depend on inputs, evaporation and extraction, and on extraction at that well and other nearby wells (which may experience falls in the ground water table level not because the water was extracted from those wells, but because it was extracted at neighbouring wells). You may also see users shifting to nearby wells if their closest well runs dry. So you probably need to start with a deterministic hydrological model, and you need much more information about who extracts and why. Say in India, you would also need price data - apparently free water has led to over-extraction.

So I would advise against any spatial econometric analysis of the data you have, because so much is going on in the system as a whole that you cannot control if all the data you have is as you describe. I also understand better why well water level is endogenous, but am sure that IV will not help, since the level is being driven partly by a deterministic hydrological system which differs from well to well, and extraction varies by demand.

Has anyone worked with this kind of data? Any ideas or contributions more helpful than the above?

Roger
I am thinking proximity is relevant in testing spatial dependency/externality. I will consider the splm package and the SLX model.
Thank you.
---------------
Lom
On Thu, Jun 4, 2020 at 2:52 PM Roger Bivand <Roger.Bivand at nhh.no> wrote:
On Thu, 4 Jun 2020, Lom Navanyo wrote:
Thank you. Yes, the OLS is biased and my plan is to use a 2SLS approach. I have a variable I intend to use as an IV for y. I have seen a few papers use this approach. Will this approach not correct for the endogeneity? Actually, I am not sure if this is the right forum, or perhaps whether it is appropriate or acceptable to you to take this one-on-one with you for help:

I do not offer private help. That would presuppose that one person has the answer. It would also presuppose that all exchanges are only read by the original poster and direct participants, while in fact others may join in, or follow a thread, or find the thread by searching: google supports the list:r-sig-geo search tag. If the thread goes private, that search is fruitless.
My model actually looks like this: y = f(y, x) + e. Aside from the endogeneity of y (which I intend to instrument by another variable z), there is simultaneity between y and x. I intend to use the lag of x as an instrument for x. Given that I am seeking to test spatial dependency, do you see any fatal flaws in my approach?

What is the support of your observations: are they points, or aggregations? Why may proximity make a difference - often, apparent spatial autocorrelation is caused by observing inappropriate entities, or by omitting covariates, or by using the wrong functional form.
I have also seen other empirical approaches like static and dynamic spatial panel data modelling. I will be reviewing them also to see their suitability for my objective. But any further directions or suggestions are highly appreciated.
If the data are spatial panel, you can look at the splm package. Personally, I have never found instruments any use at all, because the instruments are typically at best weak because of shared spatial processes with the response, unless the model is really well specified from known theory. In space, almost everything is close to endogenous unless the opposite is demonstrated. So causal relationships are less worthwhile, because they are at best conditional on omitted variables and autocorrelation engendered by the choice of observational entities.

Further, because spatial processes are driven by the inverse matrix of the input graph of proximate neighbours (the covariance matrix of the spatial process), you don't need to start from more than the first order neighbours. Maybe your x has the same spatial pattern as y, so that the residuals are white noise with no spatial structure. Recently, analysts prefer to start with the SLX model (Halleck Vega & Elhorst 2015 and others), so that might be worth exploring. If only the direct impacts seem important, OLS may be enough.

Hope this helps,
Roger
Thanks,
-------------------
Lom
On Thu, Jun 4, 2020 at 3:48 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:
On Thu, 4 Jun 2020, Lom Navanyo wrote:
Thank you very much for your support. This gives me what I need and I must say listw2sn() is really great. Why do I need the data in the format as in data_out? I am trying to test spatial dependence (or neighborhood effect) by running a regression model that entails pop_size_it = beta_1*sum of pop_size of point i's neighbors within a specified radius. So my plan is to get the neighbors for each focal point as per the specified bands, along with their attributes (e.g. pop_size), so I can sum the attribute by band.
Thanks, clarifies a good deal. Maybe look at the original localG articles for exploring distance relationships (Getis and Ord looked at HIV/AIDS);
?spdep::localG or
Further, note that OLS is biased as you have y = f(y) + e, so y on both sides. The nearest equivalent for a single band is spatialreg::lagsarlm() with listw=nb2listw(wd1, style="B") to get the neighbour sums through the weights matrix. So both your betas and their standard errors are unusable, I'm afraid. You are actually very much closer to ordinary kriging, looking at the way in which distance attenuates the correlation in value of proximate observations.

Hope this clarifies,
Roger
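A minimal base-R sketch of what "neighbour sums through the weights matrix" means, using made-up coordinates and values rather than the poster's data: with binary weights, the spatial lag W %*% x is the sum of each point's neighbours' values. The 402.336 m cut-off is the first band used in the thread; excluding distance 0 and including the upper bound is a choice made here for illustration only.

```r
# Toy projected coordinates (metres) and an attribute; all values are made up
xy  <- cbind(x = c(0, 300, 1000, 5000), y = c(0, 0, 0, 0))
pop <- c(10, 20, 30, 40)

d <- as.matrix(dist(xy))            # pairwise Euclidean distances
W <- (d > 0 & d <= 402.336) * 1     # binary weights: 1 if within the first band

# With binary weights the spatial lag is the sum of the neighbours' values
nb_sum <- as.vector(W %*% pop)
nb_sum
```

Only points 1 and 2 are within 402.336 m of each other here, so points 3 and 4 get a neighbour sum of 0. This only builds the regressor; it does not remove the problem noted above that regressing pop on nb_sum by OLS puts y on both sides.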
I am totally new to the area of spatial econometrics, so I am taking things one step at a time. Some readings suggest I may need a distance matrix or weight matrix, but for now I think I should try the current approach.
Thank you.
-------------
Lom
On Wed, Jun 3, 2020 at 8:18 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:
On Wed, 3 Jun 2020, Lom Navanyo wrote:
I had the errors with rtree using R 3.6.3. I have since changed to R 4.0.0 but I got the same error. And yes, for Roger's example, I have the objects wd1, ... wd4, all with length 101. I think my difficulty is my inability to output the list detailing the point IDs t50_fid.
library(spData)
library(sf)
projdata<-st_transform(nz_height, 32759)
pts <- st_coordinates(projdata)
library(spdep)
bufferR <- c(402.336, 1609.34, 3218.69, 4828.03, 6437.38)
bds <- c(0, bufferR)
wd1 <- dnearneigh(pts, bds[1], bds[2])
wd2 <- dnearneigh(pts, bds[2], bds[3])
wd3 <- dnearneigh(pts, bds[3], bds[4])
wd4 <- dnearneigh(pts, bds[4], bds[5])
sn_band1 <- listw2sn(nb2listw(wd1, style="B", zero.policy=TRUE))
sn_band1$band <- paste(attr(wd1, "distances"), collapse="-")
sn_band2 <- listw2sn(nb2listw(wd2, style="B", zero.policy=TRUE))
sn_band2$band <- paste(attr(wd2, "distances"), collapse="-")
sn_band3 <- listw2sn(nb2listw(wd3, style="B", zero.policy=TRUE))
sn_band3$band <- paste(attr(wd3, "distances"), collapse="-")
sn_band4 <- listw2sn(nb2listw(wd4, style="B", zero.policy=TRUE))
sn_band4$band <- paste(attr(wd4, "distances"), collapse="-")
data_out <- do.call("rbind", list(sn_band1, sn_band2, sn_band3, sn_band4))
class(data_out) <- "data.frame"
table(data_out$band)
data_out$ID_from <- projdata$t50_fid[data_out$from]
data_out$ID_to <- projdata$t50_fid[data_out$to]
data_out$elev_from <- projdata$elevation[data_out$from]
data_out$elev_to <- projdata$elevation[data_out$to]
str(data_out)

The "spatial.neighbour" representation was that used in the S-Plus SpatialStats module, with "from" and "to" columns, and here drops no-neighbour cases gracefully. So listw2sn() comes in useful for creating the output, and from there, just look-up in the input data.frame. Observations here cannot be their own neighbours.

It would be relevant to know why you need these, are you looking at variogram clouds?

Hope this clarifies,
Roger
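Once the neighbour pairs are in a long "from/to/band" data frame like data_out, summing a neighbour attribute per focal point and band is a plain aggregation. The sketch below uses a small hand-made data frame with the same column names; the IDs, band labels and elevation values are invented for illustration, and aggregate()/xtabs() are just one base-R way to do it.

```r
# Hand-made stand-in for data_out (values are illustrative, not from nz_height)
data_out <- data.frame(
  ID_from = c(1, 1, 1, 2, 2, 3),
  ID_to   = c(2, 3, 4, 1, 3, 1),
  band    = c("0-402.336", "0-402.336", "402.336-1609.34",
              "0-402.336", "402.336-1609.34", "0-402.336"),
  elev_to = c(100, 200, 300, 400, 500, 600)
)

# Long format: one row per focal point and band, with the neighbour sum
band_sums <- aggregate(elev_to ~ ID_from + band, data = data_out, FUN = sum)

# Wide format: one row per focal point, one column per band (0 if no neighbours)
wide <- xtabs(elev_to ~ ID_from + band, data = data_out)
wide
```

Replacing FUN = sum with FUN = mean would give band means instead. Focal points with no neighbours in any band do not appear in data_out at all, so they would need to be merged back in from the input data.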
---------
Lom
On Tue, Jun 2, 2020 at 8:02 PM Kent Johnson <kent3737 at gmail.com> wrote:
Roger's example works for me and gives a list of length 101. I did have some issues that were resolved by updating packages. I'm using R 3.6.3 on macOS 10.15.4. I also use rtree successfully on Windows 10 with R 3.6.3.
Kent
On Tue, Jun 2, 2020 at 12:29 PM Roger Bivand <Roger.Bivand at nhh.no> wrote:
On Tue, 2 Jun 2020, Kent Johnson wrote:
rtree uses Euclidean distance so the points should be in a coordinate system where this makes sense at least as a reasonable approximation.
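To illustrate why Euclidean distance on raw longitude/latitude is a poor approximation, here is a small base-R sketch comparing the ground length of one degree of longitude at the equator and at 45 degrees south. The helper haversine_m() and the mean Earth radius of 6371008.8 m are assumptions introduced for this example and are not part of rtree or spdep.

```r
# Great-circle distance in metres between lon/lat points given in degrees
# (hypothetical helper; spherical Earth with an assumed mean radius)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

d_eq <- haversine_m(0, 0, 1, 0)          # one degree of longitude at the equator
d_45 <- haversine_m(170, -45, 171, -45)  # one degree of longitude at 45 S
c(equator = d_eq, lat45S = d_45, ratio = d_45 / d_eq)
```

The same one-degree step covers roughly 30% less ground at 45 S than at the equator, so a fixed radius in degrees does not correspond to a fixed radius in metres. Projecting first, as the thread's st_transform(nz_height, 32759) does, avoids this.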
I tried the original example:
remotes::install_github("hunzikp/rtree")
library(spData)
library(sf)
projdata<-st_transform(nz_height, 32759)
library(rtree)
pts <- st_coordinates(projdata)
rt <- RTree(st_coordinates(projdata))
bufferR <- c(402.336, 1609.34, 3218.69, 4828.03, 6437.38)
wd1 <- withinDistance(rt, pts, bufferR[1])
but unfortunately failed (maybe newer Boost headers than yours?):
Error in UseMethod("withinDistance", rTree) :
  no applicable method for 'withinDistance' applied to an object of class "c('list', 'RTree')"
Kent
On Tue, Jun 2, 2020 at 9:59 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:
On Tue, 2 Jun 2020, Kent Johnson wrote:
Date: Tue, 2 Jun 2020 02:44:17 -0500
From: Lom Navanyo <lomnavasia at gmail.com>
To: r-sig-geo at r-project.org
Subject: [R-sig-Geo] How to efficiently generate data of neighboring points within specified radii (distances) for each point in a given points data set.
Hello, I have a data set of about 3400 location points with which I am trying to generate data of each point and its neighbors within defined radii (e.g., 0.25, 1, and 3 miles).
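The radii are stated in miles, while the projected coordinates used elsewhere in the thread are in metres, so the band limits need converting. A short sketch, assuming the international mile of 1609.344 m and the five radii (0.25, 1, 2, 3 and 4 miles) that the bufferR values in the thread's code appear to correspond to:

```r
# Convert band radii from miles to metres (assumes 1 mile = 1609.344 m)
miles   <- c(0.25, 1, 2, 3, 4)
bufferR <- miles * 1609.344
bds     <- c(0, bufferR)   # lower and upper limits of successive distance bands
round(bufferR, 3)
```

Successive pairs of bds then define non-overlapping rings (0 to 0.25 mile, 0.25 to 1 mile, and so on) rather than nested buffers.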
The rtree package is very fast and memory-efficient for within-distance calculations. https://github.com/hunzikp/rtree
Thanks! Does this also apply when the input points are in geographical coordinates?
Roger
Kent Johnson
Cambridge, MA
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
-- Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no https://orcid.org/0000-0003-2392-6140 https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en