On Mon, 8 Jun 2020, Lom Navanyo wrote:
Some farmers own more than one well and thus can extract from their
multiple wells. Others are single well owners.
The amount of water pumped by the irrigator from their wells is the unit
of observation. And I do not know how it might sound, but
I would say "irrigator-well" is the unit of analysis?
Both crops and technology have seasonal patterns, though not pronounced
probably due to switching costs.
I have two segments of the data: one group of neighboring
irrigators pays fees for water withdrawal. The second group (of
neighbors) does not pay any fee aside from their individual lift cost (which
is not observed in the data). I do not intend to run a
difference-in-difference model with respect to the fee as that's not
what I want to study. So I intend to run separate models/specifications
for the two groups.
This feels like a linear mixed effects model with an irrigator random
effect and a temporal random effect. A spatial random effect (ICAR?) might
be added, but it will be hard to split the identification of the irrigator
RE from a spatially structured RE for the wells. I think that you should
be looking at the mgcv package, the second edition of Simon Wood's book,
and either an MRF or a Gaussian Process ("gp") spatial RE for the wells.
It may very well be that a group RE (fee/no fee) would discriminate
between the groups statistically, but I'm out of my depth here. Anyway,
mgcv, using a flexible functional form on water level, and RE's for the
other components, seems possible. Structural regression using BayesX or
INLA is also possible. You have 5 years; how many irrigators and how many
wells?
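
A minimal sketch of the kind of mgcv specification described above, fitted to simulated data. Everything here is an assumption for illustration: the variable names (pumped, level, irrigator, year_f, easting, northing), the data-generating process, and the basis dimension k are hypothetical and not taken from the poster's data.

```r
library(mgcv)

set.seed(1)
n_irr <- 30; n_wells <- 60; n_years <- 5

# Hypothetical wells: each belongs to one irrigator and has a location
wells <- data.frame(
  well      = factor(seq_len(n_wells)),
  irrigator = factor(sample(seq_len(n_irr), n_wells, replace = TRUE)),
  easting   = runif(n_wells),
  northing  = runif(n_wells)
)

# Cross join with years to build a balanced well-by-year panel
panel <- merge(wells, data.frame(year = 2015:2019))
panel$year_f <- factor(panel$year)
panel$level  <- runif(nrow(panel), 10, 50)

# Simulated response: nonlinear in level, plus irrigator and spatial effects
irr_eff <- rnorm(n_irr)
panel$pumped <- 100 + 5 * sqrt(panel$level) +
  irr_eff[as.integer(panel$irrigator)] +
  sin(3 * panel$easting) + rnorm(nrow(panel))

fit <- gam(
  pumped ~ s(level) +                       # flexible form on water level
    s(irrigator, bs = "re") +               # irrigator random effect
    s(year_f, bs = "re") +                  # temporal random effect
    s(easting, northing, bs = "gp", k = 20),  # Gaussian process spatial smooth
  data = panel, method = "REML"
)
summary(fit)
```

With real data, the irrigator random effect and the spatial smooth may compete for the same variation, which is the identification difficulty mentioned above.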
Roger
Thanks,
-----------------
Lom
On Sun, Jun 7, 2020 at 5:06 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:
On Fri, 5 Jun 2020, Lom Navanyo wrote:
Thank you once again. To clarify, which is more suitable, end of year
levels or yearly average measure of water levels?
Also below are a few more notes to throw more light on my
These wells are solely for irrigation purposes and are
irrigator/farmer-owned and operated. No farmer/irrigator moves to
another well not owned by him. The only reason to suspect any spatial
externalities is because the wells share a common aquifer. And this is
essentially what I am testing.
Is irrigation by fixed pipe, or can the water be moved to the area of
another well? Can irrigators extract water from multiple wells? Is then
the irrigator the unit of observation rather than the well?
It is also understood that there is not much variation in the
and geology of the study region.
I have data on a number of well-specific features in addition to the water
level. I also have some farm data including cropping and technology use
data. No soil data though.
No recharge data either.
OK, the farming data may reflect the demand for water. Do the different
crops or technologies have different seasonal patterns, leading to
different draw-down patterns in the wells over time?
In fact, I agree a lot of factors can come into play here and I may not have
observed them all, but I was thinking I could incorporate some fixed effects
to take care of those, especially for those I suspect (or perhaps by
theory) are likely not to vary much in terms of their effect on
irrigation (pumping) decisions across farmers
or their effect on water level.
My panel is rather a short one: I have a five year panel data.
Given the above, is it still not advisable to use any spatial
analysis? Just a simple OLS will suffice?
OLS probably not, but the decisions are starting to look like farmers'
cropping decisions, leading to varied need for water. Do the farmers pay
for the water or the irrigation technology?
I'm starting to think that maybe SUR is a possibility, but am unsure how
your short panel would handle that.
Roger
Thanks.
----------------------
Lom
On Fri, Jun 5, 2020 at 3:51 AM Roger Bivand <Roger.Bivand at nhh.no>
On Fri, 5 Jun 2020, Lom Navanyo wrote:
I fully agree with you and appreciate the listed benefits of not
things private. I was just trying to be sure the forum here is
and receptive of a beginner like me.
To be more explicit with regards to my observations, y is amount of
water withdrawal from wells and an important variable in x is (height
of) water level in the wells. These are end of year figures. I am
the aggregations (sum for y and mean for water level) by band as
neighborhood variables. There will be one or two indicator variables
also in x. I hope these do not present additional hurdles.
There are several further questions. If water level is measured at
end-of-year, it is instantaneous at that point, and will depend on
year earlier plus inflow from the movements of the water table
(precipitation, soils and surface geology, maybe geology if deeper
minus evaporation (if an open well) and extraction. However, your y
(extraction) is probably measured over an interval (1 Jan - 31 Dec?).
does not depend on level unless level is 0, but depends on the
of people extracting water for domestic, agricultural or other use.
All else equal, you would expect changes in the level in a well to
on inputs, evaporation and extraction, and extraction at that well and
other nearby wells (which may experience falls in the ground water
level not because the water was extracted from those wells, but at
neighbouring wells). You may also see users shifting to nearby wells if
their closest well runs dry.
So you probably need to start with a deterministic hydrological model,
you need much more information about who extracts and why. Say in
you would also need price data - apparently free water has led to
over-extraction.
So I would advise against any spatial econometric analysis of the data
have, because so much is going on in the system as a whole that you
control if all the data you have is as you describe. I also understand
better why well water level is endogenous, but am sure that IV will
help, since the level is being driven partly by a deterministic
hydrological system which differs from well to well, and extraction
by demand.
Has anyone worked with this kind of data? Any ideas or contributions
helpful than the above?
Roger
I am thinking Proximity is relevant in testing spatial
dependency/externality.
I will consider splm package and the SLX model.
Thank you.
---------------
Lom
On Thu, Jun 4, 2020 at 2:52 PM Roger Bivand <Roger.Bivand at nhh.no>
On Thu, 4 Jun 2020, Lom Navanyo wrote:
Thank you. Yes, the OLS is biased and my plan is to use a 2SLS
have a variable I intend to use as an IV for y.
I have seen a few papers use this approach. Will this approach not
for the endogeneity?
Actually, I am not sure if this is a right forum or perhaps if it's
appropriate or acceptable to you to take this one-on-one with you
help:
I do not offer private help. That would presuppose that one person
answer. It would also presuppose that all exchanges are only read by
original poster and direct participants, while in fact others may
or follow a thread, or find the thread by searching: google supports
list:r-sig-geo search tag. If the thread goes private, that search
My model actually looks like this: y= f(y, x) + e.
Aside the endogeneity of y (which I intend to instrument by another
variable z), there is simultaneity between y and x.
I intend to use the lag of x as instrument for x. Given that I am
to test spatial dependency, do you see some fatal flaws with my
What is the support of your observations, point, or are they
Why may proximity make a difference - often, apparent spatial
autocorrelation is caused by observing inappropriate entities, or by
omitting covariates, or by using the wrong functional form.
I have also seen other empirical approaches like static and dynamic
panel data modelling. I will be reviewing them also to see
my objective.
But, any further directions or suggestions are highly appreciated.
If the data are spatial panel, you can look at the splm package.
Personally, I have never found instruments any use at all, because
instruments are typically at best weak because of shared spatial
with the response, unless the model is really well specified from
theory. In space, almost everything is close to endogenous unless
opposite is demonstrated. So causal relationships are less
because they are at best conditional on omitted variables and
autocorrelation engendered by the choice of observational entities.
Further, because spatial processes are driven by the inverse matrix
input graph of proximate neighbours (the covariance matrix of the
process), you don't need to start from more than the first order
neighbours. Maybe your x has the same spatial pattern as y, so that
residuals are white noise with no spatial structure.
Recently, analysts prefer to start with the SLX model (Halleck Vega
Elhorst 2015 and others), so that might be worth exploring. If only
direct impacts seem important, OLS may be enough.
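
To make the SLX idea concrete, here is a minimal base-R sketch on simulated points: the model is ordinary least squares with the spatial lag of x (WX) added as a regressor. The coordinates, the 1.5-unit distance threshold, and the coefficients are all assumptions for illustration; the spatialreg package provides lmSLX() for the same model fitted from a listw object.

```r
set.seed(42)
n <- 100

# Hypothetical point locations and a binary distance-band neighbour matrix
coords <- cbind(runif(n, 0, 10), runif(n, 0, 10))
d <- as.matrix(dist(coords))
W <- (d > 0 & d <= 1.5) * 1            # neighbours within 1.5 units, no self-links

# Row-standardise; rows with no neighbours stay all zero
rs <- rowSums(W)
W_std <- W / ifelse(rs > 0, rs, 1)

# Simulate x, its spatial lag, and a response with direct and local spillover effects
x  <- rnorm(n)
Wx <- as.vector(W_std %*% x)
y  <- 1 + 2 * x + 0.5 * Wx + rnorm(n, sd = 0.5)

# SLX: OLS of y on x and WX
slx <- lm(y ~ x + Wx)
coef(summary(slx))
```

The coefficient on x is the direct impact and the coefficient on Wx the local spillover; if the latter is negligible, plain OLS on x is the natural fallback.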
Hope this helps,
Roger
Thanks,
-------------------
Lom
On Thu, Jun 4, 2020 at 3:48 AM Roger Bivand <Roger.Bivand at nhh.no>
On Thu, 4 Jun 2020, Lom Navanyo wrote:
Thank you very much for your support. This gives me what I need
say listw2sn() is really great.
Why do I need the data in the format as in dataout? I am trying
spatial dependence (or neighborhood effect) by running a
model that entails pop_size_it = beta_1*sum of pop_size of point
neighbors within a specified radius. So my plan is to get the
for each focal point as per the specified bands and their
pop_size) so I can add them (attribute) by the bands.
Thanks, clarifies a good deal. Maybe look at the original localG
for exploring distance relationships (Getis and Ord looked at
Further note that OLS is biased as you have y = f(y) + e, so y on
sides. The nearest equivalent for a single band is
with listw=nb2listw(wd1, style="B") to get the neighbour sums
weights matrix. So both your betas and their standard errors are
I'm afraid. You are actually very much closer to ordinary kriging,
at the way in which distance attenuates the correlation in value
proximate observations.
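
As a rough illustration of neighbour sums by distance band, the following base-R sketch computes, for each point, the sum of a variable over the other points falling in each band. The points and the pop_size values are simulated and purely hypothetical; with spdep, binary weights built from dnearneigh() objects and passed to lag.listw() would play the same role.

```r
set.seed(7)
n <- 50

# Hypothetical projected coordinates (metres) and attribute
pts <- cbind(runif(n, 0, 5000), runif(n, 0, 5000))
pop_size <- rpois(n, 20)

# Band limits in metres (lower bound exclusive, upper bound inclusive here)
bds <- c(0, 402.336, 1609.34, 3218.69, 4828.03)
d <- as.matrix(dist(pts))

# One column per band: sum of pop_size over neighbours in that band
band_sums <- sapply(seq_len(length(bds) - 1), function(b) {
  in_band <- (d > bds[b] & d <= bds[b + 1]) * 1   # binary weights, self excluded
  as.vector(in_band %*% pop_size)
})
colnames(band_sums) <- paste(bds[-length(bds)], bds[-1], sep = "-")
head(band_sums)
```

Regressing pop_size on these columns by OLS puts y on both sides, so the bias noted above applies to such a regression.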
Hope this clarifies,
Roger
I am totally new to the area of spatial econometrics, so I am
one step at a time. Some readings suggest I may need distance
weight matrix but for now I think I should try the current
Thank you.
-------------
Lom
On Wed, Jun 3, 2020 at 8:18 AM Roger Bivand <Roger.Bivand at nhh.no
On Wed, 3 Jun 2020, Lom Navanyo wrote:
I had the errors with rtree using R 3.6.3. I have since changed
but I got the same error.
And yes, for Roger's example, I have the objects wd1, ... wd4,
length 101. I think my difficulty is my inability to output the
detailing the point IDs t50_fid.
library(spData)
library(sf)
projdata<-st_transform(nz_height, 32759)
pts <- st_coordinates(projdata)
library(spdep)
bufferR <- c(402.336, 1609.34, 3218.69, 4828.03, 6437.38)
bds <- c(0, bufferR)
wd1 <- dnearneigh(pts, bds[1], bds[2])
wd2 <- dnearneigh(pts, bds[2], bds[3])
wd3 <- dnearneigh(pts, bds[3], bds[4])
wd4 <- dnearneigh(pts, bds[4], bds[5])
sn_band1 <- listw2sn(nb2listw(wd1, style="B", zero.policy=TRUE))
sn_band1$band <- paste(attr(wd1, "distances"), collapse="-")
sn_band2 <- listw2sn(nb2listw(wd2, style="B", zero.policy=TRUE))
sn_band2$band <- paste(attr(wd2, "distances"), collapse="-")
sn_band3 <- listw2sn(nb2listw(wd3, style="B", zero.policy=TRUE))
sn_band3$band <- paste(attr(wd3, "distances"), collapse="-")
sn_band4 <- listw2sn(nb2listw(wd4, style="B", zero.policy=TRUE))
sn_band4$band <- paste(attr(wd4, "distances"), collapse="-")
data_out <- do.call("rbind", list(sn_band1, sn_band2, sn_band3, sn_band4))
class(data_out) <- "data.frame"
table(data_out$band)
data_out$ID_from <- projdata$t50_fid[data_out$from]
data_out$ID_to <- projdata$t50_fid[data_out$to]
data_out$elev_from <- projdata$elevation[data_out$from]
data_out$elev_to <- projdata$elevation[data_out$to]
str(data_out)
The "spatial.neighbour" representation was that used in the
SpatialStats module, with "from" and "to" columns, and here
no-neighbour cases gracefully. So listw2sn() comes in useful
for creating the output, and from there, just look-up in the
input data.frame. Observations here cannot be their own
It would be relevant to know why you need these, are you looking
variogram clouds?
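
If the aim is to add up a neighbour attribute by band, a long from/to table of this shape can be summarised directly. The sketch below uses a small hand-made stand-in for data_out (three points, two bands, made-up IDs and elevations), so all values are hypothetical.

```r
# Hypothetical stand-in for data_out: one row per (from, to) neighbour pair
data_out <- data.frame(
  from    = c(1, 1, 2, 2, 3, 3),
  to      = c(2, 3, 1, 3, 1, 2),
  band    = c("0-402.336", "402.336-1609.34", "0-402.336",
              "402.336-1609.34", "402.336-1609.34", "402.336-1609.34"),
  ID_from = c(101, 101, 102, 102, 103, 103),
  ID_to   = c(102, 103, 101, 103, 101, 102),
  elev_to = c(2750, 2800, 2700, 2800, 2700, 2750)
)

# Long format: one row per focal point and band that has neighbours
band_sums <- aggregate(elev_to ~ ID_from + band, data = data_out, FUN = sum)

# Wide format: one row per focal point, one column per band, 0 where no neighbours
wide <- xtabs(elev_to ~ ID_from + band, data = data_out)
wide
```

The wide table can then be merged back to the input data.frame by ID.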
Hope this clarifies,
Roger
---------
Lom
On Tue, Jun 2, 2020 at 8:02 PM Kent Johnson <
Roger's example works for me and gives a list of length 101. I
some issues that were resolved by updating packages. I'm
macOS 10.15.4. I also use rtree successfully on Windows 10
Kent
On Tue, Jun 2, 2020 at 12:29 PM Roger Bivand <
On Tue, 2 Jun 2020, Kent Johnson wrote:
rtree uses Euclidean distance so the points should be in a
system where this makes sense at least as a reasonable
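
A small base-R sketch of why this matters: treating longitude/latitude as planar coordinates gives a "distance" in degrees, which cannot be compared with radii in metres. The two points and the spherical-earth radius are assumptions for illustration.

```r
# Great-circle distance in metres on a sphere (haversine formula)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Two hypothetical points 0.1 degree of longitude apart at 45 degrees south
p1 <- c(lon = 170.0, lat = -45)
p2 <- c(lon = 170.1, lat = -45)

naive_deg    <- sqrt(sum((p1 - p2)^2))   # Euclidean "distance" in degrees
great_circle <- haversine_m(p1[["lon"]], p1[["lat"]], p2[["lon"]], p2[["lat"]])

c(naive_deg = naive_deg, great_circle_m = great_circle)
```

Here the naive value is 0.1 while the great-circle distance is several kilometres, so a radius such as 402.336 only has meaning after projecting to a metric coordinate system.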
I tried the original example:
remotes::install_github("hunzikp/rtree")
library(spData)
library(sf)
projdata<-st_transform(nz_height, 32759)
library(rtree)
pts <- st_coordinates(projdata)
rt <- RTree(st_coordinates(projdata))
bufferR <- c(402.336, 1609.34, 3218.69, 4828.03, 6437.38)
wd1 <- withinDistance(rt, pts, bufferR[1])
but unfortunately failed (maybe newer Boost headers than
Error in UseMethod("withinDistance", rTree) :
no applicable method for 'withinDistance' applied to an
class
"c('list', 'RTree')"
Kent
On Tue, Jun 2, 2020 at 9:59 AM Roger Bivand <
On Tue, 2 Jun 2020, Kent Johnson wrote:
Date: Tue, 2 Jun 2020 02:44:17 -0500
From: Lom Navanyo <lomnavasia at gmail.com>
To: r-sig-geo at r-project.org
Subject: [R-sig-Geo] How to efficiently generate data of
points within specified radii (distances) for
Hello,
I have data set of about 3400 location points with which
generate data of each point and their neighbors within
The rtree package is very fast and memory-efficient for
Thanks! Does this also apply when the input points are in
Kent Johnson
Cambridge, MA