spdep/splm: k-nearest neighbors, normalizations, listw2U, spatial models, methods

Dear Prof. Bivand and listers:

I started with spatial econometrics and R a few months ago and this is 
the first time I write here. My apologies if I miss something.

I am a bit confused about the final weight matrix I am using with 
spdep. Initially I start with an inverse distance matrix to the 
5-nearest neighbors, so asymmetric general weights. I chose this 
matrix as my baseline for several reasons: my research is about 
distance, Griffith's (1996) rules of thumb, inspection of the plots of 
links...

I build the listw style W (row-normalization) to apply errorsarlm or 
lagsarlm. But at least for method eigen, these functions use the 
function listw2U which changes my listw.
Good that you realise that you are confused! Your conclusion is wrong. If 
(and only if) the listw object can be Ord-transformed to symmetry (an 
underlying symmetric set of neighbours and weighting scheme "W" or "S"), 
the Ord transformation is applied, and possible numerical fuzz (differences 
< 1e-16 across the diagonal) is removed with listw2U(). If the 
underlying neighbours are asymmetric, the can.sim variable in the model 
fitting functions is FALSE, so the eigenvalues are extracted from the 
asymmetric matrix. You missed the if() statement in the code in 
eigen_setup() (in jacobian.R), documented in ?eigen_setup
I will try to write separate questions or sentences about my doubts, 
though they are related:

Q1 - I understand that the similar.listw function 
(http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=spdep:similar.listw) is 
the same as listw2U but I am not sure.
No, similar.listw() does the Ord transformation, listw2U does (W + W')*0.5
Q2 - I see that listw2U uses the function make.sym.nb, adding the 
neighbors that were asymmetric. If this is true, I lose the property of 
asymmetry, so I do not see the point of starting with k-nearest 
neighbors.
The assumption is that people know what they are doing, so if the user 
wants to impose symmetry, this is a possible choice. The function is not 
used as much as you seem to think.
Q3 - Similarly, I understand that these functions are applying Ord 
(1975) transformation, though the notation here confuses me: 
http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=spdep:lm.morantest The 
helper function listw2U() constructs a weights list object corresponding 
to the sparse matrix 1/2 (W + W').
Some derivations of Moran tests are based on symmetry, and most often 
cross-products of W are involved, making the point moot (see Cliff & Ord 
1973).
Q4 - The help in moran.test explains that for inherently non-symmetric 
matrices, such as k-nearest neighbour matrices, listw2U() can be used to 
make the matrix symmetric. Does this mean that I should pass my W 
matrix through listw2U before the Moran test? In row-normalized W 
style or in B style with inverse distance weights?
I'm travelling and cannot answer more now. I think that you have confused 
yourself more than necessary. There is no problem for model fitting, and as 
far as I am aware, no problem for tests. Please provide worked examples 
showing that the current code leads to results that you can show are 
incorrect (for example giving different results from OpenGeoDa, PySAL, 
Matlab Spatial Econometrics toolbox). Simplify your question to one point, 
not many as now.

Hope this clarifies,

Roger
Q5 - In Elhorst (2010) (Fischer & Getis book, page 380) I see that the Ord 
transformation keeps unchanged the mutual proportions between the 
elements of W, which is relevant for inverse distances (Anselin 1988, 
23-24). I understand that if I want to keep the interpretation of 
inverse distances I should apply this transformation to the listw style 
B, without row-normalization, with my general spatial weights of inverse 
distances. But the method eigen in, for instance, lagsarlm or ASDAR page 
284 says that Ord normalization can be used just with W matrices. I have 
tried this method to estimate spatial models with my listw in both B and 
W styles, but I am not sure what I am doing at the end in each case.

Q7 - As with standardization, making the matrix symmetric loses the geographical interpretation. Unit A can be the nearest neighbor of B but not the other way round, so their main links would not be reciprocal. I am not sure in which ways the issue of symmetry in the nb list is a different issue from normalization of weights. Both things are usually discussed at the same time, but they are conceptually different.

Q8 - If it is a problem of estimation, I can use other methods to keep my matrix asymmetric and/or with non-standardized weights. Which one would be better?

Q9 - Any idea about how the splm package deals with some of the previous issues?

Q10 - This question is more general. I am still not sure if I understand well the reasons for normalization. Maybe it is to ensure the estimation by keeping the eigenvalues in the right range. Right now I do not care about the easier interpretation of the parameters after row-normalization. My question is whether normalization helps to correct spatial autocorrelation. If so, why does it help if it, at least with row-normalization, loses the main information about the absolute distance to each neighbor? Why should the impact on each unit of all other units be equalized? Is each unit equally open, equally close to the other ones? Or is this just to avoid imposing structure on the spatial dependence, so that general spatial weights are not that useful (just relative distances)?

Any thoughts about these doubts would be much appreciated. Thank you very much.

Fernando

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no

On Mon, 20 Aug 2012, Fernando Bruna Quintas wrote:

Dear Prof. Bivand and listers:

I started with spatial econometrics and R a few months ago and this is the 
first time I write here. My apologies if I miss something.

I am a bit confused about the final weight matrix I am using with spdep. 
Initially I start with an inverse distance matrix to the 5-nearest 
neighbors, so asymmetric general weights. I chose this matrix as my 
baseline for several reasons: my research is about distance, Griffith's 
(1996) rules of thumb, inspection of the plots of links...

I build the listw style W (row-normalization) to apply errorsarlm or 
lagsarlm. But at least for method eigen, these functions use the function 
listw2U which changes my listw.
I can confirm that row-standardised k=5 asymmetric weights give the same 
regression coefficients in lagsarlm() and Stata's spreg ml. Nothing 
untoward is going on, and no changes are being made to the weights.
Good that you realise that you are confused! Your conclusion is wrong. If 
(and only if) the listw object can be Ord-transformed to symmetry (an 
underlying symmetric set of neighbours and weighting scheme "W" or "S"), the 
Ord transformation is applied, and possible numerical fuzz (differences 
< 1e-16 across the diagonal) is removed with listw2U(). If the underlying 
neighbours are asymmetric, the can.sim variable in the model fitting 
functions is FALSE, so the eigenvalues are extracted from the asymmetric 
matrix. You missed the if() statement in the code in eigen_setup() (in 
jacobian.R), documented in ?eigen_setup
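
As a rough illustration of the asymmetric case (base R only, not the spdep internals; the 3-unit matrix and the value of rho are made up for the sketch): a directed cycle has complex eigenvalues, yet the real part of sum(log(1 - rho * eigenvalues)) still matches the log-determinant of (I - rho W) computed directly from the matrix.

```r
# Hypothetical 3-unit asymmetric weights: 1 -> 2 -> 3 -> 1 (one neighbour each)
W <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), nrow = 3, byrow = TRUE)
rho <- 0.4                       # arbitrary illustrative value

# Eigenvalues of the asymmetric matrix (here the cube roots of unity)
ev <- eigen(W, only.values = TRUE)$values
is_complex <- is.complex(ev)

# Jacobian term log|I - rho W| from the eigenvalues, keeping the real part
logdet_eigen <- Re(sum(log(1 - rho * ev)))

# Same quantity straight from the matrix, without eigenvalues
logdet_direct <- as.numeric(
  determinant(diag(3) - rho * W, logarithm = TRUE)$modulus
)

print(is_complex)
print(c(eigen = logdet_eigen, direct = logdet_direct))
```

For this cycle the determinant is 1 - rho^3, so both numbers equal log(1 - rho^3).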

I will try to write separate questions or sentences about my doubts, though 
they are related:

Q1 - I understand that the similar.listw function 
(http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=spdep:similar.listw) is the 
same as listw2U but I am not sure.
No, similar.listw() does the Ord transformation, listw2U does (W + W')*0.5
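
To make the difference concrete, a small base R sketch (the 4-unit matrix is invented, and the two operations are imitated with plain matrices rather than by calling similar.listw() or listw2U()): the Ord-type form keeps the eigenvalues of the row-standardised matrix, while averaging with the transpose gives a different matrix with different eigenvalues.

```r
# Invented symmetric binary neighbours: a chain 1 - 2 - 3 - 4
B <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
d <- rowSums(B)

W <- B / d                                   # row-standardised, asymmetric

# Ord-type similarity transformation: symmetric, similar to W
W_sim <- diag(1 / sqrt(d)) %*% B %*% diag(1 / sqrt(d))

# Average with the transpose, i.e. (W + W') * 0.5: symmetric, not similar to W
W_U <- 0.5 * (W + t(W))

ev_W   <- sort(Re(eigen(W, only.values = TRUE)$values))
ev_sim <- sort(eigen(W_sim, symmetric = TRUE, only.values = TRUE)$values)
ev_U   <- sort(eigen(W_U, symmetric = TRUE, only.values = TRUE)$values)

print(rbind(W = ev_W, ord = ev_sim, averaged = ev_U))
```

In this toy case the first two rows share the eigenvalues -1, -0.5, 0.5 and 1, and the averaged matrix does not.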

Q2 - I see that listw2U uses the function make.sym.nb, adding the neighbors 
that were asymmetric. If this is true, I lose the property of asymmetry, so 
I do not see the point of starting with k-nearest neighbors.
The assumption is that people know what they are doing, so if the user wants 
to impose symmetry, this is a possible choice. The function is not used as 
much as you seem to think.
Running listw2U() and 0.5*(W + t(W)) are fully equivalent for asymmetric 
neighbours:

library(spdep)
set.seed(1) # or your choice of seed
res <- logical(500)
for (i in seq_along(res)) {
   nb5 <- knn2nb(knearneigh(cbind(runif(100), runif(100)), k=5))
   lw5 <- nb2listw(nb5, style="W")
   m5 <- listw2mat(lw5)
   MU5 <- 0.5*(m5 + t(m5))                 # symmetrise by hand
   lwU5 <- listw2U(lw5)                    # symmetrise with listw2U()
   MUU5 <- listw2mat(lwU5)
   res[i] <- isTRUE(all.equal(MU5, MUU5))  # TRUE only if the two agree
}
table(res)

make.sym.nb() is used to find the "missing" cross-diagonal entries.

Q3 - Similarly, I understand that these functions are applying Ord (1975) 
transformation, though the notation here confuses me: 
http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=spdep:lm.morantest The helper 
function listw2U() constructs a weights list object corresponding to the 
sparse matrix 1/2 (W + W').
Some derivations of Moran tests are based on symmetry, and most often 
cross-products of W are involved, making the point moot (see Cliff & Ord 
1973).

The output of lm.morantest(), and its equivalents in PySAL and OpenGeoDa 
are identical for k=5 asymmetric weights.
Q4 - The help in moran.test explains that for inherently non-symmetric 
matrices, such as k-nearest neighbour matrices, listw2U() can be used to 
make the matrix symmetric. Does this mean that I should pass my W matrix 
through listw2U before the Moran test? In row-normalized W style or in 
B style with inverse distance weights?
I'm travelling and cannot answer more now. I think that you have confused 
yourself more than necessary. There is no problem for model fitting, and as 
far as I am aware, no problem for tests. Please provide worked examples 
showing that the current code leads to results that you can show are 
incorrect (for example giving different results from OpenGeoDa, PySAL, Matlab 
Spatial Econometrics toolbox). Simplify your question to one point, not many 
as now.

Hope this clarifies,

Roger

Q5 - In Elhorst (2010) (Fischer & Getis book, page 380) I see that the Ord 
transformation keeps unchanged the mutual proportions between the elements 
of W, which is relevant for inverse distances (Anselin 1988, 23-24). I 
understand that if I want to keep the interpretation of inverse distances I 
should apply this transformation to the listw style B, without 
row-normalization, with my general spatial weights of inverse distances. 
But the method eigen in, for instance, lagsarlm or ASDAR page 284 says that 
Ord normalization can be used just with W matrices. I have tried this 
method to estimate spatial models with my listw in both B and W styles, but 
I am not sure what I am doing at the end in each case.
Ord normalization can be used on W and S style, for underlying symmetric 
weights. If the underlying weights are not symmetric, it cannot be used, 
and the eigenvalues will be complex. For comparison use the LU method, 
which can also handle intrinsically asymmetric weights.
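
A hedged sketch of that comparison on simulated data (the data, variable names, seed and the value 0.5 for the lag coefficient are invented; in current releases lagsarlm() is in the spatialreg package, whereas in 2012 it was in spdep):

```r
library(spdep)       # neighbours and weights
library(spatialreg)  # lagsarlm() in current releases

set.seed(1)
n <- 100
coords <- cbind(runif(n), runif(n))

# Row-standardised 5-nearest-neighbour weights, intrinsically asymmetric
nb5 <- knn2nb(knearneigh(coords, k = 5))
lw5 <- nb2listw(nb5, style = "W")
sym <- is.symmetric.nb(nb5, verbose = FALSE)

# Simulate a spatial lag process y = (I - 0.5 W)^{-1} (1 + 2 x + e)
W <- listw2mat(lw5)
dat <- data.frame(x = rnorm(n))
dat$y <- as.vector(solve(diag(n) - 0.5 * W, 1 + 2 * dat$x + rnorm(n)))

# Same model, two ways of computing the Jacobian term
fit_eigen <- lagsarlm(y ~ x, data = dat, listw = lw5, method = "eigen")
fit_lu    <- lagsarlm(y ~ x, data = dat, listw = lw5, method = "LU")

print(sym)
print(rbind(eigen = coef(fit_eigen), LU = coef(fit_lu)))
```

The two rows of coefficients should agree up to optimiser tolerance if, as stated above, nothing is being changed in the weights.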

Roger
Q7 - As with standardization, making the matrix symmetric loses the 
geographical interpretation. Unit A can be the nearest neighbor of B but 
not the other way round, so their main links would not be reciprocal. I am 
not sure in which ways the issue of symmetry in the nb list is a different 
issue from normalization of weights. Both things are usually discussed at 
the same time, but they are conceptually different.

Q8 - If it is a problem of estimation, I can use other methods to keep my 
matrix asymmetric and/or with non-standardized weights. Which one would be 
better?

Q9 - Any idea about how the splm package deals with some of the previous 
issues?

Q10 - This question is more general. I am still not sure if I understand 
well the reasons for normalization. Maybe it is to ensure the estimation by 
keeping the eigenvalues in the right range. Right now I do not care about 
the easier interpretation of the parameters after row-normalization. My 
question is whether normalization helps to correct spatial autocorrelation. 
If so, why does it help if it, at least with row-normalization, loses the 
main information about the absolute distance to each neighbor? Why should 
the impact on each unit of all other units be equalized? Is each unit 
equally open, equally close to the other ones? Or is this just to avoid 
imposing structure on the spatial dependence, so that general spatial 
weights are not that useful (just relative distances)?

Any thoughts about these doubts would be much appreciated. Thank you very 
much.

Fernando

Thank you very much for the detailed answer. My apologies for asking so many things. My problem was not one of comparing results from different software but of understanding what I was doing. I was going crazy with too many questions. Now I see some light. Let me summarize, just to check whether I understand correctly (apologies for the length; there were many issues involved in the same problem).

Now I see that I have to test for autocorrelation with a symmetric matrix, so maybe it is more reasonable to use that same symmetric matrix for everything and later compare spatial models with the LU method and my original asymmetric matrix. listw2U does not do the job of symmetrizing in my case because with general weights the true inverse distance of asymmetric links is divided by two. So I have to use make.sym.nb or start with a symmetric definition of neighbourhood.
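
A toy check of the halving (base R only; three made-up points on a line with k = 1, so this imitates 0.5 * (W + t(W)) and a make.sym.nb()-style fix by hand rather than calling spdep):

```r
# Invented coordinates: unit 3 has unit 2 as nearest neighbour, but not vice versa
x <- c(0, 1, 3)
D <- as.matrix(dist(x))
n <- length(x)
k <- 1

# General weights: inverse distance to the k nearest neighbours (style "B")
G <- matrix(0, n, n)
for (i in seq_len(n)) {
  nn <- order(D[i, ])[2:(k + 1)]   # skip the unit itself (distance 0)
  G[i, nn] <- 1 / D[i, nn]
}

# Averaging with the transpose: the one-way link 3 -> 2 gets half its weight
U <- 0.5 * (G + t(G))

# Alternative: make the neighbour set symmetric first, then assign 1 / distance
S  <- (G > 0) | (t(G) > 0)
Gs <- ifelse(S, 1 / D, 0)

print(c(original = G[3, 2], averaged = U[3, 2], symmetric_nb = Gs[3, 2]))
```

Here the inverse distance between units 2 and 3 is 0.5; averaging turns it into 0.25 in both directions, while symmetrising the neighbour set first keeps 0.5.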

On the other hand, I was confused (fatigue and haste!) about when the word "symmetry" is used for nb objects and when for weights. My second problem was keeping the economic interpretation of the inverse distances, style B. With a symmetric nb object, the inverse distance weights are symmetric too, and they become asymmetric with row-standardization, which loses the absolute value of the distances. Trying to understand this, I paid attention to this sentence by Elhorst (2010) in the Fischer and Getis Handbook, page 380:

"If W0 denotes the spatial weights matrix BEFORE NORMALIZATION, one may (...) normalize W0 by W = D^(-1/2) W0 D^(-1/2), where D is a diagonal matrix containing the row sums of W0. (This) operation has been proposed by Ord (1975) and has the effect that the characteristic roots of W are identical to the characteristic roots of a row-normalized W0. Importantly, THE MUTUAL PROPORTIONS BETWEEN THE ELEMENTS OF W remain unchanged as a result of these two alternative normalizations. This is an important property when W represents an inverse distance matrix, since scaling the rows or columns of an inverse distance matrix so that the weights sum to one would cause this matrix to lose its economic interpretation for this decay (Anselin 1988, pp. 23-24)."

But Ord (1975, 125) speaks of a symmetric unstandardized W0 and standardizes it as DW0. So really it is: W = D^(1/2) (DW0) D^(1/2). That is what similar.listw does and what you explain in ASDAR page 284.

So now I realize that my trouble was with Elhorst's sentence, not with the spdep explanations. I do not see that the Ord transformation would keep the mutual proportions between the elements of a B style inverse distance weighting matrix ("before normalization"). I did a few checks on the mutual proportions (I can send them to you if you want) and they change. I misunderstood what Elhorst seems to say. The Ord transformation seems to work for W-style weights of a symmetric matrix, so the mutual proportions of elements in the B style (absolute inverse distances) get lost.
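
A check of this kind can be sketched in base R (the three points are invented and the symmetric form is written out with plain matrices, so this is only an illustration, not the checks mentioned above):

```r
# Invented symmetric inverse-distance weights for units at 0, 1 and 3 on a line
x <- c(0, 1, 3)
W0 <- 1 / as.matrix(dist(x))
diag(W0) <- 0                      # no self-neighbours
d <- rowSums(W0)

W_row <- W0 / d                                          # row-standardised
W_ord <- diag(1 / sqrt(d)) %*% W0 %*% diag(1 / sqrt(d))  # symmetric similar form

# Ratio of unit 1's weight on unit 2 to its weight on unit 3
ratio_B   <- W0[1, 2] / W0[1, 3]
ratio_row <- W_row[1, 2] / W_row[1, 3]
ratio_ord <- W_ord[1, 2] / W_ord[1, 3]

# The two normalised matrices share their eigenvalues
ev_row <- sort(Re(eigen(W_row, only.values = TRUE)$values))
ev_ord <- sort(eigen(W_ord, symmetric = TRUE, only.values = TRUE)$values)

print(c(B = ratio_B, row = ratio_row, ord = ratio_ord))
```

In this toy case the within-row ratio is 3 before and after row-standardization, but sqrt(5) in the symmetric form, because element (i, j) is divided by sqrt(d_i * d_j) rather than by d_i alone.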

Now I see that the spatial models use the Ord transformation if and only if the listw object can be Ord-transformed to symmetry. So my original listw (row-standardized inverse distance 5-nearest neighbours) was left untouched. The Ord transformation will be applied if I now use the row-standardized symmetrized (nb) version or a similar matrix with symmetric neighbours.

But if I want to keep the economic interpretation of the weights I should use the B version. I made a few checks (I can send you whatever you want). With the default method I get weird spatial parameters (huge or negative, which makes no sense in my data). I understand that this is probably because of the eigenvalue calculations. It is solved with the LU method. But I am not able to correct for the autocorrelation, contrary to the case when I use the row-standardized forms. And I get a much higher AIC.

So, even if all of the above is not completely exact, now I understand better what I am doing. My last doubt remains. It is a more philosophical issue about using row-standardization to correct for spatial autocorrelation. I have a problem with the "economic interpretation" of row-standardization, as Anselin and Elhorst comment. But I understand that it is too general to ask here.

Thank you again, you helped me a lot.

Fernando Bruna

----- Original message -----
From: "Roger Bivand" <Roger.Bivand at nhh.no>
To: "Fernando Bruna Quintas" <f.bruna at udc.es>
CC: r-sig-geo at r-project.org
Sent: Tuesday, 21 August 2012 23:30:59
Subject: Re: [R-sig-Geo] spdep/splm: k-nearest neighbors, normalizations, listw2U, spatial models, methods
