An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20101219/547bf34e/attachment.pl>
contradicting measures based on Log likelihood and AIC in spatial models
4 messages · elaine kuo, Roger Bivand
On Sun, 19 Dec 2010, elaine kuo wrote:
Dear List, I am running generalized linear models considering spatial autocorrelation. (Moran? s I = 0.52)
Was this using lm.morantest() - it should have been?
(sample size 4873, explanatory variable number: 6) After trying SAR and CAR in package spdep, the results are as followed.
Neither of the fits are credible, as the line search has terminated in both cases at its upper limit. These is something seriously wrong with your analysis. Which method= argument were you using, presumably in spautolm? You did not quote the exact way in which you called the functions used - this will be where your problems lie. Are you using a weights matrix with very small values (and very small row sums), and a sparse matrix method? Do you get different outcomes if you set the search interval yourself - or accept the default and row-standardise the weights (style="W" in nb2listw)? Look at summary(sapply(listw$weights, sum)) where listw is your listw object. I'll add a warning to spautolm() to test whether the result is equal to a line search bound. In reply to your question, had your analysis not been flawed, you could not judge between CAR and SAR in the way you suggest as the models are not strictly nested. Please see Ripley (1981) Spatial Statistics, Wiley, p. 90, for a discussion of the relationship between SAR and CAR representations. Briefly, if we term the CAR weights matrix C and the SAR weights matrix S, then C = S + S' - S'S, and if LL' is the Cholesky decomposition of (I - C), S = I - L'. If you do the full analysis of this relationship, you may be able to proceed, but you also need to consider the interpretation of any results. Hope this helps, Roger
I would like to learn which model was better fit. However, the measures based on log likelihood and AIC imply different contradictions. 1. log likehood SAR is better than CAR (4919,629 > 3694.246) 2. AIC CAR is better than SAR (-7370.5 > -9821.3) Please kindly instruct which criterion I should follow, and advice on any other measure will be highly appreciated. Elaine ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SAR Lambda: 0.999 LR test value: 13618 p-value: < 2.22e-16 Log likelihood: 4919.629 ML residual variance (sigma squared): 0.0075669, (sigma: 0.086988) Number of observations: 4873 Number of parameters estimated: 9 AIC: -9821.3 Nagelkerke pseudo-R-squared: 1 CAR Lambda: 0.999 LR test value: 11167 p-value: < 2.22e-16 Log likelihood: 3694.246 ML residual variance (sigma squared): 0.012682, (sigma: 0.11261) Number of observations: 4873 Number of parameters estimated: 9 AIC: -7370.5 Nagelkerke pseudo-R-squared: 1
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20101220/b910815b/attachment.pl>
On Mon, 20 Dec 2010, elaine kuo wrote:
Dear Dr. Bivand, Thank you for the clear explanation. Your guess is right, and the missing code is pasted following each question. I will check the book you recommended and please advise for improvement of the code below.
Elaine, There are a number of issues with your code; you have been ignoring a warning after running the CAR model too - the function should have reported: Non-symmetric spatial weights in CAR model which are not permitted, but may be valid if case weights are used too. You are also using na.omit, which suggests that your data is not complete - this may introduce additional problems. I'm concerned that your distance based weights may also be incorrectly specified, as you are using distances on the plane, but I suspect that you should be using Great Circle distances. Since this is a use case in which the spatial coefficients are hitting their upper bounds, I would be interested in seeing your data and script, if you can make then available, just in case there is something that ought to be fixed in the function. Could you please either attach Mig_ratio_20101214.csv to an offlist email reply, or if it is large, put it on a website and send the URL? Best wishes, Roger
Thanks Elaine
(Moran? s I = 0.52)
Was this using lm.morantest() - it should have been?
=> Yes
# code
mig.moran<-lm.morantest(mig.std,nb8.w)
print(mig.moran)
(sample size 4873, explanatory variable number: 6)
After trying SAR and CAR in package spdep, the results are as followed.
Neither of the fits are credible, as the line search has terminated in both cases at its upper limit. These is something seriously wrong with your analysis. Which method= argument were you using, presumably in spautolm? You did not quote the exact way in which you called the functions used - this will be where your problems lie.
=> Yes, spautolm was used.
summary(sapply(listw$weights, sum)
=> It shows
Min. 1st Qu. Median Mean 3rd Qu. Max.
1 1 1 1 1 1
# code
datam <-read.csv("c:/migration/Mig_ratio_20101214.csv",header=T,
row.names=1)
dim(datam)
library(ncf)
library(spdep)
# get the upper bound
up <- knearneigh(cbind(datam$lon,datam$lat))
upknn <- knn2nb(up)
updist1 <- nbdists(upknn,cbind(datam$lon,datam$lat))
updist1
updistvec <- unlist(updist1)
updistvec
upmaxd <- max(updistvec)
upmaxd #the upper bound (8.1124)
#******************************************************************************************
# spatial listw object
# Define coordinates, neighbours, and spatial weights
coords<-cbind(datam$lon,datam$lat)
coords<-as.matrix(coords)
# Define neighbourhood (here distance 8)
nb8<-dnearneigh(coords,0,8.12)
summary(nb8)
#length(nb8)
#sum(card(nb8))
# Spatial weights, illustrated with coding style "W" (row standardized)
nb8.w<-nb2listw(nb8, glist=NULL, style="W")
# regression
# std model
datam.sd<-scale(datam)
datam.std<-as.data.frame(datam.sd)
summary (datam.std)
mean(datam.std)
# obtain standard deviation
sd(datam.std)
# model
mig.std <-lm(SMB~t_r+e_r+p_e+a_r+
coast+Iso_1214,data=datam.std)
summary(mig.std)
# SAR of spautolm
mignb8.sar <- spautolm(SMB~t_r+e_r+p_e+a_r+
coast+Iso_1214,data=datam.std,
listw=nb8.w, na.action=na.omit, zero.policy=TRUE,
family="SAR", method="Matrix", verbose=TRUE)
summary(mignb8.sar,Nagelkerke = TRUE)
# CAR of spautolm
mignb8.car <- spautolm(SMB~t_r+e_r+p_e+a_r+
coast+Iso_1214,data=datam.std,
listw=nb8.w, na.action=na.omit, , zero.policy=TRUE,
family="CAR", method="Matrix", verbose=TRUE)
summary(mignb8.car,Nagelkerke = TRUE)
Are you using a weights matrix with very small values (and very small row sums), and a sparse matrix method? Do you get different outcomes if you set the search interval yourself - or accept the default and row-standardise the weights (style="W" in nb2listw)? Look at summary(sapply(listw$weights, sum)) where listw is your listw object. I'll add a warning to spautolm() to test whether the result is equal to a line search bound.
In reply to your question, had your analysis not been flawed, you could not judge between CAR and SAR in the way you suggest as the models are not strictly nested. Please see Ripley (1981) Spatial Statistics, Wiley, p. 90, for a discussion of the relationship between SAR and CAR representations. Briefly, if we term the CAR weights matrix C and the SAR weights matrix S, then C = S + S' - S'S, and if LL' is the Cholesky decomposition of (I - C), S = I - L'. If you do the full analysis of this relationship, you may be able to proceed, but you also need to consider the interpretation of any results. Hope this helps, Roger I would like to learn which model was better fit.
However, the measures based on log likelihood and AIC imply different contradictions. 1. log likehood SAR is better than CAR (4919,629 > 3694.246) 2. AIC CAR is better than SAR (-7370.5 > -9821.3) Please kindly instruct which criterion I should follow, and advice on any other measure will be highly appreciated. Elaine ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SAR Lambda: 0.999 LR test value: 13618 p-value: < 2.22e-16 Log likelihood: 4919.629 ML residual variance (sigma squared): 0.0075669, (sigma: 0.086988) Number of observations: 4873 Number of parameters estimated: 9 AIC: -9821.3 Nagelkerke pseudo-R-squared: 1 CAR Lambda: 0.999 LR test value: 11167 p-value: < 2.22e-16 Log likelihood: 3694.246 ML residual variance (sigma squared): 0.012682, (sigma: 0.11261) Number of observations: 4873 Number of parameters estimated: 9 AIC: -7370.5 Nagelkerke pseudo-R-squared: 1
-- Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no
[[alternative HTML version deleted]]
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no