contradicting measures based on Log likelihood and AIC in spatial models

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20101219/547bf34e/attachment.pl>

Dear List,

I am running generalized linear models considering spatial autocorrelation.

(Moran? s I = 0.52)
Was this using lm.morantest() - it should have been?
(sample size 4873, explanatory variable number: 6)

After trying SAR and CAR in package spdep, the results are as followed.

Neither of the fits are credible, as the line search has terminated in 
both cases at its upper limit. These is something seriously wrong with 
your analysis. Which method= argument were you using, presumably in 
spautolm? You did not quote the exact way in which you called the 
functions used - this will be where your problems lie.

Are you using a weights matrix with very small values (and very small row 
sums), and a sparse matrix method? Do you get different outcomes if you 
set the search interval yourself - or accept the default and 
row-standardise the weights (style="W" in nb2listw)? Look at 
summary(sapply(listw$weights, sum)) where listw is your listw object. I'll 
add a warning to spautolm() to test whether the result is equal to a line 
search bound.

In reply to your question, had your analysis not been flawed, you could 
not judge between CAR and SAR in the way you suggest as the models are not 
strictly nested. Please see Ripley (1981) Spatial Statistics, Wiley, p. 
90, for a discussion of the relationship between SAR and CAR 
representations. Briefly, if we term the CAR weights matrix C and the SAR 
weights matrix S, then C = S + S' - S'S, and if LL' is the Cholesky 
decomposition of (I - C), S = I - L'. If you do the full analysis of this 
relationship, you may be able to proceed, but you also need to consider 
the interpretation of any results.

Hope this helps,

Roger
I would like to learn which model was better fit.

However, the measures based on log likelihood and AIC imply different
contradictions.

1. log likehood

 SAR is better than CAR (4919,629 > 3694.246)

2. AIC

 CAR is better than SAR (-7370.5 > -9821.3)

Please kindly instruct which criterion I should follow, and advice on any
other measure will be highly appreciated.

Elaine

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SAR
Lambda: 0.999 LR test value: 13618 p-value: < 2.22e-16
Log likelihood: 4919.629
ML residual variance (sigma squared): 0.0075669, (sigma: 0.086988)

Number of observations: 4873

Number of parameters estimated: 9

AIC: -9821.3

Nagelkerke pseudo-R-squared: 1

CAR

Lambda: 0.999 LR test value: 11167 p-value: < 2.22e-16

Log likelihood: 3694.246

ML residual variance (sigma squared): 0.012682, (sigma: 0.11261)

Number of observations: 4873

Number of parameters estimated: 9

AIC: -7370.5

Nagelkerke pseudo-R-squared: 1

Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20101220/b910815b/attachment.pl>

Dear Dr. Bivand,

Thank you for the clear explanation.
Your guess is right, and the missing code is pasted following each question.
I will check the book you recommended and please advise for improvement of
the code below.
Elaine,

There are a number of issues with your code; you have been ignoring a 
warning after running the CAR model too - the function should have 
reported:

Non-symmetric spatial weights in CAR model

which are not permitted, but may be valid if case weights are used too. 
You are also using na.omit, which suggests that your data is not complete 
- this may introduce additional problems.

I'm concerned that your distance based weights may also be incorrectly 
specified, as you are using distances on the plane, but I suspect that you 
should be using Great Circle distances.

Since this is a use case in which the spatial coefficients are hitting 
their upper bounds, I would be interested in seeing your data and script, 
if you can make then available, just in case there is something that ought 
to be fixed in the function. Could you please either attach 
Mig_ratio_20101214.csv to an offlist email reply, or if it is large, put 
it on a website and send the URL?

Best wishes,

Roger
Thanks

Elaine

(Moran? s I = 0.52)

Was this using lm.morantest() - it should have been?
  =>  Yes
  # code
     mig.moran<-lm.morantest(mig.std,nb8.w)
      print(mig.moran)

(sample size 4873, explanatory variable number: 6)

After trying SAR and CAR in package spdep, the results are as followed.

Neither of the fits are credible, as the line search has terminated in both
cases at its upper limit. These is something seriously wrong with your
analysis. Which method= argument were you using, presumably in spautolm? You
did not quote the exact way in which you called the functions used - this
will be where your problems lie.

  => Yes, spautolm was used.

   summary(sapply(listw$weights, sum)
  => It shows
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     1       1       1       1       1       1

# code

   datam <-read.csv("c:/migration/Mig_ratio_20101214.csv",header=T,
row.names=1)
   dim(datam)

    library(ncf)
library(spdep)

# get the upper bound
 up <- knearneigh(cbind(datam$lon,datam$lat))
 upknn <- knn2nb(up)
 updist1 <- nbdists(upknn,cbind(datam$lon,datam$lat))
 updist1
 updistvec <- unlist(updist1)
 updistvec
 upmaxd <- max(updistvec)
 upmaxd  #the upper bound  (8.1124)

#******************************************************************************************
# spatial listw object

   # Define coordinates, neighbours, and spatial weights
     coords<-cbind(datam$lon,datam$lat)
     coords<-as.matrix(coords)

   # Define neighbourhood (here distance 8)
     nb8<-dnearneigh(coords,0,8.12)
     summary(nb8)

     #length(nb8)
     #sum(card(nb8))

   # Spatial weights, illustrated with coding style "W" (row standardized)
     nb8.w<-nb2listw(nb8, glist=NULL, style="W")

   # regression

   # std model
      datam.sd<-scale(datam)
      datam.std<-as.data.frame(datam.sd)
      summary (datam.std)
      mean(datam.std)

   # obtain standard deviation
     sd(datam.std)

   # model
   mig.std <-lm(SMB~t_r+e_r+p_e+a_r+
   coast+Iso_1214,data=datam.std)

   summary(mig.std)

   # SAR of spautolm
    mignb8.sar <- spautolm(SMB~t_r+e_r+p_e+a_r+
   coast+Iso_1214,data=datam.std,
    listw=nb8.w, na.action=na.omit, zero.policy=TRUE,
    family="SAR", method="Matrix", verbose=TRUE)
    summary(mignb8.sar,Nagelkerke = TRUE)

  # CAR of spautolm
    mignb8.car <- spautolm(SMB~t_r+e_r+p_e+a_r+
   coast+Iso_1214,data=datam.std,
    listw=nb8.w, na.action=na.omit, , zero.policy=TRUE,
    family="CAR", method="Matrix", verbose=TRUE)
    summary(mignb8.car,Nagelkerke = TRUE)

Are you using a weights matrix with very small values (and very small row
sums), and a sparse matrix method? Do you get different outcomes if you set
the search interval yourself - or accept the default and row-standardise the
weights (style="W" in nb2listw)? Look at summary(sapply(listw$weights, sum))
where listw is your listw object. I'll add a warning to spautolm() to test
whether the result is equal to a line search bound.

In reply to your question, had your analysis not been flawed, you could not
judge between CAR and SAR in the way you suggest as the models are not
strictly nested. Please see Ripley (1981) Spatial Statistics, Wiley, p. 90,
for a discussion of the relationship between SAR and CAR representations.
Briefly, if we term the CAR weights matrix C and the SAR weights matrix S,
then C = S + S' - S'S, and if LL' is the Cholesky decomposition of (I - C),
S = I - L'. If you do the full analysis of this relationship, you may be
able to proceed, but you also need to consider the interpretation of any
results.

Hope this helps,

Roger

 I would like to learn which model was better fit.
However, the measures based on log likelihood and AIC imply different
contradictions.

1. log likehood

 SAR is better than CAR (4919,629 > 3694.246)

2. AIC

 CAR is better than SAR (-7370.5 > -9821.3)

Please kindly instruct which criterion I should follow, and advice on any
other measure will be highly appreciated.

Elaine

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SAR
Lambda: 0.999 LR test value: 13618 p-value: < 2.22e-16
Log likelihood: 4919.629
ML residual variance (sigma squared): 0.0075669, (sigma: 0.086988)

Number of observations: 4873

Number of parameters estimated: 9

AIC: -9821.3

Nagelkerke pseudo-R-squared: 1

CAR

Lambda: 0.999 LR test value: 11167 p-value: < 2.22e-16

Log likelihood: 3694.246

ML residual variance (sigma squared): 0.012682, (sigma: 0.11261)

Number of observations: 4873

Number of parameters estimated: 9

AIC: -7370.5

Nagelkerke pseudo-R-squared: 1

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no

	[[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no