ME function for negative binomial

Sorry for not providing all the information appropriately and thank you so
much for your comments.
I would like to share my dataset, but, because of the confidentiality issue,
I could not share it.
So then you will have to provide code demonstrating the problem with a 
data set that is available, say one shipped with one of the packages, 
perhaps in ade4 or similar. Have you also looked at advice given in:

http://www.bio.umontreal.ca/numecolR/

especially chapter 7?

Roger
All my explanatory variables are continuous variables and my dependent
variable almost shows exactly the same histogram distribution as this link:
(http://www.ats.ucla.edu/stat/sas/dae/nbreg.htm)
When I run the base negative binomial model without using "ME function," as
you see, I have to expand the # of convergence (maxit=100 option in glm.nb)
to be converged.
Without this, I could not converge the model appropriately.
Is it possible that, if the base model requires more # of convergence to
find the initial theta, this feature is also reflected to the ME function?

Regarding your comment, I have a couple of clarifying questions.
"The tests" in the below comment refer both "moran.test" and "lm.morantest"
(not ME function)?
So, do you mean, although ME function can be used for normal, binomial,
poisson and negative binomial, the above two tests may not be appropriate
for binomial, poisson and negative binomial distribution?

In addition, if I received these "not converged" warnings, could you teach
me some strategies that I may use to make the model be converged?
Again, thank you so much for your answer.
HJ

-----Original Message-----
From: Roger Bivand [mailto:Roger.Bivand at nhh.no]
Sent: Monday, February 13, 2012 2:31 AM
To: HJ Kil
Cc: r-sig-geo at r-project.org
Subject: Re: [R-sig-Geo] ME function for negative binomial

On Sun, 12 Feb 2012, HJ Kil wrote:

Dear all,

First of all, thank you so much for reading this long mail (Please
read this if you are familiar with the "ME" function).

Please rework this question with data that everyone can see. There is no
chance at all of seeing what is wrong without your casting this in terms of
a standard data set, or making your own data available. The tests you are
using have not been established for non-normal variables, so apparent
results on GLM residuals may not be correctly sized.

Warnings and lack of convergence are always potentially serious and must
always be checked out (to see what the cause is).

Roger

I need your help and comments to conduct an analysis with "ME"
function of the package spdep.

I am trying to run a regression (a count dependent variable and 7-8
explanatory variables) with about 2000 observations (adjacent census
tracts in an urban area).

Basically, I am trying to look at how these areas' demographics
aspects (7-8 explanatory variables) are related to a certain social
aspect (my dependent count variable).

To do this, I used the below script (Italic: parts of results):

### Load all libraries: I skipped all the libraries that I loaded

### Read shape data file made/joined by ArcGis and a queen matrix
(gal) made by Geoda

myshp <- readShapeSpatial("a.shp", ID="ID")

mygal <- read.gal("a.GAL")

#Test symmetry of the matrix: Result was "TRUE"

print(is.symmetric.nb(mygal))

[1] TRUE

#Change nb to listw

mylistw <- nb2listw(mygal)

#Run a poisson model as a base: including offset since I would like to
know "density"

base.poisson <- glm(dep ~ a + b + c + d + e + f + g,
offset(log(TOT_POP)), data=myshp, family="poisson")

#Run dispersion test to check whether a negative binomial model is better:
RESULT showed over-dispersed.

dispersiontest(base.poisson)

       Overdispersion test

data:  base.poisson

z = 8.711, p-value < 2.2e-16

alternative hypothesis: true dispersion is greater than 1

sample estimates:

dispersion

 3.123331

#Run a negative binomial regression as a base by using "glm.nb"

base.nb <- glm.nb(dep ~ a + b + c + d + e + f + g +
offset(log(TOTAL_POP)), data=myshp, maxit=100)

summary(base.nb)

#Based on the result of the above, I found that "initial theta" was
"1.174928547"

#I used the other way of running a negative binomial with the above value:

base.nb2 <- glm(dep ~  a + b + c + d + e + f + g +
offset(log(TOTAL_POP)), data=myshp,
family=negative.binomial(1.174928547), maxit=100)

summary(base.nb)

# I found that both results are VERY SIMILAR (decimal differences)

# Conducting "moran.test" and "lm.morantest" to supplementary check
the existence of the spatial autocorrelation in the residuals before
moving to the eigenfiltering

moran.test(residuals.glm(base.nb), mylistw)

       Moran's I test under randomisation

Moran I statistic standard deviate = 10.835, p-value < 2.2e-16

alternative hypothesis: greater

sample estimates:

Moran I statistic       Expectation          Variance

    0.1372647104     -0.0004995005      0.0001616635

moran.test(residuals.glm(base.nb2), mylistw)

       Moran's I test under randomisation

Moran I statistic standard deviate = 10.837, p-value < 2.2e-16

alternative hypothesis: greater

sample estimates:

Moran I statistic       Expectation          Variance

    0.1372900049     -0.0004995005      0.0001616636

lm.morantest(base.nb, mylistw)

  Global Moran's I for regression residuals

Moran I statistic standard deviate = 16.465, p-value < 2.2e-16

alternative hypothesis: greater

sample estimates:

Observed Moran's I        Expectation           Variance

     0.2056132825      -0.0028246948       0.0001602612

lm.morantest(base.nb2, mylistw)

    Global Moran's I for regression residuals

Moran I statistic standard deviate = 16.4668, p-value < 2.2e-16

alternative hypothesis: greater

sample estimates:

Observed Moran's I        Expectation           Variance

     0.2056358254      -0.0028246666       0.0001602612

## I do not know whether both tests are OK for a negative binomial
model, it seems that they all showed that there existed somewhat
significant spatial autocorrelations on the residuals.

#Run "ME" to identify whether the model needs eigenvectors and, if
needed, how many eigenvectors are needed.

### FIRST with base.nb2 (glm above)

myme.nb2 <- ME(dep ~ a + b + c + d + e + f + g, data=myshp, family =
negative.binomial(1.174928547), offset=log(TOTAL_POP), listw=mylistw,
alpha=0.05, verbose=TRUE, nsim=99)

Error in ME(a ~ b + c + d + e +  :

  base correlation larger than alpha

In addition: Warning message:

glm.fit: algorithm did not converge

## SECOND with base.nb1 (glm.nb)

## To do this, I copied and pasted the new "ME.nb" command made by
Pablo (Feb. 03):
http://www.mail-archive.com/r-sig-geo at r-project.org/msg03967.html

myme.nb <- ME.nb (dep ~ a + b + c + d + e + f + g +
offset(log(TOTAL_POP), data=myshp, listw=mylistw, alpha=0.05, nsim=99,
verbose =TRUE)

Error in ME.nb(base.nb2, data = myshp, listw = mylistw, alpha = 0.05,  :

 base correlation larger than alpha

In addition: Warning messages:

1: glm.fit: algorithm did not converge

2: glm.fit: algorithm did not converge

3: In glm.nb(formula, data = data) : alternation limit reached

### I TRIED AFTER CHANGING "stdev" OPTIONS BOTH, BUT THE RESULTS WERE
THE SAME.

### WHEN I INCREASED THE ALPHA LEVEL TO 0.4 IN BOTH, THE RESULTS WERE
THE SAME.

### HOWEVER, WHEN I INCREASED THE ALPHA LEVEL TO 4.7 IN BOTH, I
FINALLY GOT THE BELOW RESULT.

myme.nb2 <- ME (dep~ a + b + c + d + e + f + g, data=myshp, family =
negative.binomial(1.174928547), offset=log(TOTAL_POP), listw=mylistw,
alpha=0.48, stdev=TRUE, verbose=TRUE, nsim=99)

eV[,135], I: 3.805725e-09 ZI: -0.1491594, pr(ZI): 0.4407139

eV[,18], I: 1.253316e-10 ZI: -0.02069758, pr(ZI): 0.4917434

There were 50 or more warnings (use warnings() to see the first 50)

(all warnings: glm.fit: algorithm did not converge)

myme.nb1 <- ME.nb (REV_NUM ~ RACE_H_6_B + AGE_H_8_B + EDU_H_7_TO +
INC_H_GINI + POV_PE + AVE_HOU_IN + GOV_REV + offset(log(TOTAL_POP)),
data=myshp, listw=mylistw, alpha=0.48, nsim=999, verbose =TRUE)

eV[,15], I: 2.775260e-08 ZI:

NA, pr(ZI): 0.451

eV[,17], I: 1.302856e-08 ZI: NA, pr(ZI): 0.486

There were 50 or more warnings (use warnings() to see the first 50)

(warnings:  glm.fit: algorithm did not converge

In glm.nb(formula, data = data) : alternation limit reached

In glm.nb(Y ~ iX) : alternation limit reached)

### Based on the result, if the warnings (glm.fit: algorithm did not
converge) are not serious (particularly for the lower alpha level at
which "base correlation larger than alphs showed"), I decided to stick
to a negative binomial (non-spatial) model for my statistics.

Here are some questions to the groups related to this work:

First, is there any way that I can make the algorithm be converged?
(like
maxit=100 in glm.nb).

Particularly, if Pablo can read this, It would appreciate if you can
give me some suggestion for the modification of your program.

Second, could you let me know if you find any mistakes on my script?
In addition, does my conclusion (using a non-spatial negative
binomial) seem to be OK?

Or, if you think the warning is very serious, what alternative could
you suggest? Do you think a spatial poisson may be an alternative?
(considering spatial autocorrelation is more important than over
dispersion.)

(When I conduct the same procedure (the ME function) with the poisson
model above, I did not get any warnings or errors with "ME" function.)

Third, regarding the differences of Moran's I among "moran.test,"
"lm.morantest," and "ME,"

If the Moran's I from the "ME" function is correct for glm, I think
that both "moran.test" and "lm.morantest" are somewhat risky for both
poisson or negative binomial, at least a model with "offset" (like mine).

(Actually, I also tried to use the ME function with the poisson base
model above (though my data seems overdispersed). Then, the result
requires me to add about 20 eigenvectors into my model at alpha 0.2,
so that I "fitted" the eigenvectors from the "ME" results to my base
poisson model. Then, I double-checked the spatial poisson regression
with both "moran.test" and "lm.morantest," and they still showed
somewhat significant spatial autocorrelations in the model.)

Could anyone explain the reason of these differences?

Fourth, this is somewhat different topic, but I heard that Bayesian is
another option for these non-normal distributed model. However, I also
read a couple of articles (e.g., Gelman(2008)), that it may be
dangerous to use it since it requires "prior distribution or belief"
incorporated in the model. Is it also applied to spatial statistics?
For me, I have never use the Bayesian yet and it seems to me that I
can use it for now with my dataset.

I would be very appreciated for any comments.

Thanks,

HJ

	[[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

--
Roger Bivand
Department of Economics, NHH Norwegian School of Economics, Helleveien 30,
N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no

Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no

ME function for negative binomial

Thread (4 messages)