ME function for negative binomial
On Tue, 14 Feb 2012, HJ Kil wrote:
Sorry for not providing all the information appropriately and thank you so much for your comments. I would like to share my dataset, but, because of the confidentiality issue, I could not share it.
So then you will have to provide code demonstrating the problem with a data set that is available, say one shipped with one of the packages, perhaps in ade4 or similar. Have you also looked at advice given in: http://www.bio.umontreal.ca/numecolR/ especially chapter 7? Roger
All my explanatory variables are continuous variables and my dependent variable almost shows exactly the same histogram distribution as this link: (http://www.ats.ucla.edu/stat/sas/dae/nbreg.htm) When I run the base negative binomial model without using "ME function," as you see, I have to expand the # of convergence (maxit=100 option in glm.nb) to be converged. Without this, I could not converge the model appropriately. Is it possible that, if the base model requires more # of convergence to find the initial theta, this feature is also reflected to the ME function? Regarding your comment, I have a couple of clarifying questions. "The tests" in the below comment refer both "moran.test" and "lm.morantest" (not ME function)? So, do you mean, although ME function can be used for normal, binomial, poisson and negative binomial, the above two tests may not be appropriate for binomial, poisson and negative binomial distribution? In addition, if I received these "not converged" warnings, could you teach me some strategies that I may use to make the model be converged? Again, thank you so much for your answer. HJ -----Original Message----- From: Roger Bivand [mailto:Roger.Bivand at nhh.no] Sent: Monday, February 13, 2012 2:31 AM To: HJ Kil Cc: r-sig-geo at r-project.org Subject: Re: [R-sig-Geo] ME function for negative binomial On Sun, 12 Feb 2012, HJ Kil wrote:
Dear all, First of all, thank you so much for reading this long mail (Please read this if you are familiar with the "ME" function).
Please rework this question with data that everyone can see. There is no chance at all of seeing what is wrong without your casting this in terms of a standard data set, or making your own data available. The tests you are using have not been established for non-normal variables, so apparent results on GLM residuals may not be correctly sized. Warnings and lack of convergence are always potentially serious and must always be checked out (to see what the cause is). Roger
I need your help and comments to conduct an analysis with "ME"
function of the package spdep.
I am trying to run a regression (a count dependent variable and 7-8
explanatory variables) with about 2000 observations (adjacent census
tracts in an urban area).
Basically, I am trying to look at how these areas' demographics
aspects (7-8 explanatory variables) are related to a certain social
aspect (my dependent count variable).
To do this, I used the below script (Italic: parts of results):
### Load all libraries: I skipped all the libraries that I loaded
### Read shape data file made/joined by ArcGis and a queen matrix
(gal) made by Geoda
myshp <- readShapeSpatial("a.shp", ID="ID")
mygal <- read.gal("a.GAL")
#Test symmetry of the matrix: Result was "TRUE"
print(is.symmetric.nb(mygal))
[1] TRUE
#Change nb to listw
mylistw <- nb2listw(mygal)
#Run a poisson model as a base: including offset since I would like to
know "density"
base.poisson <- glm(dep ~ a + b + c + d + e + f + g,
offset(log(TOT_POP)), data=myshp, family="poisson")
#Run dispersion test to check whether a negative binomial model is better:
RESULT showed over-dispersed.
dispersiontest(base.poisson)
Overdispersion test
data: base.poisson
z = 8.711, p-value < 2.2e-16
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion
3.123331
#Run a negative binomial regression as a base by using "glm.nb"
base.nb <- glm.nb(dep ~ a + b + c + d + e + f + g +
offset(log(TOTAL_POP)), data=myshp, maxit=100)
summary(base.nb)
#Based on the result of the above, I found that "initial theta" was
"1.174928547"
#I used the other way of running a negative binomial with the above value:
base.nb2 <- glm(dep ~ a + b + c + d + e + f + g +
offset(log(TOTAL_POP)), data=myshp,
family=negative.binomial(1.174928547), maxit=100)
summary(base.nb)
# I found that both results are VERY SIMILAR (decimal differences)
# Conducting "moran.test" and "lm.morantest" to supplementary check
the existence of the spatial autocorrelation in the residuals before
moving to the eigenfiltering
moran.test(residuals.glm(base.nb), mylistw)
Moran's I test under randomisation
Moran I statistic standard deviate = 10.835, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.1372647104 -0.0004995005 0.0001616635
moran.test(residuals.glm(base.nb2), mylistw)
Moran's I test under randomisation
Moran I statistic standard deviate = 10.837, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.1372900049 -0.0004995005 0.0001616636
lm.morantest(base.nb, mylistw)
Global Moran's I for regression residuals
Moran I statistic standard deviate = 16.465, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Observed Moran's I Expectation Variance
0.2056132825 -0.0028246948 0.0001602612
lm.morantest(base.nb2, mylistw)
Global Moran's I for regression residuals
Moran I statistic standard deviate = 16.4668, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Observed Moran's I Expectation Variance
0.2056358254 -0.0028246666 0.0001602612
## I do not know whether both tests are OK for a negative binomial
model, it seems that they all showed that there existed somewhat
significant spatial autocorrelations on the residuals.
#Run "ME" to identify whether the model needs eigenvectors and, if
needed, how many eigenvectors are needed.
### FIRST with base.nb2 (glm above)
myme.nb2 <- ME(dep ~ a + b + c + d + e + f + g, data=myshp, family =
negative.binomial(1.174928547), offset=log(TOTAL_POP), listw=mylistw,
alpha=0.05, verbose=TRUE, nsim=99)
Error in ME(a ~ b + c + d + e + :
base correlation larger than alpha
In addition: Warning message:
glm.fit: algorithm did not converge
## SECOND with base.nb1 (glm.nb)
## To do this, I copied and pasted the new "ME.nb" command made by
Pablo (Feb. 03):
http://www.mail-archive.com/r-sig-geo at r-project.org/msg03967.html
myme.nb <- ME.nb (dep ~ a + b + c + d + e + f + g +
offset(log(TOTAL_POP), data=myshp, listw=mylistw, alpha=0.05, nsim=99,
verbose =TRUE)
Error in ME.nb(base.nb2, data = myshp, listw = mylistw, alpha = 0.05, :
base correlation larger than alpha
In addition: Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: algorithm did not converge
3: In glm.nb(formula, data = data) : alternation limit reached
### I TRIED AFTER CHANGING "stdev" OPTIONS BOTH, BUT THE RESULTS WERE
THE SAME.
### WHEN I INCREASED THE ALPHA LEVEL TO 0.4 IN BOTH, THE RESULTS WERE
THE SAME.
### HOWEVER, WHEN I INCREASED THE ALPHA LEVEL TO 4.7 IN BOTH, I
FINALLY GOT THE BELOW RESULT.
myme.nb2 <- ME (dep~ a + b + c + d + e + f + g, data=myshp, family =
negative.binomial(1.174928547), offset=log(TOTAL_POP), listw=mylistw,
alpha=0.48, stdev=TRUE, verbose=TRUE, nsim=99)
eV[,135], I: 3.805725e-09 ZI: -0.1491594, pr(ZI): 0.4407139
eV[,18], I: 1.253316e-10 ZI: -0.02069758, pr(ZI): 0.4917434
There were 50 or more warnings (use warnings() to see the first 50)
(all warnings: glm.fit: algorithm did not converge)
myme.nb1 <- ME.nb (REV_NUM ~ RACE_H_6_B + AGE_H_8_B + EDU_H_7_TO +
INC_H_GINI + POV_PE + AVE_HOU_IN + GOV_REV + offset(log(TOTAL_POP)),
data=myshp, listw=mylistw, alpha=0.48, nsim=999, verbose =TRUE)
eV[,15], I: 2.775260e-08 ZI:
NA, pr(ZI): 0.451
eV[,17], I: 1.302856e-08 ZI: NA, pr(ZI): 0.486
There were 50 or more warnings (use warnings() to see the first 50)
(warnings: glm.fit: algorithm did not converge
In glm.nb(formula, data = data) : alternation limit reached
In glm.nb(Y ~ iX) : alternation limit reached)
### Based on the result, if the warnings (glm.fit: algorithm did not
converge) are not serious (particularly for the lower alpha level at
which "base correlation larger than alphs showed"), I decided to stick
to a negative binomial (non-spatial) model for my statistics.
Here are some questions to the groups related to this work:
First, is there any way that I can make the algorithm be converged?
(like
maxit=100 in glm.nb).
Particularly, if Pablo can read this, It would appreciate if you can
give me some suggestion for the modification of your program.
Second, could you let me know if you find any mistakes on my script?
In addition, does my conclusion (using a non-spatial negative
binomial) seem to be OK?
Or, if you think the warning is very serious, what alternative could
you suggest? Do you think a spatial poisson may be an alternative?
(considering spatial autocorrelation is more important than over
dispersion.)
(When I conduct the same procedure (the ME function) with the poisson
model above, I did not get any warnings or errors with "ME" function.)
Third, regarding the differences of Moran's I among "moran.test,"
"lm.morantest," and "ME,"
If the Moran's I from the "ME" function is correct for glm, I think
that both "moran.test" and "lm.morantest" are somewhat risky for both
poisson or negative binomial, at least a model with "offset" (like mine).
(Actually, I also tried to use the ME function with the poisson base
model above (though my data seems overdispersed). Then, the result
requires me to add about 20 eigenvectors into my model at alpha 0.2,
so that I "fitted" the eigenvectors from the "ME" results to my base
poisson model. Then, I double-checked the spatial poisson regression
with both "moran.test" and "lm.morantest," and they still showed
somewhat significant spatial autocorrelations in the model.)
Could anyone explain the reason of these differences?
Fourth, this is somewhat different topic, but I heard that Bayesian is
another option for these non-normal distributed model. However, I also
read a couple of articles (e.g., Gelman(2008)), that it may be
dangerous to use it since it requires "prior distribution or belief"
incorporated in the model. Is it also applied to spatial statistics?
For me, I have never use the Bayesian yet and it seems to me that I
can use it for now with my dataset.
I would be very appreciated for any comments.
Thanks,
HJ
[[alternative HTML version deleted]]
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
-- Roger Bivand Department of Economics, NHH Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no
Roger Bivand Department of Economics, NHH Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no