Probit regression with limited parameter space

[cc'ing back to r-help again -- I do this so the answers can be
archived and viewed by others]
Prof. Bolker,

Thanks for your quick reply and detailed explanation.

I also ran the unrestricted model using glmfit <-
glm(y~x1+x2+x3+x4+x5+x6+x7+x8, family=binomial(link="probit"),data=d).
However, the results I got from glm and mle2 (both for the unrestricted
model) are not very similar (please see below).  In your earlier example,
both glm and mle2 produce almost the same estimation results.  I just hope
to figure out what might cause the discrepancy in the estimation results
I've got.

coef(summary(*glmfit*))
                Estimate   Std. Error    z value     Pr(>|z|)
(Intercept) -0.853900059 0.2464179864 -3.4652505 5.297377e-04
x1             1.627125691 0.3076174699  5.2894450 1.226881e-07
x2            -0.092716326 0.5229866504 -0.1772824 8.592866e-01
x3            -3.301509522 0.9169991843 -3.6003407 3.178004e-04
x4             7.187483436 2.2135961171  3.2469715 1.166401e-03
x5            -0.002544181 0.0112740324 -0.2256673 8.214602e-01
x6             6.978374268 2.2347939216  3.1226030 1.792594e-03
x7            -0.009832379 0.0113807583 -0.8639476 3.876167e-01
x8            -0.001252075 0.0002304789 -5.4324941 5.557178e-08

coef(summary(*mle2fit*))
                        Estimate  Std. Error     z value        Pr(z)
pprobit.(Intercept) -0.603492668 0.230117071  -2.6225463 8.727541e-03
pprobit.x1             1.645984346 0.288479906   5.7057158 1.158552e-08
pprobit.x2            -0.157361533 0.523048376  -0.3008546 7.635253e-01
pprobit.x3            -3.935203692 0.932692587  -4.2191862 2.451857e-05
pprobit.x4           7.512701611 0.062911076 119.4177885 0.000000e+00
pprobit.x5            -0.001475556 0.011525137  -0.1280293 8.981258e-01
pprobit.x6           7.399355063 0.018372749 402.7353318 0.000000e+00
pprobit.x7            -0.010113008 0.011647725  -0.8682388 3.852636e-01
pprobit.x8            -0.001650021 0.000244997  -6.7348622 1.640854e-11
My best guess is that you are running into optimization problems.

The big advantage of glm() is that it uses a special-purpose
optimization method (iteratively reweighted least squares) that is
generally much more robust/reliable than general-purpose nonlinear
optimizers such as nlminb.  If there is indeed a GLM fitting routine
coded in R, somewhere, that someone has adapted to work with box
constraints, it will probably perform better than mle2.

Some general suggestions for troubleshooting this:

 * check the log-likelihoods returned by the two methods.  If they are
very close (say within 0.01 likelihood units), then the issue is that
you just have a very flat goodness-of-fit surface, and the two sets of
coefficients are in practice very similar to each other.

 * if possible, try starting each approach (glm(), mle2()) from the
solution found by the other (it's a little bit of a pain to get the
syntax right here) and see if they get stuck right where they are or
whether they find that one answer or the other is right.

 * if you were using one of the optimizing methods from optim() (rather
than nlminb), e.g. L-BFGS-B, I would suggest you try using parscale to
rescale the parameters to have approximately equal magnitudes near the
solution.  This apparently isn't possible with nlminb, but you could try
optimizer="optim" (the default), method="L-BFGS-B" and see how you do
(although L-BFGS-B is often a bit finicky).  Alternatively, you can try
optimizer="optimx", in which case you have a larger variety of
unconstrained optimizers to choose from (you have to install the optimx
package and take a look at its documentation).  Alternatively, you can
scale your input variables (e.g. use scale() on your input matrix to get
zero-centered, sd 1 variables), although you would then have to adjust
your lower and upper bounds accordingly.

 * it's a bit more work, but you may be able to unpack this a bit and
provide analytical derivatives.  That would help a lot.

  In short: you are entering the quagmire of numerical optimization methods.

   I have learned most of this stuff by trial and error -- can anyone on
the list suggest a good/friendly introduction?  (Press et al Numerical
Recipes; Givens and Hoeting's Computational Statistics book looks good,
although I haven't read it ...)

  Ben Bolker

On Thu, Feb 2, 2012 at 1:12 PM, Ben Bolker <bbolker at gmail.com> wrote:

 [cc'ing back to r-help]

On 12-02-02 01:56 PM, Sally Luo wrote:

I tried to adapt your code to my model and got the results as below.  I
don't know how to fix the warning messages. It says "rearrange the lower
(or upper) bounds to match 'start'".
 The warning is overly conservative in this case.  I should work on
engineering the package so that it handles this better. You can
disregard them.

 In answer to your previous questions:

 * "size" refers to the number of trials per observation (1, if you have
binary data)
 * you've got the form of the lower and upper bounds right.
 * you've got the formula in 'parameters' right -- this builds a linear
model (using R's model.matrix) on the probit scale based on the 8
parameters
And two of the estimates for my restricted parameters are on the
boundary.
The warning message says the variance-covariance calculations may be
unreliable.  Those parameters are the ones of interest to my study.  Can
I
still make inferences using the p-values reported by mle2 in this case?
 That's quite tricky unfortunately, and it isn't a problem that's
specific to the mle2 package.  The basic issue is that the whole
derivation of the multivariate normal sampling distribution of the
maximum likelihood estimator depends on the maximum likelihood being an
interior local maximum (and hence having a negative-definite hessian, or
a positive-definite information matrix), which is untrue on the boundary
-- the Wikipedia article on maximum likelihood mentions this issue, for
example http://en.wikipedia.org/wiki/Maximum_likelihood

 Perhaps someone here can suggest an approach (although it gets outside
the scope of "R help", or you can ask on http://stats.stackexchange.com...

Thanks for your help.  ---- Sally

mle2fit <- mle2(y~dbinom(pnorm(pprobit),size=1),
+             parameters=list(pprobit~x1+x2+x3+x4+x5+x6+x7+x8),
+             start=list(pprobit=0),
+             optimizer="nlminb",
+             lower=c(-Inf,-1,-1,-1,-Inf,-Inf,-Inf,-Inf,-Inf),
+             upper=c(Inf,1,1,1,Inf,Inf,Inf,Inf,Inf),
+             data=d)

Warning messages:
1: In fix_order(call$lower, "lower bounds", -Inf) :
  lower bounds not named: rearranging to match 'start'
2: In fix_order(call$upper, "upper bounds", Inf) :
  upper bounds not named: rearranging to match 'start'
3: In mle2(y ~ dbinom(pnorm(pprobit), size = 1), parameters =
list(pprobit
~  :
  some parameters are on the boundary: variance-covariance calculations
based on Hessian may be unreliable

On Wed, Feb 1, 2012 at 11:16 PM, Sally Luo <shali623 at gmail.com> wrote:

Prof. Bolker,

Thanks a lot for your reply.

In my model, I have 9 explanatory variables and I need to restrict the
range of parameters 2-4 to (-1,1).  I tried to modify the univariate
probit
example you gave in your reply, however, I could not get through.

Specificially, I am not sure what 'pprobit' represents in your code? How
should I code this part if I have more than one variable?

Also does "size" refer to the number of parameters?

Since only 3 parameters need to be restricted in my model, should I
write
lower=c(-Inf, -1,-1,-1, -Inf, -Inf, -Inf, -Inf, -Inf) and upper=c(Inf,
1,1,1, Inf, Inf, Inf, Inf, Inf)?

Thanks again for your kind help.

Best,

Sally

On Wed, Feb 1, 2012 at 7:19 AM, Ben Bolker <bbolker at gmail.com> wrote:

 Sally Luo <shali623 <at> gmail.com> writes:

Dear R helpers,

I need to estimate a probit model with box constraints placed on
several of
the model parameters.  I have the following two questions:

1) How are the standard errors calclulated in glm
(family=binomial(link="probit")?  I ran a typical probit model using
the
glm probit link and the nlminb function with my own coding of the
loglikehood, separately. As nlminb does not produce the hessian
matrix,
I
used hessian (numDeriv) to calculate it.  However, the standard errors
calculated using hessian function are quite different from the ones
generated by the glm function, although the parameter estimates are
very
close.  I was wondering what makes this difference in the estmation of
standard errors and how this computation is carried out in glm.

2) Does any one know how to estimate a constrained probit model in R
(to be
specific, I need to restrain the range of three parameters to [-1,1])?
Among the optimation functions, so far nlminb and spg work for my
problem,
but neither produces a hessian matrix.  As I mentioned above, if I use
hessian funciton and calculate standard errors manually, the standard
errors seem not right.

  I'm a little biased, but I think the bbmle package is the
easiest way to get this done -- it provides convenient wrappers
for a range of optimizers including nlminb.
  I would warn however that you should be very careful interpreting
the meaning of the Hessian matrix if some of your parameters lie
on the boundary of the feasible space ...

set.seed(101)
x <- runif(100)
p <- pnorm(1+3*x)
y <- rbinom(100,p,size=1)
d <- data.frame(x,y)
glmfit <- glm(y~x,family=binomial(link="probit"),data=d)
coef(summary(glmfit))

library(bbmle)
mle2fit <- mle2(y~dbinom(pnorm(pprobit),size=1),
            parameters=list(pprobit~x),
            start=list(pprobit=0),
            data=d)
coef(summary(mle2fit))

## now add constraints
mle2fit_constr <- mle2(y~dbinom(pnorm(pprobit),size=1),
            parameters=list(pprobit~x),
            start=list(pprobit=0),
            optimizer="nlminb",
            lower=c(2,0.15),
            data=d)

coef(summary(mle2fit_constr))

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
<http://www.r-project.org/posting-guide.html>
 >>> and provide commented, minimal, self-contained, reproducible code.