Skip to content

Help: Maximum likelihood estimation

6 messages · roach, Ravi Varadhan, dave fournier

#
I was trying to reproduce a result in a published journal, and I have come
across some difficulties.
I have the following equation, which is two equations combined together.
http://r.789695.n4.nabble.com/file/n3006584/Screenshot.png 
where
http://r.789695.n4.nabble.com/file/n3006584/Screenshot-1.png 
http://r.789695.n4.nabble.com/file/n3006584/Screenshot-2.png 
http://r.789695.n4.nabble.com/file/n3006584/Screenshot-3.png 
I[t] is unknown, but have the following distribution
http://r.789695.n4.nabble.com/file/n3006584/Screenshot-4.png 
hence, the probability density function is 
http://r.789695.n4.nabble.com/file/n3006584/Screenshot-5.png 
and the likelihood function is 
http://r.789695.n4.nabble.com/file/n3006584/Screenshot-6.png 
It used Kiefer's E-M algorithm to estimate the problem. To simplify, first
assume lamda is known. I multiply the matrix in the probability density
function, and write it in a non-matrix form, and use the function optim() to
estimate the maximum. but I got non-sensible estimates of the parameters,
and got 39 warnings. the inverse of sigma is negative, and the warnings says
that in log(det(sigma)): NaNs produced.
What did I do wrong? Can anybody give me a hint?
#
I am not surprised that you are running into difficulties with this model
estimation, since you are treating a constrained optimization problem as
unconstrained one.  It is not so easy to set constraints on the covariance
matrix (i.e. positive definiteness).  The is the beauty of the EM algorithm
is that not only does it possess the ascent property (i.e. the likelihood is
guaranteed to not decrease) for any starting value, but it also ensures that
the constraints are automatically satisfied.  Therefore, EM would be my
preferred choice.  Why do you not want to try that?  

If you really want to use direct maximization, you could try to impose all
the constraints on the parameters, including positive definiteness of
covariance matrix, and solve the constrained optimization problem.  R has
two packages: "alabama" and "Rsolnp" that can handle nonlinear constraints.

Ravi.


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of roach
Sent: Thursday, October 21, 2010 10:24 PM
To: r-help at r-project.org
Subject: [R] Help: Maximum likelihood estimation


I was trying to reproduce a result in a published journal, and I have come
across some difficulties.
I have the following equation, which is two equations combined together.
http://r.789695.n4.nabble.com/file/n3006584/Screenshot.png 
where
http://r.789695.n4.nabble.com/file/n3006584/Screenshot-1.png 
http://r.789695.n4.nabble.com/file/n3006584/Screenshot-2.png 
http://r.789695.n4.nabble.com/file/n3006584/Screenshot-3.png 
I[t] is unknown, but have the following distribution
http://r.789695.n4.nabble.com/file/n3006584/Screenshot-4.png 
hence, the probability density function is 
http://r.789695.n4.nabble.com/file/n3006584/Screenshot-5.png 
and the likelihood function is 
http://r.789695.n4.nabble.com/file/n3006584/Screenshot-6.png 
It used Kiefer's E-M algorithm to estimate the problem. To simplify, first
assume lamda is known. I multiply the matrix in the probability density
function, and write it in a non-matrix form, and use the function optim() to
estimate the maximum. but I got non-sensible estimates of the parameters,
and got 39 warnings. the inverse of sigma is negative, and the warnings says
that in log(det(sigma)): NaNs produced.
What did I do wrong? Can anybody give me a hint?
#
Actually it is not that difficult to parameterize the covariance matrix 
so that the
optimization is unconstrained. first parameterize the correlation matrix 
and the
standard deviations separately. the std devs can be parameterized as

       sigma_i=exp(x_i)     1<=i<=n

For the correlation matrix parameterize it in terms of its choleski 
decomposition.

this is a lower triangular matrix

    1      0       0
    a_1  a_2    0
    b _1 b_2 b_3
      ....


such that  the norm  of each row is 1

to ensure this  form start with

        1
       y_1  1
       y_2  y_3  1
         ...

  and normalize each row to have norm 1. There are n*(n-1)/2  of these y's.
  together with the n x_i's you have n*(n+1)/2 parameters as you should.
#
I'm not quite familiar with E-M algorithm, but I think what I did was the
first step of the iteration. 
The method used in the original article is as follow:
http://r.789695.n4.nabble.com/file/n3008241/untitled.png 
I gave lamda an initial value, and maximized the likelihood function.
This is the complete chunk of my code after using "alabama" package. 
The first iteration had no problem, but after a few iterations, I again got
warnings and the result is not good.
 Is it possible that it's because of some computational problems? because
there are too many log and exp in the functions? Or is there anything I
missed?


library("alabama")
# n=number of observation
w=seq(0.05,0.9,length.out=n)
# iteration
repeat{
lamda=mean(w)
## -log likelihood function
log.L=function(parm){
alpha0=parm[1]
alpha1=parm[2]
alpha2=parm[3]
beta0=parm[4]
beta1=parm[5]
beta2=parm[6]
beta3=parm[7]
# here sigma is actually sigma inverse
sigma11=parm[8]
sigma12=parm[9]
sigma21=parm[10]
sigma22=parm[11]

u1=-alpha0-alpha1*logp-alpha2*lakes+logq
u21=-beta0-beta1*logq-beta2*s-beta3+logp
u22=-beta0-beta1*logq-beta2*s+logp
expon1=u1^2*sigma11+u1*u21*sigma12+u1*u21*sigma21+u21^2*sigma22
expon2=u1^2*sigma11+u1*u22*sigma12+u1*u22*sigma21+u22^2*sigma22
const=-log(2*pi)+.5*log(sigma11*sigma22-sigma12*sigma21)+log(abs(1-alpha1*beta1))
logf=const+log(lamda*exp(-0.5*expon1)+(1-lamda)*exp(-0.5*expon2))
log.L=-sum(logf)
return(log.L)
}

## estimate with nonlinear constraint
hin=function(parm){
h=rep(NA,1)
h[1]=parm[8]*parm[11]-parm[9]*parm[10]
h[2]=parm[8]
h[3]=parm[11]
h
}

heq=function(parm){
h=rep(NA,1)
h[1]=parm[9]-parm[10]
h
}
max.like=constrOptim.nl(par=c(-0.5,-0.5,-0.5,-0.5,0.02,-0.02,0.02,1.9,-1.1,-1.1,1.9),fn=log.L,
hin=hin,heq=heq)
max.like$par


######
parm=max.like$par
alpha0=parm[1]
alpha1=parm[2]
alpha2=parm[3]
beta0=parm[4]
beta1=parm[5]
beta2=parm[6]
beta3=parm[7]
sigma11=parm[8]
sigma12=parm[9]
sigma21=parm[10]
sigma22=parm[11]
u1=-alpha0-alpha1*logp-alpha2*lakes+logq
u21=-beta0-beta1*logq-beta2*s-beta3+logp
u22=-beta0-beta1*logq-beta2*s+logp
expon1=u1^2*sigma11+u1*u21*sigma12+u1*u21*sigma21+u21^2*sigma22
expon2=u1^2*sigma11+u1*u22*sigma12+u1*u22*sigma21+u22^2*sigma22

h1_log=(-log(2*pi)+0.5*log(sigma11*sigma22-sigma12*sigma21))+(log(abs(1-alpha1*beta1))-0.5*expon1)
h2_log=(-log(2*pi)+0.5*log(sigma11*sigma22-sigma12*sigma21))+(log(abs(1-alpha1*beta1))-0.5*expon2)
w1=w

w=1/(1+(1-lamda)/lamda*exp(h2_log-h1_log))
if(cor(w,w1)>0.999) break
}
1 day later
#
Can you provide a reproducible code?

Ravi.

____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvaradhan at jhmi.edu


----- Original Message -----
From: roach <roachyang at gmail.com>
Date: Saturday, October 23, 2010 4:41 am
Subject: Re: [R] Help: Maximum likelihood estimation
To: r-help at r-project.org
4 days later
#
http://r.789695.n4.nabble.com/file/n3018477/JEC.dta JEC.dta 
This is the data.
http://r.789695.n4.nabble.com/file/n3018477/A_Study_of_Cartel_Stability_The_Joint_Executive_Committee.pdf
A_Study_of_Cartel_Stability_The_Joint_Executive_Committee.pdf 
This is the original paper.
The variable S is breakdown into four dummy variables dm1, dm2, dm3, dm4.
the following is my code.


library(foreign)
jec=read.dta("d:/JEC.dta")

## generate year variables
jec["year"]=1879+ceiling(jec$week/52)

## generate months variables
jec["month"]=ifelse(jec$week%%52==0,13,ceiling((jec$week%%52)/4))

## generate dummy variables
jec["dm1"]=0
jec[jec$week>=28&jec$week<=subset(jec,year==1883&week%%52==10)$week,"dm1"]=1
jec["dm2"]=0
jec[jec$year==1883&jec$week%%52>=11&jec$week%%52<=25,"dm2"]=1
jec["dm3"]=0
jec[jec$week>=subset(jec,year==1883&week%%52==26)$week&jec$week<=subset(jec,year==1886&week%%52==11)$week,"dm3"]=1
jec["dm4"]=0
jec[jec$week>=subset(jec,year==1886&week%%52==12)$week,"dm4"]=1

### E-M algorithm
library("alabama")
n=328
w=seq(0.05,0.9,length.out=n)
repeat{
logp=log(jec$gr)
logq=log(jec$tqg)
lakes=jec$lakes
dm1=jec$dm1
dm2=jec$dm2
dm3=jec$dm3
dm4=jec$dm4
lamda=mean(w)
## -log likelihood function
log.L=function(parm){
alpha0=parm[1]
alpha1=parm[2]
alpha2=parm[3]
beta0=parm[4]
beta1=parm[5]
beta21=parm[6]
beta22=parm[7]
beta23=parm[8]
beta24=parm[9]
beta3=parm[10]
sigma11=parm[11]
sigma12=parm[12]
sigma21=parm[13]
sigma22=parm[14]

u1=-alpha0-alpha1*logp-alpha2*lakes+logq
u21=-beta0-beta1*logq-beta21*dm1-beta22*dm2-beta23*dm3-beta24*dm4-beta3+logp
u22=-beta0-beta1*logq-beta21*dm1-beta22*dm2-beta23*dm3-beta24*dm4+logp
expon1=u1^2*sigma11+u1*u21*sigma12+u1*u21*sigma21+u21^2*sigma22
expon2=u1^2*sigma11+u1*u22*sigma12+u1*u22*sigma21+u22^2*sigma22
const=-log(2*pi)+.5*log(sigma11*sigma22-sigma12*sigma21)+log(abs(1-alpha1*beta1))
logf=const+log(lamda*exp(-0.5*expon1)+(1-lamda)*exp(-0.5*expon2))
log.L=-sum(logf)
return(log.L)
}


####
## estimate with nonlinear constraint
hin=function(parm){
h=rep(NA,1)
h[1]=parm[11]*parm[14]-parm[12]*parm[13]
h[2]=parm[11]
h[3]=parm[14]
h[4]=-parm[12]
h[5]=-parm[13]
h
}

heq=function(parm){
h=rep(NA,1)
h[1]=parm[12]-parm[13]
h
}

max.like=constrOptim.nl(par=c(-0.5,-0.5,-0.5,-0.5,0.02,-0.02,-0.02,-0.02,-0.03,0.02,1.9,-1.1,-1.1,1.9),fn=log.L,
hin=hin,heq=heq)

######
parm=max.like$par
alpha0=parm[1]
alpha1=parm[2]
alpha2=parm[3]
beta0=parm[4]
beta1=parm[5]
beta21=parm[6]
beta22=parm[7]
beta23=parm[8]
beta24=parm[9]
beta3=parm[10]
sigma11=parm[11]
sigma12=parm[12]
sigma21=parm[13]
sigma22=parm[14]
u1=-alpha0-alpha1*logp-alpha2*lakes+logq
u21=-beta0-beta1*logq-beta21*dm1-beta22*dm2-beta23*dm3-beta24*dm4-beta3+logp
u22=-beta0-beta1*logq-beta21*dm1-beta22*dm2-beta23*dm3-beta24*dm4+logp
expon1=u1^2*sigma11+u1*u21*sigma12+u1*u21*sigma21+u21^2*sigma22
expon2=u1^2*sigma11+u1*u22*sigma12+u1*u22*sigma21+u22^2*sigma22

h1_log=(-log(2*pi)+0.5*log(sigma11*sigma22-sigma12*sigma21))+(log(abs(1-alpha1*beta1))-0.5*expon1)
h2_log=(-log(2*pi)+0.5*log(sigma11*sigma22-sigma12*sigma21))+(log(abs(1-alpha1*beta1))-0.5*expon2)
w1=w

w=1/(1+(1-lamda)/lamda*exp(h2_log-h1_log))
if(cor(w,w1)>0.99) break
}