Hello,
# Full disclosure. I am not sure if my problem is a bug(s) in the code, or a
fundamental misunderstanding on my part about what I am trying to do with
these statistics. I am not familiar with maximum likelihood tests.
# I currently have two vectors
Aequipecten<-c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
Samples<-c(0.03333333, 0.06666667, 0.25000000, 0.27222222, 0.27777778,
0.28888889, 0.29444444, 0.30000000, 0.31666667, 0.32777778, 0.33333333,
0.33888889, 0.35000000, 0.35555556, 0.36111111, 0.37777778, 0.38333333,
0.38888889, 0.40000000, 0.40555556, 0.41111111, 0.42222222, 0.43333333,
0.44444444, 0.45000000, 0.46111111, 0.46666667, 0.47222222, 0.47777778,
0.49444444, 0.52222222, 0.53888889, 0.55000000, 0.55555556, 0.56111111,
0.56666667, 0.57222222, 0.57777778, 0.58333333, 0.58888889, 0.60000000,
0.60555556, 0.61111111, 0.61666667, 0.62222222, 0.62777778, 0.63333333,
0.63888889, 0.64444444, 0.65000000, 0.65555556, 0.66111111, 0.66666667,
0.67222222, 0.67777778, 0.68333333, 0.68888889, 0.69444444, 0.70000000,
0.70555556, 0.71111111, 0.71666667, 0.72222222, 0.72777778, 0.73333333,
0.73888889, 0.74444444, 0.75000000, 0.75555556, 0.76111111, 0.76666667,
0.77222222, 0.77777778, 0.78333333, 0.78888889, 0.79444444, 0.80000000,
0.81111111, 0.81666667, 0.83888889, 0.85555556, 0.86666667, 0.88333333,
0.88888889, 0.89444444, 0.94444444, 0.95555556)
# What I want to do is a log-likelihood model selection process where i
check if a Gaussian fit or a skewed unimodal fit (among others) is better.
# The first thing I do is use this function written by Oksanen
http://cc.oulu.fi/~jarioksa/softhelp/hof3.pdf
HOF.start<-function(y,x,model=5) {
if (model >=4) {
k<-glm(y ~ x + I(x^2),family=binomial)
k1<-k$coefficients[1]
k2<-k$coefficients[2]
k3<-k$coefficients[3]
u<-(-k2/2/k3)
h<-plogis(k1-k2^2/4/k3)
r<-log(1/h*(-2*h+2*sqrt(h))/2)
b<-5.07-0.227*k3
a<-(-b*u+r)
c<-b*u+4
}
else {
k<-glm(y ~ x,family=binomial)
k1<-k$coefficients[1]
k2<-k$coefficients[2]
a<-(-k1)
b<-(-k2)
c<-0
}
switch(model,
p<-c(a=a),
p<-c(a=a,b=b),
p<-c(a=a,b=b,c=c),
p<-c(a=a,b=b,c=c),
p<-c(a=a,b=b,c=c,d=b),
)
p
}
# What I am concerned about here is option (model=4).
p<-HOF.start(Aequipecten,Samples,model=4)
p
a.I(x^2) b.I(x^2) c.I(x^2)
-6.520338 10.330863 10.547157
# I am not sure if this function is working correctly (but see below for my
alternative way of deriving equivalent p's). The goal is to use these values
"p" as a starting point for the non-linear maximization.
# Next, I plug those values (p) into the non-linear maximization which uses
the following functions:
HOF.fun<-function(p,model,x,M=1) {
p0<-min(x)
ra<-max(x)-p0
x<-(x-p0)/ra
switch(model,
fv<-rep(M/(1+exp(p[1])),length(x)),
fv<-M/(1+exp(p[1]+p[2]*x)),
fv<-M/(1+exp(p[1]+p[2]*x))/(1+exp(p[3])),
fv<-M/(1+exp(p[1]+p[2]*x))/(1+exp(p[3]-p[2]*x)),
fv<-M/(1+exp(p[1]+p[2]*x))/(1+exp(p[3]-p[4]*x))
)
fv
}
HOF<-function(x,y,p,model,M=1) {
fv<-HOF.fun(p,model,x,M=M)
sum(-dbinom(y,M,fv,log=TRUE))
}
# My understanding of how this is supposed to work is that the final line of
the HOF function:
"sum(-dbinom(y,M,fv,log=TRUE))" is supposed to give the negative log-odds of
that function being a good fit.
HOF(p,x=Sample,y=Aequipecten,model=4)
[1] 63.38763
# Notice, however, that it returns a positive value, when it is supposed to
be negative. What does this mean? What am I doing wrong?
#Furthermore, if I attempt a non-linear maximization on this function, I
continue to get a positive negative log-odds and what I think are
unreasonable looking parameter estimates. What does this mean?
sol<-nlm(HOF,p,x=Sample,y=Aequipecten,model=4)
sol
$minimum
[1] 32.72701
$estimate
[1] -8.751336 12.687632 8.281556
$gradient
[1] -4.640475e-07 -5.290583e-07 5.618925e-08
$code
[1] 1
$iterations
[1] 16
# If I compare these results to simply doing a straight gaussian logistic
regression on the vectors I get quite a different result. I'm very confident
that this equation is working properly. I would expect the outputs of this
function to be roughly equivalent to the results from object "p" as seen
above, but they don't look similar at all. I'm not sure what this means.
tBLparamsForOneSpecies <- function(theData) {
x <- theData[ ,1]
y <- theData[ ,2]
logitReg <- glm(y ~ x + I(x^2), family=binomial)
b0 <- logitReg$coefficients[1]
b1 <- logitReg$coefficients[2]
b2 <- logitReg$coefficients[3]
opt <- (-b1)/(2*b2)
tol <- 1 / sqrt(-2*b2)
pmax<-1/(1+exp(b1^2/(4*b2)-b0))
theParams <- cbind(opt, tol, pmax)
theParams
}
Call: glm(formula = Aequipecten ~ Sample + I(Sample^2), family = binomial)
Coefficients:
(Intercept) Sample I(Sample^2)
-10.44 29.37 -23.18
Degrees of Freedom: 86 Total (i.e. Null); 84 Residual
Null Deviance: 73.38
Residual Deviance: 68.07 AIC: 74.07
# So anyway, to sum up my problem, I'm concerned that the results of my
function HOF and of my non-linear maximization of HOF keep giving me a
positive log-odds. I'm not sure if this is a result of a bug in the code
somewhere, or if I actually just don't understand what I'm doing in terms of
the statistics. I've tried to provide as much information as possible, but I
really don't know where or what the problem is.
Thank you for any help,
Andrew
--
View this message in context: http://r.789695.n4.nabble.com/Non-linear-maximization-function-in-R-tp3916240p3916240.html
Sent from the R help mailing list archive at Nabble.com.
# My understanding of how this is supposed to work is that the final line of
the HOF function:
"sum(-dbinom(y,M,fv,log=TRUE))" is supposed to give the negative log-odds of
that function being a good fit.
HOF(p,x=Sample,y=Aequipecten,model=4)
[1] 63.38763
# Notice, however, that it returns a positive value, when it is supposed to
be negative. What does this mean? What am I doing wrong?
Not reading up on the theory.
a: That looks like a negative log likelihood. "Log odds of being a good fit" makes no sense (to me at least).
b: Likelihoods for discrete data are probabilities, hence the log likelihood is negative and the negative log likelihood is positive.
c: You usually want to maximize the likelihood, hence minimize the negative log likelihood.
d: Your "gaussian logistic regression" below, isn't. It's a binary logistic regression, basically doing what HOF.start does, but calculating a different set of transformed parameters.
e: The names on the return value from HOF.start doesn't seem to match the function definition???
#Furthermore, if I attempt a non-linear maximization on this function, I
continue to get a positive negative log-odds and what I think are
unreasonable looking parameter estimates. What does this mean?
sol<-nlm(HOF,p,x=Sample,y=Aequipecten,model=4)
sol
$minimum
[1] 32.72701
$estimate
[1] -8.751336 12.687632 8.281556
$gradient
[1] -4.640475e-07 -5.290583e-07 5.618925e-08
$code
[1] 1
$iterations
[1] 16
# If I compare these results to simply doing a straight gaussian logistic
regression on the vectors I get quite a different result. I'm very confident
that this equation is working properly. I would expect the outputs of this
function to be roughly equivalent to the results from object "p" as seen
above, but they don't look similar at all. I'm not sure what this means.
tBLparamsForOneSpecies <- function(theData) {
x <- theData[ ,1]
y <- theData[ ,2]
logitReg <- glm(y ~ x + I(x^2), family=binomial)
b0 <- logitReg$coefficients[1]
b1 <- logitReg$coefficients[2]
b2 <- logitReg$coefficients[3]
opt <- (-b1)/(2*b2)
tol <- 1 / sqrt(-2*b2)
pmax<-1/(1+exp(b1^2/(4*b2)-b0))
theParams <- cbind(opt, tol, pmax)
theParams
}
Call: glm(formula = Aequipecten ~ Sample + I(Sample^2), family = binomial)
Coefficients:
(Intercept) Sample I(Sample^2)
-10.44 29.37 -23.18
Degrees of Freedom: 86 Total (i.e. Null); 84 Residual
Null Deviance: 73.38
Residual Deviance: 68.07 AIC: 74.07
# So anyway, to sum up my problem, I'm concerned that the results of my
function HOF and of my non-linear maximization of HOF keep giving me a
positive log-odds. I'm not sure if this is a result of a bug in the code
somewhere, or if I actually just don't understand what I'm doing in terms of
the statistics. I've tried to provide as much information as possible, but I
really don't know where or what the problem is.
Thank you for any help,
Andrew
--
View this message in context: http://r.789695.n4.nabble.com/Non-linear-maximization-function-in-R-tp3916240p3916240.html
Sent from the R help mailing list archive at Nabble.com.
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com