Logistic Regression Fitting with EM-Algorithm

4 messages · Robin Aly, (Ted Harding)

Robin Aly

Mon, Jan 3, 2011 6:02 AM #

Hi all,

is there any package that can fit logistic regression
coefficients with an EM algorithm, given only the explanatory
variables? I tried to do this with the Design package,
but I didn't find a way.

Thanks a lot & Kind regards
Robin Aly

Mon, Jan 3, 2011 3:36 PM #

On 03-Jan-11 14:02:21, Robin Aly wrote:

As written, this is a strange question! You imply that you
do not have data on the response (0/1) variable at all,
only on the explanatory variables. In that case there is
no possible estimate, because that would require data on
at least some of the values of the response variable.

I think you should explain more clearly and explicitly what
the information is that you have for all the variables.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Jan-11                                       Time: 23:36:56
------------------------------ XFMail ------------------------------

Robin Aly

Mon, Jan 10, 2011 12:08 PM #

Dear Ted,

Sorry for being unclear. Let me try again.

I indeed have no knowledge of the value of the response variable for
any object. Instead, I have a data frame of explanatory variables,
with one row per object. For example,

        x1       x2        x3
1 4.409974 2.348745 1.9845313
2 3.809249 2.281260 1.9170466
3 4.229544 2.610347 0.9127431
4 4.259644 1.866025 1.5982859
5 4.001306 2.225069 1.2551570
...

I want to fit a regression model of the form y ~ x1 + x2 + x3.

From prior information I know that each slope coefficient is
approximately Gaussian distributed around 1, and likewise the
intercept around -10. Now I think there must be a package that
estimates the coefficients more precisely by fitting the logistic
regression function to the data without knowledge of the response
variable (similar to fitting Gaussians in a mixture model where the
class labels are unknown).

I looked at the flexmix package, but it seems to "only" find
dependencies in the data assuming the presence of some training data.
I also found some evidence in Magder & Hughes (1997, see below) that
such an algorithm exists; however, from the documented math I can't
work out how to apply the method to my problem.

Thanks in advance,
Best Regards
Robin

Magder, L. S. & Hughes, J. P. (1997). Logistic Regression When the
Outcome Is Measured with Uncertainty. American Journal of
Epidemiology, 146, 195-203.

Mon, Jan 10, 2011 3:52 PM #

In view of your further explanation, Robin, the best I can offer
is the following.

[1] Theoretical frame.
*IF* variables (X1,X2,X3) are distributed according to a
mixture of two multivariate normal distributions, i.e. as
two groups, each with a multivariate normal distribution,
*AND* the members of one group are labelled "Y=0" and the
members of the other group are labelled "Y=1", *THEN* for
a unit chosen at random from the two groups (pooled) the
probability that Y=1 conditional on (X1=x1,X2=x2,X3=x3)
follows a logistic regression. This regression will be
linear in (x1,x2,x3) if the two multivariate normals have
the same covariance matrix; it will be quadratic if the
two covariance matrices are different. The coefficients
in the regression will be algebraic expressions involving
the parameters of the two multivariate normals, together
with the two proportions p1 and p2 of the two groups.

This result is a straightforward algebraic consequence of
applying Bayes's Theorem.
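
For reference, the algebra in the equal-covariance case can be written out as follows. The notation is an assumption introduced here, not taken from the thread beyond p1 and p2: group 1 (labelled "Y=0") has proportion p_1 and mean \mu_1, group 2 (labelled "Y=1") has proportion p_2 and mean \mu_2, both share the covariance matrix \Sigma, and \phi denotes the multivariate normal density.

```latex
\begin{align*}
\log\frac{P(Y=1\mid x)}{P(Y=0\mid x)}
  &= \log\frac{p_2\,\phi(x;\mu_2,\Sigma)}{p_1\,\phi(x;\mu_1,\Sigma)} \\
  &= \log\frac{p_2}{p_1}
     - \tfrac12 (x-\mu_2)^\top\Sigma^{-1}(x-\mu_2)
     + \tfrac12 (x-\mu_1)^\top\Sigma^{-1}(x-\mu_1) \\
  &= \beta_0 + \beta^\top x, \\
\beta   &= \Sigma^{-1}(\mu_2-\mu_1), \\
\beta_0 &= \log\frac{p_2}{p_1}
           - \tfrac12 (\mu_2+\mu_1)^\top\Sigma^{-1}(\mu_2-\mu_1).
\end{align*}
```

The x^\top\Sigma^{-1}x terms cancel only because the covariance is shared. With two different matrices \Sigma_1 and \Sigma_2, a quadratic term -\tfrac12 x^\top(\Sigma_2^{-1}-\Sigma_1^{-1})x and a constant -\tfrac12\log(|\Sigma_2|/|\Sigma_1|) remain, which is the quadratic case mentioned above.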

[2] Practical application
If you can identify that the data on (X1,X2,X3) correspond
to a mixture of two multivariate normal distributions whose
parameters (two multivariate mean vectors, one or two
covariance matrices, proportions in the two groups) you can
estimate, *AND* *IF* you are justified in assuming that the
*unobserved* response variable Y takes the value 0 for one
group and 1 for the other, *THEN* you can apply logistic
regression to the results (but you will not learn anything by
doing so that was not already available from the estimated
parameters, and the algebraic expression of the logistic
coefficients, as found in [1] above).
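
A minimal sketch of what [2] could look like in base R, under stated assumptions: the data are simulated (not Robin's), the two groups share one covariance matrix, and the function names (log_dmvn, em_shared) are made up for this illustration. It fits the mixture by a hand-written EM and then turns the estimated mixture parameters into logistic coefficients using beta = Sigma^{-1} (mu2 - mu1) and beta0 = log(p2/p1) - (mu2 + mu1)' Sigma^{-1} (mu2 - mu1) / 2.

```r
# Simulated data: hidden labels z are used only to generate X and,
# at the very end, to compare against an ordinary glm() fit.
set.seed(1)
n <- 2000
z <- rbinom(n, 1, 0.3)
mu_true <- rbind(c(0, 0, 0), c(2.5, 2, 1.5))  # row 1: "Y=0", row 2: "Y=1"
X <- matrix(rnorm(n * 3), n, 3) + mu_true[z + 1, ]

# Log density of N(mu, Sigma) evaluated at each row of X
log_dmvn <- function(X, mu, Sigma) {
  d <- ncol(X)
  -0.5 * (d * log(2 * pi) + as.numeric(determinant(Sigma)$modulus) +
            mahalanobis(X, mu, Sigma))
}

# EM for a two-component normal mixture with a shared covariance matrix
em_shared <- function(X, mu1, mu2, iter = 200) {
  p2 <- 0.5
  Sigma <- cov(X)
  for (i in seq_len(iter)) {
    # E-step: posterior probability that each row belongs to group 2
    l1 <- log(1 - p2) + log_dmvn(X, mu1, Sigma)
    l2 <- log(p2) + log_dmvn(X, mu2, Sigma)
    w <- 1 / (1 + exp(l1 - l2))
    # M-step: weighted proportion, means and pooled covariance
    p2 <- mean(w)
    mu1 <- colSums((1 - w) * X) / sum(1 - w)
    mu2 <- colSums(w * X) / sum(w)
    R1 <- sweep(X, 2, mu1)
    R2 <- sweep(X, 2, mu2)
    Sigma <- (crossprod(R1 * sqrt(1 - w)) + crossprod(R2 * sqrt(w))) / nrow(X)
  }
  list(p2 = p2, mu1 = mu1, mu2 = mu2, Sigma = Sigma)
}

# Crude starting values: one standard deviation either side of the mean
s <- apply(X, 2, sd)
fit <- em_shared(X, colMeans(X) - s, colMeans(X) + s)

# Logistic coefficients implied by the fitted mixture
Sinv <- solve(fit$Sigma)
beta <- drop(Sinv %*% (fit$mu2 - fit$mu1))
beta0 <- log(fit$p2 / (1 - fit$p2)) -
  0.5 * drop(t(fit$mu2 + fit$mu1) %*% Sinv %*% (fit$mu2 - fit$mu1))

# For comparison only: glm() on the simulated labels, which a user
# in Robin's situation would not have.
round(rbind(mixture = c(beta0, beta),
            glm = coef(glm(z ~ X, family = binomial))), 2)
```

Which component ends up as "group 2" depends on the starting values, so the sign of the coefficients is only meaningful once you have decided, on outside grounds, which group corresponds to Y=1. A real data frame would need as.matrix() first.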

[3] Caveat
Being able to perform the identification and estimation of
the two multivariate normals as in [2], by using some mixture
identification method, does *NOT* of itself justify making
the assumption in [2] that the unobserved response variable
Y takes values 0 and 1 according to group membership *UNLESS*
that is precisely what you mean by "Y" (i.e. index of group
membership in one or other of two multivariate normals).
If the meaning of variable "Y" is different, then success with
a mixture algorithm may have nothing to do with what the values
of Y are likely to be.

[4] Comment
Many algorithms for identifying mixtures are based on the
EM algorithm. Your additional "prior information" about how
the coefficients are distributed could be incorporated into
the EM algorithm, but I can't think explicitly of an R function
which would enable this (though the MCMC methods associated
with BRugs -- the R interface to OpenBUGS -- may allow you to
set this up). Probably others can offer more help on this aspect
of the matter.
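
One hedged way to picture the idea of folding prior information into the fit, without claiming it is what any package does: instead of EM, the sketch below directly maximises the mixture log-likelihood plus a Gaussian log-prior on the implied logistic coefficients, using optim(). Everything specific is an assumption made for illustration: the data are simulated and one-dimensional, the prior standard deviations (1 for the intercept, 0.5 for the slope) are invented because the thread gives only the prior means (-10 and 1), and the function names are hypothetical. The prior is applied as a penalty on the derived coefficients, with no Jacobian adjustment.

```r
# Simulated 1-D data from two normal groups with a common variance
set.seed(2)
n <- 1000
z <- rbinom(n, 1, 0.4)
x <- rnorm(n, mean = ifelse(z == 1, 10, 9), sd = 1)

# theta = (mu1, mu2, log(sigma), logit(p2)); returns the logistic
# intercept b0 and slope b1 implied by the mixture parameters
implied_coef <- function(theta) {
  s2 <- exp(2 * theta[3])
  b1 <- (theta[2] - theta[1]) / s2
  b0 <- theta[4] - (theta[2]^2 - theta[1]^2) / (2 * s2)
  c(b0 = b0, b1 = b1)
}

# Negative log-posterior: mixture likelihood plus Gaussian penalty
# on (b0, b1); prior_mean and prior_sd are assumptions of this sketch
neg_log_post <- function(theta, x, prior_mean, prior_sd) {
  p2 <- plogis(theta[4])
  s <- exp(theta[3])
  lik <- (1 - p2) * dnorm(x, theta[1], s) + p2 * dnorm(x, theta[2], s)
  -sum(log(lik)) -
    sum(dnorm(implied_coef(theta), prior_mean, prior_sd, log = TRUE))
}

start <- c(mean(x) - sd(x), mean(x) + sd(x), log(sd(x)), 0)
fit <- optim(start, neg_log_post, x = x,
             prior_mean = c(-10, 1), prior_sd = c(1, 0.5),
             control = list(maxit = 2000))
implied_coef(fit$par)
```

With groups this close together the mixture alone is weakly identified, so the prior does much of the work. That is exactly the situation in which the caveat in [3] matters most.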

I think it is necessary to be absolutely clear about what
your model represents!

Hoping this helps,
Ted.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 10-Jan-11                                       Time: 23:52:18
------------------------------ XFMail ------------------------------