Hi all, is there any package which can do an EM algorithm fitting of logistic regression coefficients given only the explanatory variables? I tried to realize this using the Design package, but I didn't find a way. Thanks a lot & Kind regards Robin Aly
Logistic Regression Fitting with EM-Algorithm
4 messages · Robin Aly, (Ted Harding)
On 03-Jan-11 14:02:21, Robin Aly wrote:
Hi all, is there any package which can do an EM algorithm fitting of logistic regression coefficients given only the explanatory variables? I tried to realize this using the Design package, but I didn't find a way. Thanks a lot & Kind regards Robin Aly
As written, this is a strange question! You imply that you do not have data on the response (0/1) variable at all, only on the explanatory variables. In that case there is no possible estimate, because that would require data on at least some of the values of the response variable.

I think you should explain more clearly and explicitly what the information is that you have for all the variables.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Jan-11                                       Time: 23:36:56
------------------------------ XFMail ------------------------------
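Ted's point can be made concrete with a short calculation. This is only a sketch, under an assumption the thread does not state explicitly: the logistic model specifies nothing but P(Y=1 | x) through an intercept \beta_0 and slopes \beta (notation introduced here), and the density g(x) of the explanatory variables does not depend on those coefficients. Summing the unobserved response out of the likelihood then gives:

```latex
\begin{align*}
L(\beta_0, \beta)
  &= \prod_{i=1}^{n} \sum_{y_i \in \{0,1\}} P(Y_i = y_i \mid x_i; \beta_0, \beta)\, g(x_i) \\
  &= \prod_{i=1}^{n} \bigl[\pi_i + (1 - \pi_i)\bigr]\, g(x_i)
   = \prod_{i=1}^{n} g(x_i),
\qquad
\pi_i = \frac{1}{1 + e^{-(\beta_0 + \beta^\top x_i)}} .
\end{align*}
```

The right-hand side does not involve the coefficients at all, so under this assumption the explanatory variables alone cannot sharpen any prior belief about them. Information can only enter if the distribution of x itself differs between the Y=0 and Y=1 groups.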
6 days later
Dear Ted,
sorry for being unclear. Let me try again.
I indeed have no knowledge about the value of the response variable for
any object.
Instead, I have a data frame of explanatory variables for
each object. For example,
        x1       x2        x3
1 4.409974 2.348745 1.9845313
2 3.809249 2.281260 1.9170466
3 4.229544 2.610347 0.9127431
4 4.259644 1.866025 1.5982859
5 4.001306 2.225069 1.2551570
...
and I want to fit a regression model of the form y ~ x1 + x2 + x3.
From prior information I know that each slope coefficient is approximately
Gaussian distributed around 1, and likewise the intercept around
-10. Now I think there must be a package which estimates the
coefficients more precisely by fitting the logistic regression function
to the data without knowledge of the response variable (similar to
fitting Gaussians in a mixture model where the class labels are unknown).
I looked at the flexmix package but this seems to "only" find
dependencies in the data assuming the presence of some training data.
I also found some evidence in Magder1997 (see below) that such an
algorithm exists; however, from the documented math I can't see how to
apply the method to my problem.
Thanks in advance,
Best Regards
Robin
Magder, L. S. & Hughes, J. P. (1997). Logistic Regression When the Outcome Is
Measured with Uncertainty. American Journal of Epidemiology, 146,
195-203.
In view of your further explanation, Robin, the best I can offer is the following.

[1] Theoretical frame.

*IF* variables (X1,X2,X3) are distributed according to a mixture of two multivariate normal distributions, i.e. as two groups, each with a multivariate normal distribution, *AND* the members of one group are labelled "Y=0" and the members of the other group are labelled "Y=1", *THEN* for a unit chosen at random from the two groups (pooled) the probability that Y=1 conditional on (X1=x1,X2=x2,X3=x3) follows a logistic regression.

This regression will be linear in (x1,x2,x3) if the two multivariate normals have the same covariance matrix; it will be quadratic if the two covariance matrices are different. The coefficients in the regression will be algebraic expressions involving the parameters of the two multivariate normals, together with the two proportions p1 and p2 of the two groups.

This result is a straightforward algebraic consequence of applying Bayes's Theorem.

[2] Practical application

If you can identify that the data on (X1,X2,X3) correspond to a mixture of two multivariate normal distributions whose parameters (two multivariate mean vectors, one or two covariance matrices, proportions in the two groups) you can estimate, *AND* *IF* you are justified in assuming that the *unobserved* response variable Y takes the value 0 for one group and 1 for the other, *THEN* you can apply logistic regression to the results (but you will not learn anything by doing so that was not already available from the estimated parameters, and the algebraic expression of the logistic coefficients, as found in [1] above).

[3] Caveat

Being able to perform the identification and estimation of the two multivariate normals as in [2], by using some mixture identification method, does *NOT* of itself justify making the assumption in [2] that the unobserved response variable Y takes values 0 and 1 according to group membership *UNLESS* that is what you precisely mean by "Y" (i.e. index of group membership in one or other of two multivariate normals). If the meaning of variable "Y" is different, then success with a mixture algorithm may have nothing to do with what the values of Y are likely to be.

[4] Comment

Many algorithms for identifying mixtures are based on the EM algorithm. Your additional "prior information" about how the coefficients are distributed could be incorporated into the EM algorithm, but I can't think explicitly of an R function which would enable this (though the MCMC methods associated with BRugs -- the R interface to OpenBUGS -- may allow you to set this up). Probably others can offer more help on this aspect of the matter.

I think it is necessary to be absolutely clear about what your model represents!

Hoping this helps,
Ted.
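Point [1] can be checked numerically. The following is a minimal R sketch, not something posted in the thread: the mixture parameters (prop1, mu0, mu1, Sigma) are made-up values, and the labels y are simulated only so that glm() has something to fit. With a common covariance matrix, Bayes's Theorem gives slopes Sigma^{-1} (mu1 - mu0) and intercept log(prop1/prop0) - (mu1 + mu0)' Sigma^{-1} (mu1 - mu0) / 2, and a fit on labelled data should recover roughly the same numbers.

```r
set.seed(1)

# Hypothetical mixture parameters, chosen only for illustration
prop1 <- 0.3                         # proportion of the group labelled Y = 1
mu0   <- c(4.0, 2.0, 1.0)            # mean of (X1, X2, X3) in group Y = 0
mu1   <- c(4.5, 2.6, 1.8)            # mean of (X1, X2, X3) in group Y = 1
Sigma <- diag(c(0.25, 0.16, 0.36))   # covariance matrix shared by both groups

# Logistic coefficients implied by Bayes's Theorem (equal covariances)
Sinv      <- solve(Sigma)
slopes    <- drop(Sinv %*% (mu1 - mu0))
intercept <- log(prop1 / (1 - prop1)) - 0.5 * sum((mu1 + mu0) * slopes)
implied   <- c(intercept, slopes)

# Simulate labelled data from the mixture to check the algebra
n <- 20000
y <- rbinom(n, 1, prop1)
noise <- matrix(rnorm(n * 3), n, 3) %*% chol(Sigma)   # rows ~ N(0, Sigma)
X <- noise +
  (1 - y) * matrix(mu0, n, 3, byrow = TRUE) +
  y * matrix(mu1, n, 3, byrow = TRUE)

d   <- data.frame(y = y, x1 = X[, 1], x2 = X[, 2], x3 = X[, 3])
fit <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)

# Side-by-side comparison: algebraic values vs. fitted values
print(round(cbind(implied = implied, glm = coef(fit)), 2))
```

The two columns should agree up to sampling error. Note that the fit here uses the simulated labels, which is exactly the information that is missing in Robin's setting.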
--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 10-Jan-11                                       Time: 23:52:18
------------------------------ XFMail ------------------------------
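Points [2] and [3] of Ted's reply can be illustrated as well. The sketch below is a hand-written EM for a two-group normal mixture with a common covariance matrix, using base R only. It is not a packaged routine and not necessarily what any R package does internally. The data are simulated with made-up parameters, the starting split on the first principal component is an arbitrary choice, and all names are introduced here for illustration. The labels y are generated but never shown to the algorithm.

```r
set.seed(42)

# Hypothetical two-group data; y is generated but NOT used by the EM below
n     <- 5000
mu0   <- c(4.0, 2.0, 1.0)
mu1   <- c(4.5, 2.6, 1.8)
Sigma <- diag(c(0.25, 0.16, 0.36))
y <- rbinom(n, 1, 0.3)
X <- matrix(rnorm(n * 3), n, 3) %*% chol(Sigma) +
  (1 - y) * matrix(mu0, n, 3, byrow = TRUE) +
  y * matrix(mu1, n, 3, byrow = TRUE)

# Crude start (arbitrary choice): hard split on the first principal component
pc <- prcomp(X)$x[, 1]
w  <- as.numeric(pc > median(pc))   # initial membership weights for component 1

for (iter in 1:500) {
  # M-step: weighted proportion, component means, pooled covariance
  pi1 <- mean(w)
  m1  <- colSums(w * X) / sum(w)
  m0  <- colSums((1 - w) * X) / sum(1 - w)
  R1  <- sweep(X, 2, m1)
  R0  <- sweep(X, 2, m0)
  S   <- (crossprod(sqrt(w) * R1) + crossprod(sqrt(1 - w) * R0)) / n

  # E-step: posterior probability of component 1. The normal normalising
  # constants cancel because both components share the covariance S.
  d1   <- mahalanobis(X, m1, S)
  d0   <- mahalanobis(X, m0, S)
  post <- plogis(log(pi1 / (1 - pi1)) - 0.5 * (d1 - d0))

  converged <- max(abs(post - w)) < 1e-8
  w <- post
  if (converged) break
}

# Logistic coefficients implied by the estimated mixture parameters
slopes    <- drop(solve(S) %*% (m1 - m0))
intercept <- log(pi1 / (1 - pi1)) - 0.5 * sum((m1 + m0) * slopes)
print(round(c(intercept = intercept, slopes), 2))

# The logistic curve built from these coefficients reproduces the posterior
print(max(abs(post - plogis(intercept + drop(X %*% slopes)))))
```

Nothing in the fit says which component is "Y=1": swapping the two components gives the same likelihood with the signs of all coefficients reversed, which is the caveat in [3]. The last line shows that the logistic curve equals the component posterior up to rounding, which is the remark in [2] that running a logistic regression on top of the mixture fit adds nothing new.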