4 binary DVs, subjects nested within schools
You've just described a classic market research problem and method: choice models. They used to be fitted as aggregate multinomial logit models, but these days they are more commonly fitted as Bayesian multinomial logit models. This gives individual-level parameters, which matters because a lot of the variance is at the individual level. Sawtooth Software are experts on this. You'll find all kinds of good reference material on their web site, and they also have Bayesian software for multinomial logit.

Chris Howden
Founding Partner
Tricky Solutions
chris at trickysolutions.com.au
On 23/11/2011, at 4:25, Paul Johnson <pauljohn32 at gmail.com> wrote:
Greetings

I'm trying to get my footing under a researcher's request for statistical support. I need your advice.

The gist of this is that there are 4 dichotomous outputs that can be modeled separately with logistic or probit models, and lme4 works fine treating each one separately. There is a random effect at the school level. However, a reviewer says a multivariate model is needed to fully model this problem.

The data is like selections from a menu, where all of the above is possible. This actual project is about student behaviors in the class room, but it seems more understandable to me to think of it as a person's taste for ice cream. Respondents are asked "do you like chocolate ice cream" or "do you like vanilla ice cream" or "strawberry ice cream". So the dependent variable is multivariate like this (yes, no, yes, no).

Where can I learn more about the multivariate approach to this?

And why are multivariate approaches not making the same mistake that is described in this literature on comparison of coefficients across logit models fitted for separate groups? I mean, if the variance parameter is not identified, how can I meaningfully put together 4 logit models?

Allison, Paul. 1999. "Comparing Logit and Probit Coefficients Across Groups." Sociological Methods and Research 28(2): 186-208.

Williams, Richard. 2008. "Using Heterogeneous Choice Models To Compare Logit and Probit Coefficients Across Groups." http://nd.edu/~rwilliam/oglm/RW_Hetero_Choice.pdf

Mood, C. 2010. "Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It." European Sociological Review 26(1): 67-82. doi:10.1093/esr/jcp006

Well, anyway, this looks like a project to me. I (probably) first need to understand how to fit this model without any distractions due to nested effects or sampling weights, and then I need to take into account the fact that students are nested in classrooms. I've been digging about for models of more-than-one dichotomy.
VGAM has bivariate logit and probit. The brand new package mvProbit has "experimental" support for several dichotomous DVs. But I don't think it is going to help with the classroom random effect.

I'm trying to find the simplest way to write all this down as a model so I can see where the correlations come in across questions and across units. For each outcome, yj, j=1,2,3,4, there is a coefficient vector Bj and an error term ej and the model states:

y1 = 1 if XB1 + e1 > 0; 0 otherwise
y2 = 1 if XB2 + e2 > 0; 0 otherwise
y3 = 1 if XB3 + e3 > 0; 0 otherwise
y4 = 1 if XB4 + e4 > 0; 0 otherwise

Suppose (e1,e2,e3,e4) is multivariate (normal or logistic?). Because of the "you can't compare logistic regressions across groups" problem, it appears problematic to assert that the variances of ej = 1.

pj

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
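To make the latent-variable formulation above concrete, here is a minimal simulation sketch in plain Python (standard library only). It is not code from this thread or from any of the packages mentioned. The coefficient values, the equicorrelated normal errors (correlation `rho`), the independent school-level intercepts with standard deviation `sigma_u`, and the `scale` argument are all illustrative assumptions. It draws the four outcomes for students nested in schools and then checks the identification point raised above: multiplying the whole latent scale by a positive constant leaves every observed 0/1 outcome unchanged, so the error variance cannot be recovered from the data and has to be fixed by convention.

```python
import math
import random


def simulate(n_schools=20, n_per_school=30, rho=0.4, sigma_u=0.5,
             scale=1.0, seed=1):
    """Simulate 4 correlated binary outcomes for students nested in schools.

    Latent model for outcome j (all numeric values are assumptions):
        ystar_j = scale * (b0_j + b1_j * x + u_j[school] + e_j)
        y_j     = 1 if ystar_j > 0 else 0
    """
    rng = random.Random(seed)
    # Hypothetical (intercept, slope) pairs, one per outcome.
    B = [(-0.5, 0.8), (0.0, 0.3), (0.4, -0.6), (-0.2, 0.5)]
    rows = []
    for school in range(n_schools):
        # School-level random intercept for each outcome; this is what
        # correlates students within the same school.
        u = [rng.gauss(0.0, sigma_u) for _ in B]
        for _ in range(n_per_school):
            x = rng.gauss(0.0, 1.0)
            # A shared student-level factor z0 gives var(e_j) = 1 and
            # corr(e_j, e_k) = rho; this is what correlates the four
            # answers of one student.
            z0 = rng.gauss(0.0, 1.0)
            e = [math.sqrt(rho) * z0
                 + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                 for _ in B]
            latent = [scale * (b0 + b1 * x + u_j + e_j)
                      for (b0, b1), u_j, e_j in zip(B, u, e)]
            y = tuple(1 if v > 0 else 0 for v in latent)
            rows.append((school, x, y))
    return rows


# Same seed, different latent scale: coefficients, random effects and
# errors are all multiplied by 3, yet the observed outcomes are the same.
base = simulate(scale=1.0)
rescaled = simulate(scale=3.0)
same = all(a[2] == b[2] for a, b in zip(base, rescaled))
print("identical outcomes after rescaling:", same)
```

In this sketch the cross-question correlation enters only through `rho` and the cross-student correlation only through the school intercepts `u`; a fitted multivariate model would estimate both, with each error variance fixed for identification.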
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models