4 binary DVs, subjects nested within schools

7 messages · Chris Howden, John Maindonald, Paul Johnson +1 more

#
Greetings

I'm trying to get my footing under a researcher's request for
statistical support. I need your advice.

The gist of this is that there are 4 dichotomous outcomes that can be
modeled separately with logistic or probit models, and lme4 works fine
treating each one separately. There is a random effect at the school
level.

However, a reviewer says a multivariate model is needed to fully model
this problem.

The data are like selections from a menu where "all of the above" is
possible. The actual project is about student behaviors in the
classroom, but it seems more understandable to me to think of it as a
person's taste in ice cream. Respondents are asked "Do you like
chocolate ice cream?", "Do you like vanilla ice cream?", "Do you like
strawberry ice cream?", and so on. So the dependent variable is
multivariate, like this: (yes, no, yes, no).

Where can I learn more about the multivariate approach to this?

And why don't multivariate approaches make the same mistake that is
described in the literature on comparing coefficients across logit
models fitted to separate groups? I mean, if the variance parameter is
not identified, how can I meaningfully put together 4 logit models?

Allison, Paul (1999). "Comparing Logit and Probit Coefficients Across
Groups." Sociological Methods and Research, 28(2), 186-208.

Williams, Richard (2008). "Using Heterogeneous Choice Models To Compare
Logit and Probit Coefficients Across Groups."
http://nd.edu/~rwilliam/oglm/RW_Hetero_Choice.pdf

Mood, C. (2010). "Logistic Regression: Why We Cannot Do What We Think
We Can Do, and What We Can Do About It." European Sociological Review,
26(1), 67-82. doi:10.1093/esr/jcp006

Well, anyway, this looks like a project to me.  I (probably) first
need to understand how to fit this model without any distractions due
to nested effects or sampling weights, and then I need to take into
account the fact that students are nested in classrooms.

I've been digging around for models with more than one dichotomous
outcome. VGAM has bivariate logit and probit. The brand new package
mvProbit has "experimental" support for several dichotomous DVs. But I
don't think it is going to help with the classroom random effect.

I'm trying to find the simplest way to write all this down as a model
so I can see where the correlations come in, across questions and
across units. For each outcome yj, j = 1, 2, 3, 4, there is a
coefficient vector Bj and an error term ej, and the model states:

y1 = 1 if XB1 + e1 > 0; 0 otherwise
y2 = 1 if XB2 + e2 > 0; 0 otherwise
y3 = 1 if XB3 + e3 > 0; 0 otherwise
y4 = 1 if XB4 + e4 > 0; 0 otherwise
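As a concrete sketch of the four latent-variable equations above (a Python illustration added for readers, not code from the thread): the hypothetical helper simulate_mv_probit draws (e1, e2, e3, e4) from a multivariate normal with unit variances and an assumed common correlation, then sets yj = 1 when XBj + ej > 0. The linear predictors and the 0.4 correlation are made-up values.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_mv_probit(eta, corr, n, seed=1):
    """Draw n response vectors with y_j = 1 if eta_j + e_j > 0, e ~ N(0, corr).

    eta:  linear predictors X*B_j for one covariate pattern (hypothetical values)
    corr: error correlation matrix; its unit diagonal is the usual probit normalisation
    """
    rng = random.Random(seed)
    low = cholesky(corr)
    m = len(eta)
    draws = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # Correlated errors: e = L z, so Cov(e) = L L^T = corr.
        e = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
        draws.append(tuple(int(eta[j] + e[j] > 0) for j in range(m)))
    return draws


# Hypothetical example: four outcomes, errors equicorrelated at 0.4.
eta = [0.5, 0.0, -0.5, 1.0]
corr = [[1.0 if i == j else 0.4 for j in range(4)] for i in range(4)]
ys = simulate_mv_probit(eta, corr, n=20000)
p1 = sum(y[0] for y in ys) / len(ys)          # close to Phi(0.5), about 0.69
p2 = sum(y[1] for y in ys) / len(ys)          # close to Phi(0.0) = 0.5
p12 = sum(y[0] * y[1] for y in ys) / len(ys)  # exceeds p1 * p2: the answers are associated
```

Each margin on its own follows a univariate probit, P(yj = 1) = Phi(XBj); all of the dependence across the four questions sits in the error correlation matrix.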

Suppose (e1, e2, e3, e4) is multivariate (normal or logistic?). Because
of the "you can't compare logistic regressions across groups" problem,
it seems problematic to assert that Var(ej) = 1 for every j.
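A small sketch of why the variance has to be fixed in the first place: the observed yj only record the sign of XBj + ej, so multiplying both the coefficients and the error standard deviation by any c > 0 leaves every outcome unchanged. Only Bj divided by the error SD is identified, and Var(ej) = 1 is a normalisation of each equation's scale. The function name and values are hypothetical; c = 2 is used so the floating-point scaling is exact.

```python
import random


def binary_outcomes(eta, sigma, n, seed=7):
    """Observed y = 1 if eta + sigma * z > 0, with z ~ N(0, 1); only the sign is recorded."""
    rng = random.Random(seed)
    return [int(eta + sigma * rng.gauss(0.0, 1.0) > 0) for _ in range(n)]


# Same draws of z (same seed); coefficient and error SD both doubled: identical data.
y_unit = binary_outcomes(eta=0.5, sigma=1.0, n=1000)
y_doubled = binary_outcomes(eta=1.0, sigma=2.0, n=1000)
print(y_unit == y_doubled)  # True
```

In the cross-group comparisons discussed by Allison, Williams and Mood, the trouble is that this normalisation is imposed separately in each group, so a difference in residual variance can look like a difference in coefficients.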

Pj
#
You've just described a classic market research problem and method.
It's called choice modes.

They used to be modelled using aggregate multinomial logit models.

But these days they are more commonly modelled using Bayesian
multinomial logit. This lets us get individual-level parameters, and
since a lot of the variance is at the individual level, we model it
that way.

Sawtooth Software are experts on this. You'll find all sorts of good
reference material on their website. Plus they have Bayesian software
for multinomial logit.

Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP Commercialisation and
Innovation, Data Analysis, Modelling and Training

(mobile) 0410 689 945
(fax / office)
chris at trickysolutions.com.au

Disclaimer: The information in this email and any attachments to it are
confidential and may contain legally privileged information. If you are not
the named or intended recipient, please delete this communication and
contact us immediately. Please note you are not authorised to copy,
use or disclose this communication or any attachments without our
consent. Although this email has been checked by anti-virus software,
there is a risk that email messages may be corrupted or infected by
viruses or other
interferences. No responsibility is accepted for such interference. Unless
expressly stated, the views of the writer are not those of the
company. Tricky Solutions always does our best to provide accurate
forecasts and analyses based on the data supplied, however it is
possible that some important predictors were not included in the data
sent to us. Information provided by us should not be solely relied
upon when making decisions and clients should use their own judgement.
On 23/11/2011, at 4:25, Paul Johnson <pauljohn32 at gmail.com> wrote:

            
#
NB also R's mlogit package, which has an accompanying vignette that includes
a number of worked examples, with R code.
Cheers
John Maindonald.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 23/11/2011, at 11:21 AM, Chris Howden wrote:

            
#
On Tue, Nov 22, 2011 at 7:57 PM, John Maindonald
<john.maindonald at anu.edu.au> wrote:
Dear John and Chris:

I need to make sure I understand your suggestion here. With 4 Yes or
No options, the subject is free to pick any combination. That leads to
a multinomial model with 16 possible 4-tuples as outcomes:

(N,N,N,N)
(N,N,N,Y)
(N,N,Y,N)
(N,N,Y,Y)
(N,Y,N,N)
...and so forth
(Y,Y,Y,Y)

I've never tried fitting a multinomial with more than a few different
outcomes.  But I'm up for the challenge if that's what you are
actually suggesting.
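The recoding described here can be sketched as follows: each respondent's four answers become one of 2**4 = 16 categories, which could then serve as the single outcome of a multinomial logit. The names PATTERNS and to_category are hypothetical. In a baseline-category multinomial logit, each covariate would then carry 15 coefficients, one per non-reference pattern.

```python
from itertools import product

# All 2**4 = 16 answer patterns, in the same order as the list above.
PATTERNS = list(product("NY", repeat=4))
PATTERN_INDEX = {pattern: i for i, pattern in enumerate(PATTERNS)}


def to_category(answers):
    """Map one respondent's four 'Y'/'N' answers to a category code 0..15."""
    return PATTERN_INDEX[tuple(answers)]


print(len(PATTERNS))                      # 16
print(to_category(("N", "N", "N", "Y")))  # 1
print(to_category(("Y", "N", "Y", "N")))  # 10
```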
#
Looking more carefully at your message and at what Chris has said, I doubt that 
mlogit's methods are appropriate.  I'd read "choice mode" as "choice model", 
maybe wrongly. 

On 23/11/2011, at 5:21 PM, Paul Johnson wrote:

            
#
I believe that's the idea, yes.

I'm not sure exactly how you would set it up in R, though. There are
ways to do so, but I haven't used them (although I keep meaning to). I
believe clogit (in the survival package) can fit a conditional logit
model, which can be made to act like a multinomial logit (mlogit) model?
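One way to see what the 16-category multinomial adds over four separate logits is a log-linear (multivariate logistic) parameterisation of the pattern probabilities. This is an illustrative choice, not what clogit or mlogit fit by default. If a pattern's score is just a sum of one term per outcome, the joint distribution factorises and each margin is an ordinary logit; the pairwise terms gamma[(j, k)] are what introduce association between answers. The helper names and values are hypothetical.

```python
import math
from itertools import combinations, product


def pattern_probs(eta, gamma):
    """Probabilities of all 2**J binary patterns under a log-linear model.

    log P(y) = sum_j y_j * eta[j] + sum_{j<k} y_j * y_k * gamma[(j, k)] - log(normaliser)
    eta:   main-effect linear predictors, one per outcome
    gamma: dict of pairwise association parameters keyed by (j, k) with j < k
    """
    J = len(eta)
    scores = {}
    for y in product((0, 1), repeat=J):
        s = sum(y[j] * eta[j] for j in range(J))
        s += sum(y[j] * y[k] * gamma.get((j, k), 0.0) for j, k in combinations(range(J), 2))
        scores[y] = s
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}


def marginal(probs, j):
    """P(y_j = 1) implied by the joint pattern probabilities."""
    return sum(p for y, p in probs.items() if y[j] == 1)


# Hypothetical predictors for one respondent.
eta = [0.5, -1.0, 0.0, 2.0]
independent = pattern_probs(eta, {})            # margins are exactly 1 / (1 + exp(-eta[j]))
associated = pattern_probs(eta, {(0, 1): 1.5})  # answers 1 and 2 now positively associated
```

Once gamma is non-zero, eta[j] is the log-odds of yj = 1 with the other answers held at 0, so it is no longer the marginal logit of that question.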

These references may help

Aizaki, H. (2009). "Development of an Application Program for the Design
and Analysis of Choice Experiments with R" (in Japanese with English
summary). Kodo Keiryogaku, 36(1), 35-46.
URL http://www.jstage.jst.go.jp/article/jbhmk/36/1/36_35/_article

Aizaki, H. (2011). support.CEs: Basic functions for supporting an
implementation of choice experiments. R package version 0.2-0.
URL http://cran.r-project.org/package=support.CEs

Aizaki, H., and Nishimura, K. (2008). "Design and Analysis of Choice
Experiments Using R: A Brief Introduction." Agricultural Information
Research, 17(2), 86-94.
URL http://www.jstage.jst.go.jp/article/air/17/2/17_86/_article



Chris Howden B.Sc. (Hons) GStat.
Founding Partner
Evidence Based Strategic Development, IP Commercialisation and Innovation,
Data Analysis, Modelling and Training
(mobile) 0410 689 945
(fax) +612 4782 9023
chris at trickysolutions.com.au

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org
[mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Paul
Johnson
Sent: Wednesday, 23 November 2011 3:21 PM
To: R-SIG-Mixed-Models at r-project.org
Subject: Re: [R-sig-ME] 4 binary DVs, subjects nested within schools

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
#
On 23/11/2011, at 4:25, Paul Johnson <pauljohn32 at gmail.com> wrote:

            
You _could_ look at OpenMx, which runs under R but is not yet on CRAN
because of negotiations about the licensing for NPSOL,
if I understand correctly.

http://openmx.psyc.virginia.edu/
Mx would fit a multivariate probit mixed model.
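In a random-intercept probit or logit model, the school-level variance is measured against a student-level latent variance that is fixed rather than estimated: 1 on the probit scale and pi**2/3 on the logit scale. The hypothetical helper below turns a school variance into the implied intra-school correlation of the latent responses. It is one way to see that the fixed residual variance sets the scale for everything else in the model.

```python
import math


def latent_icc(school_var, link="probit"):
    """Intra-school correlation of the latent responses in a random-intercept binary model.

    The student-level latent variance is not estimated; it is fixed by the link:
    1 for probit, pi**2 / 3 (the variance of the standard logistic) for logit.
    """
    residual = {"probit": 1.0, "logit": math.pi ** 2 / 3.0}[link]
    return school_var / (school_var + residual)


print(latent_icc(1.0))                 # 0.5
print(latent_icc(0.25, link="logit"))  # about 0.07
```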

If the schools are big enough, you might take the other, older approach
of calculating tetrachoric correlation matrices and using an SEM
package such as "sem".
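For readers who want to see what a tetrachoric correlation is, here is a minimal pure-Python sketch with hypothetical helper names; in practice one would use a tested implementation such as those in the R packages polycor or psych. The two thresholds come from the margins of the 2x2 table, and the correlation is the rho at which a standard bivariate normal, cut at those thresholds, reproduces the observed proportion of "yes, yes" answers. Tables with an empty margin are not handled.

```python
from statistics import NormalDist

STD = NormalDist()


def upper_orthant(a, b, rho, steps=2000):
    """P(Z1 > a, Z2 > b) for a standard bivariate normal with correlation rho.

    Uses P = integral over z > a of pdf(z) * cdf((rho*z - b) / sqrt(1 - rho**2)),
    evaluated with Simpson's rule (steps must be even).
    """
    s = (1.0 - rho * rho) ** 0.5
    hi = max(a, 0.0) + 10.0  # the normal tail beyond this is negligible
    h = (hi - a) / steps
    total = 0.0
    for i in range(steps + 1):
        z = a + i * h
        f = STD.pdf(z) * STD.cdf((rho * z - b) / s)
        w = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += w * f
    return total * h / 3.0


def tetrachoric(n11, n10, n01, n00, tol=1e-8):
    """Tetrachoric correlation of a 2x2 table (n11 = 'yes' on both items), by bisection."""
    n = n11 + n10 + n01 + n00
    a = STD.inv_cdf(1.0 - (n11 + n10) / n)  # latent threshold for item 1
    b = STD.inv_cdf(1.0 - (n11 + n01) / n)  # latent threshold for item 2
    target = n11 / n
    lo, hi = -0.999, 0.999
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # P(both 'yes') increases with rho, so bisection is valid.
        if upper_orthant(a, b, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(tetrachoric(200, 100, 100, 200), 3))  # 0.5
```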

The multivariate probit doesn't have that problem, because it has to
ignore it ;) If you think of the threshold formulation (as you usually
do once you have more than two ordinal categories), you get the
tetrachoric correlation as the measure of association between your
variables. You can easily get models where the correlation matrix is
the same for the different groups, even though the item endorsement
rates (or prevalences) for the items differ. With only one threshold,
we handwave and say that the underlying latent variables are the same
but the thresholds have moved. With two or more thresholds, changes in
variance can appear as the thresholds moving closer together or
further apart.
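The threshold argument can be checked numerically with a short sketch (hypothetical helper, illustrative values). Cutting a latent N(0, 2**2) at -1 and 1 gives exactly the same category probabilities as cutting N(0, 1) at -0.5 and 0.5. Once the latent variance is fixed at 1, doubling the SD shows up only as the thresholds moving closer together; with a single threshold, the same change would be indistinguishable from the threshold simply moving.

```python
from statistics import NormalDist


def category_probs(thresholds, sigma, mean=0.0):
    """P(Y = k) for an ordinal Y obtained by cutting a latent N(mean, sigma**2) at the thresholds."""
    latent = NormalDist(mean, sigma)
    edges = [0.0] + [latent.cdf(t) for t in thresholds] + [1.0]
    return [edges[k + 1] - edges[k] for k in range(len(edges) - 1)]


wide = category_probs([-1.0, 1.0], sigma=2.0)      # larger latent variance
standard = category_probs([-0.5, 0.5], sigma=1.0)  # unit variance, thresholds pulled in
print(all(abs(p - q) < 1e-12 for p, q in zip(wide, standard)))  # True
```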

One test of the appropriateness of the model: if you have three or
more DVs, then for any three of them you can fit a one-factor
factor-analytic model to the tetrachoric correlation matrix, which
should give a perfect fit to the observed 2x2x2 contingency table
(there is a paper by Muthén on this). This test has low power.
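A sketch of why that three-variable check has so little power (hypothetical helper, illustrative correlations): a one-factor model for three variables has three loadings for three correlations, so it reproduces the tetrachoric matrix exactly through rjk = lj * lk, as long as the implied loadings stay below 1. The only information left for a test is the 2x2x2 table itself: 7 free cell proportions against 3 thresholds plus 3 correlations, which leaves a single degree of freedom.

```python
import math


def one_factor_loadings(r12, r13, r23):
    """Loadings of the just-identified one-factor model for three variables.

    The model implies r_jk = l_j * l_k, so l_1 = sqrt(r12 * r13 / r23), and so on.
    Assumes all three correlations are positive.
    """
    l1 = math.sqrt(r12 * r13 / r23)
    l2 = math.sqrt(r12 * r23 / r13)
    l3 = math.sqrt(r13 * r23 / r12)
    return l1, l2, l3


l1, l2, l3 = one_factor_loadings(0.42, 0.35, 0.30)
print(round(l1, 6), round(l2, 6), round(l3, 6))  # 0.7 0.6 0.5
```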

But if you think about it, it is not that different to the question of
why we usually assume that random effects are normally distributed.

Anyway, I have rambled long enough.