lmer: Model with crossed and nested factors, unbalanced data - R-SIG-mixed-models

Wed, Jul 3, 2013 11:59 AM
Dear all,

For a research project on climate legislation in the U.S., I am analyzing
data on the votes that Senators cast on several cap-and-trade bills in the
period 2003-2008. For each Senator, we know how he or she voted on each
bill ('yea' or 'nay'), provided that the Senator held a seat in the year
the bill was voted on. We want to explain the voting behavior of these Senators given
characteristics of the Senators and of their constituencies, that is, the
states they represent, but at the same time take into account the nested
structure of the data.

Thus, the data looks as follows:
state  Senator         bill       vote
FL     'Bill Nelson'   'CSA2003'  'yea'
FL     'Bob Graham'    'CSA2003'  'yea'
FL     'Bill Nelson'   'CSA2005'  'yea'
FL     'Mel Martinez'  'CSA2005'  'nay'

(See attachment for a sample of the data.)

One way to analyze such data seems to be a mixed model with both crossed
and nested random factors. First, Senators are expected to behave
consistently over time: their votes on different bills should be similar.
Second, pairs of Senators represent the same state: for example, in 2003,
Bill Nelson and Bob Graham both represented Florida. So, there seems to be
a random effect of Senators, which are nested in states. Third, there would
be a random effect of bill, which is crossed with states and Senators.
Finally, the model should be logistic, as votes can be either 'yea' or
'nay'.
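
Written out, the intercept-only version of this structure would be roughly
the following. The notation (the indices i, j, k and the variance symbols)
is introduced here only for illustration and is not taken from any of the
linked posts; the independent normal random effects are an assumption.

```latex
\operatorname{logit} \Pr(y_{ijk} = 1) = \beta_0 + s_k + u_{j(k)} + b_i,
\qquad
s_k \sim N(0, \sigma_s^2), \quad
u_{j(k)} \sim N(0, \sigma_u^2), \quad
b_i \sim N(0, \sigma_b^2)
```

Here y_{ijk} is the vote (1 = 'yea') of Senator j from state k on bill i,
s_k is the state effect, u_{j(k)} the effect of Senator j nested in state k,
and b_i the bill effect, which is crossed with both.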

1. How should I specify such a model? Is it sufficient just to specify both
the nested random effects of Senator and state, as well as the random
effect of bill (in analogy to this post:
http://r.789695.n4.nabble.com/lmer-crossed-random-effects-specification-td831762.html)?
For example, in the case of a model with only random intercepts for Senator,
state and bill:

library(lme4)

# Read the tab-separated sample; entries of -1 are read as NA
dataSenate <- read.table("sampledata.txt", header = TRUE, sep = "\t",
                         na.strings = c("-1"))

# Make sure the grouping variables are factors
dataSenate$state   <- as.factor(dataSenate$state)
dataSenate$Senator <- as.factor(dataSenate$Senator)
dataSenate$bill    <- as.factor(dataSenate$bill)

# Random intercepts for state, Senator within state, and bill
interceptonly <- glmer(vote ~ 1 + (1 | state/Senator) + (1 | bill),
                       data = dataSenate, family = binomial(link = "logit"))
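
To make the nesting in the call above explicit, here is a minimal sketch in
base R. The data frame d holds only the Florida and West Virginia rows of
the attached sample (with -1 entered as NA), and the names d,
states_per_senator, f_short and f_long are made up for illustration. In
lme4, (1 | state/Senator) is shorthand for (1 | state) + (1 | state:Senator),
so the two formulas below should describe the same model.

```r
# Florida and West Virginia rows copied from the attached sample;
# the -1 entry is recoded as NA, mirroring na.strings = c("-1")
d <- data.frame(
  state   = rep(c("FL", "WV"), each = 6),
  bill    = rep(rep(c("CSA2003", "ACSA2008", "CSA2005"), each = 2), times = 2),
  Senator = c("Bill Nelson", "Bob Graham", "Bill Nelson", "Mel Martinez",
              "Bill Nelson", "Mel Martinez",
              rep(c("John Jay Rockefeller", "Robert Byrd"), times = 3)),
  vote    = c(1, 1, 1, 1, 1, 0, 1, 0, 1, NA, 1, 0),
  stringsAsFactors = TRUE
)

# Senator is nested in state if every Senator appears under exactly one state
states_per_senator <- tapply(d$state, d$Senator,
                             function(s) length(unique(s)))
nested <- all(states_per_senator == 1)

# Explicit grouping factor corresponding to the state:Senator term
d$stateSenator <- interaction(d$state, d$Senator, drop = TRUE)

# Shorthand and expanded random-effects specifications (formula objects only;
# nothing is fitted here)
f_short <- vote ~ 1 + (1 | state/Senator) + (1 | bill)
f_long  <- vote ~ 1 + (1 | state) + (1 | state:Senator) + (1 | bill)

print(nested)
print(nlevels(d$stateSenator))
```

With lme4 loaded, either formula could be passed to glmer() together with
family = binomial, as in the call above. Twelve rows are of course far too
few for a meaningful fit, so the sketch stops at the formulas.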

Or should I use the pdBlocked and pdIdent formulation (from the nlme
package) that is suggested here:
http://tolstoy.newcastle.edu.au/R/help/02b/2068.html?

2. This does not seem to be a balanced design: some Senators lost their
seats in the period 2003-2008, so many Senators did not vote on all
three bills. In other words, for many Senator-bill combinations,
there are no data. Should this affect my interpretation of the results?
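
The extent of the imbalance can be checked by cross-tabulating Senators
against bills. The following is only a sketch in base R, again using just
the Florida and West Virginia rows of the attached sample (with -1 entered
as NA); the names d, counts, votes_per_senator and balanced are
hypothetical.

```r
# Florida and West Virginia rows copied from the attached sample;
# the -1 entry is recoded as NA
d <- data.frame(
  state   = rep(c("FL", "WV"), each = 6),
  bill    = rep(rep(c("CSA2003", "ACSA2008", "CSA2005"), each = 2), times = 2),
  Senator = c("Bill Nelson", "Bob Graham", "Bill Nelson", "Mel Martinez",
              "Bill Nelson", "Mel Martinez",
              rep(c("John Jay Rockefeller", "Robert Byrd"), times = 3)),
  vote    = c(1, 1, 1, 1, 1, 0, 1, 0, 1, NA, 1, 0),
  stringsAsFactors = TRUE
)

# Keep only rows with a recorded vote
observed <- d[!is.na(d$vote), ]

# Senator-by-bill table of observation counts; a zero marks an empty cell
counts <- xtabs(~ Senator + bill, data = observed)

# Number of recorded votes per Senator, and whether every cell is filled once
votes_per_senator <- rowSums(counts)
balanced <- all(counts == 1)

print(counts)
print(votes_per_senator)
print(balanced)
```

In these rows, for instance, Bob Graham has one recorded vote and Robert
Byrd two, so the Senator-by-bill crossing is only partial.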

Best regards,

Clara Vandeweerdt
Master in Comparative and International Politics, 2013
Faculty of Social Sciences
KU Leuven
Belgium
-------------- next part --------------
state	bill	Senator	vote
WA	CSA2003	Patty Murray	1
WA	CSA2003	Maria Cantwell	1
WA	ACSA2008	Patty Murray	1
WA	ACSA2008	Maria Cantwell	1
WA	CSA2005	Patty Murray	1
WA	CSA2005	Maria Cantwell	1
DE	CSA2003	Joseph Biden	1
DE	CSA2003	Thomas Carper	1
DE	ACSA2008	Joseph Biden	-1
DE	ACSA2008	Thomas Carper	1
DE	CSA2005	Joseph Biden	1
DE	CSA2005	Thomas Carper	1
WI	CSA2003	Herbert Herb Kohl	1
WI	CSA2003	Russell Feingold	1
WI	ACSA2008	Herbert Herb Kohl	1
WI	ACSA2008	Russell Feingold	1
WI	CSA2005	Herbert Herb Kohl	1
WI	CSA2005	Russell Feingold	0
WV	CSA2003	John Jay Rockefeller	1
WV	CSA2003	Robert Byrd	0
WV	ACSA2008	John Jay Rockefeller	1
WV	ACSA2008	Robert Byrd	-1
WV	CSA2005	John Jay Rockefeller	1
WV	CSA2005	Robert Byrd	0
HI	CSA2003	Daniel Akaka	1
HI	CSA2003	Daniel Inouye	1
HI	ACSA2008	Daniel Akaka	1
HI	ACSA2008	Daniel Inouye	1
HI	CSA2005	Daniel Akaka	1
HI	CSA2005	Daniel Inouye	1
FL	CSA2003	Bill Nelson	1
FL	CSA2003	Bob Graham	1
FL	ACSA2008	Bill Nelson	1
FL	ACSA2008	Mel Martinez	1
FL	CSA2005	Bill Nelson	1
FL	CSA2005	Mel Martinez	0