How to deal with outcomes assessed by raters?
6 messages · Chris Howden, David Duffy, Joseph Bulbulia +1 more
I'm no expert, but I believe that with only 4 judges you don't have enough
levels to get an accurate estimate of the variance associated with judges,
so you may be better off including them as fixed effects with 4 levels.
That said, if you just want a random intercept for each judge and don't
need an accurate measure of their variance, it may still be OK to include
them as a random effect. But I'm a little unclear on this point myself.
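To make the fixed-vs-random contrast concrete, here is a minimal sketch on simulated data (not the firewalk data; the data frame, the effect sizes, and the choice of nlme::lme are all my assumptions):

```r
## Simulated ratings: 42 participants each scored by 4 judges.
## All names and numbers here are invented for illustration.
set.seed(1)
d <- data.frame(
  id     = factor(rep(1:42, each = 4)),
  rater  = factor(rep(paste0("rat", 1:4), times = 42)),
  rating = rnorm(42 * 4, mean = 4, sd = 1)
)

## Rater as a fixed effect: one parameter per judge (intercept + 3 contrasts).
m.fixed <- lm(rating ~ rater, data = d)

## Rater as a random intercept (nlme ships with R): a single variance
## component, but with only 4 levels it is estimated very imprecisely.
library(nlme)
m.random <- lme(rating ~ 1, random = ~ 1 | rater, data = d)

length(coef(m.fixed))   # 4 fixed-effect parameters
VarCorr(m.random)       # the (imprecise) rater variance component
```

Either model runs; the question is only whether the 4-level variance estimate is worth anything.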
Chris Howden B.Sc. (Hons) GStat.
Founding Partner
Evidence Based Strategic Development, IP Commercialisation and Innovation,
Data Analysis, Modelling and Training
(mobile) 0410 689 945
(fax) +612 4782 9023
chris at trickysolutions.com.au
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org
[mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Joseph
Bulbulia
Sent: Tuesday, 16 April 2013 3:07 PM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] How to deal with outcomes assessed by raters?
Hi all,
I'd like to model emotional dynamics in a highly arousing firewalk ritual.
Four judges rated images from 42 participants for arousal and valence. The
predictor variables are ritual phase and role.
Question 1
Any thoughts about how best to handle the rater assessments?
Specifically, is it nuts to explicitly include a component for raters in
the random component of the model?
E.g.
library(MCMCglmm)

prior.fw.0 <- list(
  B = list(mu = rep(0, 4), V = diag(4) * 1e+10),
  R = list(V = diag(2), fix = 1),
  G = list(
    G1 = list(V = diag(2), n = 2, alpha.mu = c(0, 0), alpha.V = diag(2) * 1000),
    G2 = list(V = diag(2), n = 2, alpha.mu = c(0, 0), alpha.V = diag(2) * 1000),
    G3 = list(V = diag(2), fix = 1)
  )
)

firemodel.test <- MCMCglmm(
  cbind(arousal, valence) ~ trait:role:phase - 1,
  random = ~ us(trait):phase:id
           + idh(trait):event:id
           + idh(trait):rater,
  rcov   = ~ idh(trait):units,
  family = rep("ordinal", 2),
  data   = Firewalkdata,
  burnin = 5000, thin = 10, nitt = 20000,
  prior  = prior.fw.0
)
Thanks everyone. Very grateful for any advice.
Joseph
Disclaimer
I'm new to GLMMs, so apologies if this doesn't make sense.
Data sample below
(Only 20 data points, just to get a sense of the structure)
Firewalkdata <- structure(list(obs = c("1", "2", "3", "4", "5", "6", "7",
"8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
"20"), id = structure(c(24L, 4L, 26L, 37L, 32L, 3L, 20L, 9L, 3L, 2L, 5L,
19L, 23L, 28L, 29L, 8L, 3L, 18L, 40L, 26L), .Label = c("a", "b", "c", "d",
"e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "pa", "pb",
"pc", "pd", "pe", "pf", "pg", "ph", "pi", "pj", "pk", "pl", "pm", "pn",
"po", "pp", "q", "r", "s", "t", "u", "v", "w", "x", "y"), class =
"factor"), phase = c(1, 2, 5, 5, 2, 3, 1, 4, 5, 2, 5, 1, 5, 4, 3, 4, 5, 3,
2, 4), event = structure(c(11L, 4L, 13L, 21L, 22L, 3L, 4L, 9L, 3L, 2L, 5L,
3L, 9L, 16L, 17L, 8L, 3L, 2L, 24L, 13L), .Label = c("a", "b", "c", "d",
"e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s",
"t", "u", "v", "w", "x", "y"), class = "factor"), role = structure(c(2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L,
2L), .Label = c("FW", "PS"), class = "factor"), dyad = structure(c(2L, 2L,
2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L),
.Label = c("n", "y"), class = "factor"), gender = structure(c(1L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L),
.Label = c("f", "m"), class = "factor"), rater = structure(c(4L, 1L, 3L,
1L, 3L, 4L, 1L, 2L, 4L, 3L, 4L, 3L, 4L, 1L, 1L, 4L, 4L, 1L, 4L, 4L),
.Label = c("rat1", "rat2", "rat3", "rat4"), class = "factor"),
arousal = c(4, 5, NA, 6, 6, 6, 4, 4, 7, 4, 3, 7, 5, 5, 5,
6, 5, NA, 4, 6), valence = c(4, 3, NA, 2, 6, 1, 2, 6, 3,
5, 5, 7, 5, 2, 2, 1, 4, NA, 4, 6)), .Names = c("obs", "id", "phase",
"event", "role", "dyad", "gender", "rater", "arousal", "valence"),
row.names = c(3977L, 83L, 2996L, 525L, 3134L, 3213L, 726L, 1267L, 3221L,
2134L, 3273L, 2801L, 3975L, 944L, 964L, 3344L, 3223L, 688L, 3758L, 4041L),
class = "data.frame")
IMAGE SAMPLE (for the curious)
https://www.dropbox.com/s/xmbci5814h73i0l/4_MF_e5.png
GRAPH bootstrapped means
https://www.dropbox.com/s/509sgh5zqxq18tn/Figure_FireWalk.pdf
Crude overview of the design
https://www.dropbox.com/s/0o0a9kkrh5ttsd2/plot.plan.emotions_firewalk.pdf
Joseph Bulbulia
Senior Lecturer, Religious Studies
Faculty of Humanities and Social Sciences Victoria University, New Zealand
+64 21 95 94 23
http://www.metaphysicalclub.com
On Tue, 16 Apr 2013, Joseph Bulbulia wrote:
Hi all, I'd like to model emotional dynamics in a highly arousing firewalk ritual. Four judges rated images from 42 participants for arousal and valence. The predictor variables are ritual "phase" and "role." Question 1: Any thoughts about how best to handle the rater assessments? Specifically, is it nuts to explicitly include a component for raters in the random component of the model?
So what do the inter-rater agreements look like? I presume rater is actually a nuisance variable? The path model I would usually use would have phase and role acting on the averaged-over-raters a and v scores (measurement model bit). Just 2c, David Duffy.
2 days later
Hi Joseph, thanks for this detailed summary. Based on my understanding, I think it is defensible to include the raters as random effects in the model, and I think that doing so provides a more faithful representation of your experimental design than excluding them would. Definitely not nuts.

On the niggles: I'm not sure what exactly you mean by "averaging over the ratings". It sounds risky to me.

Cheers
Andrew

On Tue, Apr 16, 2013 at 7:50 PM, Joseph Bulbulia
<joseph.bulbulia at icloud.com> wrote:
Hi all, Two of you asked for more information. Sorry for the long-winded account, written in haste.

THE QUASI-EXPERIMENT
* The fire-walking ritual consisted of a series of 26 ordeals by fire.
* Each fire-walker traversed a burning bed of coals (677 Celsius; I actually measured it with a pyrometer. Such instruments exist!).
* In sixteen of these events, fire-walkers were carrying a passenger.
* Total duration of each fire-walk was under 5 seconds, which we carved up into five phases.

I constructed a makeshift plot plan of the ritual here (following ideas in Walter Stroup's recent book on GLMMs). Not sure I'm happy with it, but it will give you the gist. https://www.dropbox.com/s/0o0a9kkrh5ttsd2/plot.plan.emotions_firewalk.pdf

Hypothesis 1: Anthropologists have long maintained that rituals cause a melding of emotions, what they call "collective effervescence". You've felt this surely, being connected with others at a big event. This "merge" model predicts that arousal and valence will tend to be coupled among participants, irrespective of ritual roles. (In another paper, we demonstrated heart-rhythm coupling among fire-walkers and observers; the study was published in 2011.)

Hypothesis 2: Another anthropological tradition predicts differentiation in emotions depending on ritual role. This "verge" model predicts that ritual participants who undertake a rite of passage will express different emotions. Think of a PhD thesis defence: the candidate's ordeal is the inquisitor's delight! (In our heart-rhythm paper we also found differences in synchrony that were predicted by ritual role and social distance. This study is just a follow-up using another biomarker.)

To assess whether emotional dynamics merge or verge, we sampled images for each ritual participant (n=42: 26 fire-walkers and 16 passengers) at five different phases of the fire-walk.
There's evidence of cultural variability in emotions, so the images were independently rated by four judges from the part of Spain where the ritual happened. (Note: if I could do this over again, I'd get more raters, but this sort of number is typical in psychological research; it is probably OK for the task at hand, which does not require exact estimates, only rough assessments of trends within each ritual group.)

See how you do here. Merge or verge? https://www.dropbox.com/s/xmbci5814h73i0l/4_MF_e5.png

Images were rated for "valence" and "arousal" on Likert scales from 1-7. I didn't run an ICC because I wasn't sure whether it is appropriate for ordinal data (if anyone knows, I'd be grateful, but I didn't want to bog down the list with too many questions). Kendall's coefficient of concordance was 0.513. As is typical in emotions research, then, judgements were not all that concordant. But again, noisy signals are OK in the context of this study. There's a larger philosophical discussion about whether emotions are intrinsically vague and context-dependent creatures; we can set that to the side, though. Crude signals, in this case, are fine.

The Model
Fixed effects for Phase x Role strongly improve on the intercept-only model, and show merge for arousal and verge for valence. This finding is supported in all other models. Random slopes for participants by Phase do better than random intercepts alone. Random effects for Events improve the model, but there's no improvement from including effects for Dyadic pairs. I used an ordinal family because the data are ordinal. I fixed the R variance to 1 because this is what Jarrod Hadfield's Course Notes recommend, and he is a man who knows what he's talking about.

Key point: Nothing hangs on putting raters into the model! The outcome remains the same with respect to the hypotheses. I could leave them out (and probably will).
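For what it's worth, Kendall's W can be computed by hand in base R, which sidesteps the ICC question for a quick concordance check. The ratings matrix below is invented, not the real data, and no tie correction is applied:

```r
## Toy ratings matrix: 4 items (rows) x 4 raters (columns), invented data.
ratings <- cbind(r1 = c(4, 5, 3, 6),
                 r2 = c(5, 6, 4, 7),
                 r3 = c(3, 5, 2, 6),
                 r4 = c(4, 6, 2, 7))

n <- nrow(ratings)                      # number of items rated
m <- ncol(ratings)                      # number of raters
R <- rowSums(apply(ratings, 2, rank))   # rank within each rater, sum per item
S <- sum((R - mean(R))^2)               # spread of the rank sums
W <- 12 * S / (m^2 * (n^3 - n))         # Kendall's W, no tie correction
W  # 1 here, since all four toy raters agree on the ordering
```

W runs from 0 (no agreement) to 1 (perfect agreement), so the reported 0.513 sits squarely in the middling range.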
However, it seems to me that the raters are somehow part of the effect, in a way that is very roughly analogous to meta-analysis studies. (However, I did not attempt MEV; that seemed a bit extreme, but who knows?!)

Other niggles: My psychologist collaborator (experienced with LMMs using HLM and MPLUS) suggested averaging over the ratings. This is standard practice in psychology; in fact, psychologists do this all the time wherever they have highly correlated measures for the same trait (e.g. personality). This strikes me as OK for most purposes, but it is also odd, because you lose a signal for the variance of your measures.

Again, sorry for stealing time. Thanks for any help.

On 16/04/2013, at 8:05 PM, David Duffy <David.Duffy at qimr.edu.au> wrote:
So what do the inter-rater agreements look like? I presume rater is actually a nuisance variable? The path model I would usually use would have phase and role acting on the averaged-over-raters a and v scores (measurement model bit). Just 2c, David Duffy.
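For comparison, "averaging over the ratings" in the collaborator's sense would look like the base-R sketch below. The toy data frame is invented for illustration; note that the per-rater spread is simply discarded:

```r
## Toy data: two participants, one phase, four raters each. Invented values.
d <- data.frame(
  id      = rep(c("a", "b"), each = 4),
  phase   = 1,
  rater   = rep(paste0("rat", 1:4), times = 2),
  arousal = c(4, 5, 4, 5, 6, 7, 6, 7),
  valence = c(3, 3, 4, 4, 2, 2, 1, 1)
)

## One mean arousal/valence score per image (id x phase): the rater-to-rater
## variability no longer appears anywhere in the outcome.
avg <- aggregate(cbind(arousal, valence) ~ id + phase, data = d, FUN = mean)
avg
```

This is what a rater random effect would retain and the averaging throws away: with the averages in hand there is no term left in the model where judge disagreement can show up.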
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Andrew Robinson
Deputy Director, ACERA
Senior Lecturer in Applied Statistics
Department of Mathematics and Statistics, University of Melbourne, VIC 3010 Australia
Tel: +61-3-8344-6410 Fax: +61-3-8344-4599
Email: a.robinson at ms.unimelb.edu.au
Website: http://www.ms.unimelb.edu.au
FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/
SPuR: http://www.ms.unimelb.edu.au/spuRs/