What to do with zero inflated, negative skewed, negative data: a question about GLMMs

Mon, Nov 30, 2020 9:09 AM

I think Gabriella may have abandoned the linear mixed model (i.e. 
Gaussian distribution) because of a skewed distribution of responses. A 
couple of things to keep in mind about this:

 - you don't need to worry about the *marginal* distribution of the 
data (i.e., what you get if you plot the histogram or density of your 
response variable). The assumptions in LMMs (like most models) are about 
the *conditional* distribution, i.e. the distribution of the residuals 
(e.g., fit your model first, then examine lattice::qqmath(fitted_model) 
or hist(residuals(fitted_model))).

 - non-normality (including skewness) even in the conditional model 
is much less important to the validity (accuracy of the parameter 
estimates, confidence intervals, etc.) than many people think.

 - in principle you could transform the response variable to deal 
with this, although admittedly the choice of transformations is much 
more limited for non-positive data (e.g. Yeo-Johnson transformations, 
see `?car::yjPower`), and there are some issues here about whether 
you're transforming the marginal or the conditional distribution ...
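
As a rough illustration of the first point above (marginal versus conditional distribution), here is a minimal sketch on simulated data. Everything in it is made up for illustration: the data frame `d`, the variables `y`, `x` and `patient`, and the helper `skewness()` are hypothetical, and it uses nlme::lme only because nlme ships with R. The response is strongly left-skewed marginally because it depends on a skewed predictor, yet the residuals of the fitted model should be close to symmetric.

```r
library(nlme)

set.seed(101)

# Simulated data (hypothetical): 200 patients, 3 observations each
n_patient <- 200
pid <- rep(seq_len(n_patient), each = 3)
d <- data.frame(patient = factor(pid))
d$x <- rexp(nrow(d))                     # skewed predictor
b <- rnorm(n_patient, sd = 0.5)          # patient-level random intercepts
d$y <- -2 * d$x + b[pid] + rnorm(nrow(d), sd = 0.5)

# Simple moment-based sample skewness (helper defined here, not from a package)
skewness <- function(v) {
  z <- v - mean(v)
  mean(z^3) / mean(z^2)^1.5
}

# Linear mixed model with a random intercept per patient
fit <- lme(y ~ x, random = ~ 1 | patient, data = d)

marginal_skew <- skewness(d$y)             # skewness of the raw response
residual_skew <- skewness(residuals(fit))  # skewness of the conditional residuals

print(c(marginal = marginal_skew, residual = residual_skew))

# Graphical checks, drawn only when run interactively
if (interactive()) {
  hist(d$y, main = "Marginal distribution of y")
  qqnorm(residuals(fit))
  qqline(residuals(fit))
}
```

The point of the sketch is only that a skewed histogram of the response does not by itself rule out a Gaussian mixed model; the residuals are what should be inspected.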


 cheers

   Ben Bolker

On 11/30/20 2:50 AM, Thierry Onkelinx via R-sig-mixed-models wrote:

Dear Gabriella,

I'd try to fit a single model to the data. The response seems continuous to
me. So I'd try a Gaussian distribution. You might need to fit a different
variance for each of the questions.

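library(nlme)
# nlme takes the random effect via `random =`, not lme4's (1|patient) syntax
lme(sentiment ~ question + age, random = ~ 1 | patient)
# same model, with a separate residual variance for each question
lme(sentiment ~ question + age, random = ~ 1 | patient,
    weights = varIdent(form = ~ 1 | question))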
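
To see whether the extra variance parameters are worth having, the two fits can be compared. A minimal sketch on simulated data, assuming a data frame `d` with columns `sentiment`, `question`, `age` and `patient` (the data, sample sizes and standard deviations below are invented for illustration and are not Gabriella's data):

```r
library(nlme)

set.seed(42)

# Invented data: 300 patients, one wording per patient, 3 sentences each
n_patient <- 300
pid <- rep(seq_len(n_patient), each = 3)
wording <- sample(c("negative", "neutral", "positive"), n_patient, replace = TRUE)
age_pat <- round(runif(n_patient, 18, 70))
resid_sd <- c(negative = 1.0, neutral = 0.3, positive = 0.6)  # differs by question

d <- data.frame(
  patient  = factor(pid),
  question = factor(wording[pid]),
  age      = age_pat[pid]
)
d$sentiment <- -0.2 + rnorm(n_patient, sd = 0.3)[pid] +
  rnorm(nrow(d), sd = resid_sd[as.character(d$question)])

# Same residual variance for all questions
m_equal <- lme(sentiment ~ question + age, random = ~ 1 | patient, data = d)

# One residual variance per question
m_hetero <- lme(sentiment ~ question + age, random = ~ 1 | patient, data = d,
                weights = varIdent(form = ~ 1 | question))

# Likelihood ratio test and AIC; both fits use REML with identical fixed effects
cmp <- anova(m_equal, m_hetero)
print(cmp)
```

If the heteroscedastic model has a clearly lower AIC, that would support keeping the per-question variances.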

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>


On Mon, 30 Nov 2020 at 01:24, Gabriella Kountourides <
gabriella.kountourides at sjc.ox.ac.uk> wrote:

Hello everyone,

This is my first question to this list :) I hope this email finds you all
well.


   I have been struggling for the past few weeks to set an appropriate
model for my data. I have read Prof Bolker's practical guide for ecology
and evolution paper, as well as the GLMM FAQs which have been immensely
helpful. I am only just beginning my stats journey (and R!) and although I
am really enjoying it, I have found myself completely stumped with my
dataset. I will describe the data set below, and below that the various
attempts I have made to analyse it. I would be incredibly grateful to hear
your thoughts.

All the very best

Data:


I want to look at whether there is a relationship between the phrasing used
when a question is asked (positive, negative, neutral wording) and the
polarity of the response from the individual.


2638 people were asked a question about medical symptoms.

1/3 of the people were asked it with a negative wording, 1/3 with a
neutral one, 1/3 with a positive one.

The big question is: does the way the question is asked affect the
polarity of the response?


From this, I did sentiment analysis (using trinker's sentimentr package,
<https://github.com/trinker/sentimentr>), which provides a
polarity score (this can be negative, neutral or positive), to see whether
their responses were more positive or negative, depending on the wording of
the question.


Sentiment analysis breaks down responses into sentences, so I have 2638
people but 7924 sentences, so I assume I should fit ID as a random effect.


Range:  -4.0376 to +0.7915
Median: -0.1830
Mean:   -0.2149
Mode:   0
Skew:   -1.7

There are many 0s in my data; these are true 0s that represent a
'neutral' response, which is important. My data is negatively skewed, so
more people answer in a negative way. But I still want to know whether the
phrasing affects the skew, i.e. is one phrasing leading to 'less negative'
responses?

What I've tried:
Initially, I tried to fit a GLM to the raw data, but I can't use Poisson
as the data are negative, they are skewed so not Gaussian, and they are not
binomial.

So next I made 3 new variables, which were counts. For example, 'PosCount'
scored 1 for each row with a positive polarity score, and 0 if not. Likewise
for neutral (sentiment = 0) and negative (sentiment < 0). I decided to fit a
zero-inflated Poisson model.
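
For concreteness, the three indicator variables described above could be built like this. A minimal sketch: `allprimesent` and `PosCount` are the names used in this email, while the `sentiment` column name, `NeutCount`, `NegCount` and the toy values are assumptions for illustration.

```r
# Toy stand-in for the real data (values invented)
allprimesent <- data.frame(
  id        = c(1, 1, 2, 3, 3, 3),
  sentiment = c(-0.50, 0, 0.25, -1.20, 0, 0.10)
)

# 0/1 indicators for the sign of each sentence's polarity score
allprimesent$PosCount  <- as.integer(allprimesent$sentiment > 0)
allprimesent$NeutCount <- as.integer(allprimesent$sentiment == 0)
allprimesent$NegCount  <- as.integer(allprimesent$sentiment < 0)

# Each sentence falls into exactly one of the three categories
category_total <- with(allprimesent, PosCount + NeutCount + NegCount)
print(table(category_total))
```

Note that each of these variables only ever takes the values 0 and 1, and that the strength of the sentiment is discarded.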

I ran a GLMM for each count variable; for example, for the positive one:
posmodel <- glmmTMB(PosCount ~ wordingQ + (1|id) + age, data = allprimesent,
                    ziformula = ~1, family = poisson)

and then the 'overdisp_fun' function which gave

overdisp_fun(posmodel)

       chisq        ratio          rdf            p
6268.8427185    0.8295412 7557.0000000    1.0000000
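
For readers who have not seen it, `overdisp_fun` is a small helper from the GLMM FAQ rather than a packaged function. The sketch below reproduces it from memory, so treat the exact body as an assumption, and demonstrates it on a plain Poisson GLM with simulated data (the data frame `d` and its columns are invented). It sums the squared Pearson residuals and divides by the residual degrees of freedom; a ratio well above 1 suggests overdispersion, while a ratio near or below 1, as in the output above, does not.

```r
# Pearson-based overdispersion check (written from memory of the GLMM FAQ helper)
overdisp_fun <- function(model) {
  rdf <- df.residual(model)                  # residual degrees of freedom
  rp <- residuals(model, type = "pearson")   # Pearson residuals
  pearson_chisq <- sum(rp^2)
  ratio <- pearson_chisq / rdf               # approximately 1 if no overdispersion
  p <- pchisq(pearson_chisq, df = rdf, lower.tail = FALSE)
  c(chisq = pearson_chisq, ratio = ratio, rdf = rdf, p = p)
}

# Simulated data that really are Poisson, so no overdispersion is expected
set.seed(1)
d <- data.frame(x = rnorm(500))
d$y <- rpois(500, lambda = exp(0.5 + 0.3 * d$x))

fit <- glm(y ~ x, family = poisson, data = d)
res <- overdisp_fun(fit)
print(res)
```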

So I suppose my questions are: do you think this is the best thing to do
with my data? Do you know of anything better I can do with the raw data?
I'd rather not lose the information about the strength of the sentiment,
but if I keep it, I need a model that can deal with zero inflation, negative
skew, and negative numbers.

Many thanks if you've read this! I look forward to hearing from you!
All the best

p.s. I am relatively new to stats and R, please bear that in mind with
your terminology if you are kind enough to answer


Gabriella Kountourides

DPhil Student | Department of Anthropology

Evolutionary Medicine and Public Health Group

St. John's College, University of Oxford

gabriella.kountourides at sjc.ox.ac.uk

Tweet me: https://twitter.com/GKountourides


_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
