What to do with zero inflated, negative skewed, negative data: a question about GLMMs
I think Gabriella may have abandoned the linear mixed model (i.e. Gaussian distribution) because of a skewed distribution of responses. A couple of things to keep in mind about this:

- you don't need to worry about the *marginal* distribution of the data (i.e., what you get if you plot the histogram or density of your response variable). The assumptions in LMMs (like most models) are about the *conditional* distribution, i.e. the distribution of the residuals (e.g., fit your model first, then examine lattice::qqmath(fitted_model) or hist(residuals(fitted_model)))

- non-normality (including skewness) even in the conditional model is much less important to the validity (accuracy of the parameter estimates, confidence intervals, etc.) than many people think

- in principle you could transform the response variable to deal with this, although admittedly the choice of transformations is much more limited for non-positive data (e.g. Yeo-Johnson transformations, see `?car::yjPower`), and there are some issues here about whether you're transforming the marginal or the conditional distribution ...

cheers
Ben Bolker
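To make the marginal-versus-conditional point concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration only: the variable names (id, wording, sentiment), the effect sizes, and the use of nlme::lme rather than whatever model was actually fitted in this thread. The pooled response is left-skewed simply because the wording groups have different means, while the residuals of the fitted model are close to symmetric.

```r
library(nlme)  # recommended package, shipped with standard R installations

set.seed(1)
n_id <- 200
wordings <- c("negative", "neutral", "positive")

# Simulated (hypothetical) data: three sentences per person, one per wording
d <- data.frame(
  id      = factor(rep(seq_len(n_id), each = 3)),
  wording = factor(rep(wordings, times = n_id), levels = wordings)
)

# Assumed effects, chosen only so that the pooled response is left-skewed
wording_effect <- c(negative = -2, neutral = 0, positive = 0.3)
id_effect      <- rnorm(n_id, sd = 0.3)
d$sentiment <- unname(wording_effect[as.character(d$wording)]) +
  id_effect[as.integer(d$id)] +
  rnorm(nrow(d), sd = 0.3)

# Linear mixed model with a random intercept per person
fit <- lme(sentiment ~ wording, random = ~ 1 | id, data = d)
res <- residuals(fit, type = "pearson")

# Sample skewness, used here only as a rough numeric summary
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew(d$sentiment)  # marginal distribution of the response
skew(res)          # conditional distribution (residuals)

# Visual checks: the first histogram is not what the model assumptions are about
hist(d$sentiment, main = "Marginal: raw response")
hist(res, main = "Conditional: residuals")
qqnorm(res); qqline(res)
```

With real data the residual plots will rarely look this clean; the sketch only shows where to look, not what to expect.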
On 11/30/20 2:50 AM, Thierry Onkelinx via R-sig-mixed-models wrote:
Dear Gabriella,

I'd try to fit a single model to the data. The response seems continuous to me, so I'd try a Gaussian distribution. You might need to fit a different variance for each of the questions.

    library(nlme)
    lme(sentiment ~ question + age, random = ~ 1 | patient)
    lme(sentiment ~ question + age, random = ~ 1 | patient, weights = varIdent(form = ~ 1 | question))

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

On Mon, 30 Nov 2020 at 01:24, Gabriella Kountourides <gabriella.kountourides at sjc.ox.ac.uk> wrote:
Hello everyone,

This is my first question to this list :) I hope this email finds you all well. I have been struggling for the past few weeks to set up an appropriate model for my data. I have read Prof Bolker's practical guide for ecology and evolution paper, as well as the GLMM FAQs, which have been immensely helpful. I am only just beginning my stats journey (and R!) and although I am really enjoying it, I have found myself completely stumped with my dataset. I will describe the data set below, and below that the various attempts I have made to analyse it. I would be incredibly grateful to hear your thoughts.

All the very best

Data:

I want to look at whether there is a relationship between the phrasing used when a question is asked (positive, negative, neutral wording) and the polarity of the response from the individual. 2638 people were asked a question about medical symptoms. 1/3 of the people were asked it with a negative wording, 1/3 with a neutral one, 1/3 with a positive one. The big question is: does the way the question is asked affect the polarity of the response?

From this, I did sentiment analysis (using trinker's sentimentr package, <https://github.com/trinker/sentimentr>), which provides a polarity score (this can be negative, neutral or positive) to see whether their responses were more positive or negative, depending on the wording of the question. Sentiment analysis breaks down responses into sentences, so I have 2638 people, but 7924 sentences, so I assume I should fit ID as a random effect.

Range: -4.0376 to +0.7915
Median: -0.1830
Mean: -0.2149
Mode: 0
Skew: -1.7

There are many 0s in my data. These are true 0s: they represent a 'neutral' response, which is important. My data is negatively skewed, so more people answer in a negative way. But I still want to know whether the phrasings affect the skew, i.e. is one phrasing leading to 'less negative' responses?
What I've tried:

Initially, I tried to do a glm with the raw data, but I can't use Poisson as the data are negative, they are skewed so not Gaussian, and they are not binomial. So next I made 3 new variables, which were counts. For example, 'PosCount' scored 1 for each row with a positive polarity score, and 0 if not. Idem for neutral (sentiment=0) and negative (sentiment<0). I decided to run a zero-inflated Poisson model. I ran a GLMM for each count variable; for example, for the positive one:

    posmodel <- glmmTMB(PosCount ~ wordingQ + (1|id) + age, data=allprimesent, ziformula=~1, family=poisson)

and then the 'overdisp_fun' function, which gave
overdisp_fun(posmodel)
       chisq        ratio          rdf            p
6268.8427185    0.8295412 7557.0000000    1.0000000

So I suppose my questions are: do you think this is the best thing to do with my data? Do you know of any better thing I can do with the raw data? I'd rather not lose the information about the strength of the sentiment, but if I keep it, I need a model that can deal with zero inflation, negative skew, and negative numbers.

Many thanks if you've read this! I look forward to hearing from you!

All the best

p.s. I am relatively new to stats and R, please bear that in mind with your terminology if you are kind enough to answer

Gabriella Kountourides
DPhil Student | Department of Anthropology
Evolutionary Medicine and Public Health Group
St. John's College, University of Oxford
gabriella.kountourides at sjc.ox.ac.uk
Tweet me: https://twitter.com/GKountourides
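For readers following the thread, the single Gaussian model with a separate residual variance per question wording, as suggested in the reply above, could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the column names (patient, question, age, sentiment) follow the reply rather than the real dataset, and the residual standard deviations are made up. It uses nlme::lme with a varIdent variance structure and compares it against the equal-variance fit.

```r
library(nlme)  # recommended package, shipped with standard R installations

set.seed(42)
n_patient <- 150
n_sent    <- 4  # sentences per patient (hypothetical)
wordings  <- c("negative", "neutral", "positive")

# Simulated (hypothetical) data: each patient sees one wording and
# contributes several scored sentences
d <- data.frame(
  patient  = factor(rep(seq_len(n_patient), each = n_sent)),
  question = factor(rep(rep(wordings, length.out = n_patient), each = n_sent),
                    levels = wordings),
  age      = rep(round(runif(n_patient, 18, 70)), each = n_sent)
)

# Assumed residual SD per wording, so the variance really differs by group
sd_q           <- c(negative = 0.6, neutral = 0.2, positive = 0.3)
patient_effect <- rnorm(n_patient, sd = 0.2)
d$sentiment <- -0.2 + patient_effect[as.integer(d$patient)] +
  rnorm(nrow(d), sd = unname(sd_q[as.character(d$question)]))

# Same residual variance for every wording
m_equal <- lme(sentiment ~ question + age, random = ~ 1 | patient, data = d)

# Separate residual variance for each wording
m_het <- lme(sentiment ~ question + age, random = ~ 1 | patient, data = d,
             weights = varIdent(form = ~ 1 | question))

# Likelihood-ratio comparison of the two variance structures
anova(m_equal, m_het)

# Estimated residual SD ratios relative to the reference wording
coef(m_het$modelStruct$varStruct, unconstrained = FALSE)
```

Both models are fitted by REML with identical fixed effects, which is what makes the anova() comparison of the variance structures reasonable here.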
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models