Does the "non-independent" data structure defined in mixed models follow the "independency" defined by probability theory?
On point 1: depending on the number of sites, yes, you can use a random effect instead of a fixed effect to account for omitted variables like the site selection mechanism. If you are doing this to control for site effects that are essentially contamination and of no theoretical interest, then using fixed effects for site is the easiest approach for a linear model. In most generalized linear models you can't effectively difference the fixed effects out of the data in the same way, and including them in the model will result in incidental parameters bias with as few as ten dummy variables.

If you are interested in understanding how the site-related latent variable might work, then you should use a mixed effects model and be sure to include group averages for your lower-level variables so that you can interpret the within-group and between-group effects separately. You may also need to model random coefficients, because decomposing the variables doesn't always completely orthogonalize the within-group versions of the variables and the random effect.

With any random effect you are assuming that it is uncorrelated with the fixed components in the model, which means you are modeling the relationship between the random effect and all of your independent variables regardless of what you do. You can take either the fixed effects/group indicator variables approach or the mixed effects modeling approach, but in both cases doing it properly means you have accounted for lack of independence across variables and within sites.
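The within-group/between-group split described above can be sketched with a toy example. This is only an illustration under stated assumptions: the data are made up (three hypothetical sites, a true slope of 2, no noise), and the helper names `ols_slope` and `group_means` are invented here. Subtracting site means is the "differencing out" that, in a linear model, gives the same slope as including site indicator variables.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def group_means(groups, values):
    """Map each group label to the mean of its values."""
    buckets = defaultdict(list)
    for g, v in zip(groups, values):
        buckets[g].append(v)
    return {g: sum(vs) / len(vs) for g, vs in buckets.items()}


# Hypothetical data: y = 2 * x + site effect, with no noise.
# The site effects (0, 10, 20) rise with the site mean of x,
# so site acts as a confounder if it is ignored.
site = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
site_effect = {"A": 0.0, "B": 10.0, "C": 20.0}
y = [2.0 * xi + site_effect[s] for xi, s in zip(x, site)]

x_mean = group_means(site, x)
y_mean = group_means(site, y)

# Within-site parts: deviations of each observation from its site mean.
x_within = [xi - x_mean[s] for xi, s in zip(x, site)]
y_within = [yi - y_mean[s] for yi, s in zip(y, site)]

pooled_slope = ols_slope(x, y)                # ignores site entirely
within_slope = ols_slope(x_within, y_within)  # site means differenced out
between_slope = ols_slope(list(x_mean.values()), list(y_mean.values()))

print(pooled_slope, within_slope, between_slope)
```

In this toy data the within-site slope recovers the true value of 2, while the pooled and between-site slopes are pulled upward because the site effect is correlated with the site mean of x. A mixed model that includes both the site mean and the within-site deviation as predictors estimates these two effects separately.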
On Tue, Sep 6, 2016 at 3:41 AM, Chen, Chun <chun.chen at wur.nl> wrote:
Thank you Ben for the answer. Now I am wondering:

1) If I happen to have a grouping variable that is not by design, for instance my randomly selected observations turned out to show some site-related characteristics, is it sound to apply a mixed model including site as a random intercept? In practice, it is pretty common to use site as a fixed effect in the regression analysis (i.e. to detect the main effect after adjusting for the site effect), even if site is not a factor in the experimental/observational design.

2) If site can be used as a random intercept, what are the exact criteria for non-independence (i.e. nested structure) in the context of applying a mixed model? Are they not the same as what you defined below?

3) In case site cannot be used as a random intercept but can be used as a fixed factor: I assume that if a categorical variable can be modeled as a fixed effect, it can also be modeled as a random effect (both are trying to estimate an effect, but in different ways). Additionally, there is no restriction on the conditions under which we can use a variable as a fixed factor in a regression (you can include any variable as a fixed effect if you hypothesize the effect; there are no non-independence requirements). Why do we need a non-independence condition for the random factors?

Thanks
Regards,
Chun

-----Original Message-----
From: Ben Bolker [mailto:bbolker at gmail.com]
Sent: Monday, September 05, 2016 20:51
To: Chen, Chun
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Does the "non-independent" data structure defined in mixed models follow the "independency" defined by probability theory?

On Mon, Sep 5, 2016 at 4:08 AM, Chen, Chun <chun.chen at wur.nl> wrote:
Dear all, I am a bit puzzled by the definition of the "nested data" or "non-independent data" structure in mixed models.

From the statistical point of view, independence is defined as the probabilities of selecting two observations not influencing each other. In this case, if I design an experiment where I purposely select two observations from the same group (or stratum), then later on we can say these two observations are dependent. However, suppose I am sampling with replacement and by coincidence I select one observation twice (e.g. I throw a die twice and by coincidence get a "6" each time). The probabilities of selecting these two observations are indeed not influencing each other, and they are independent.

My questions are: What's the definition of the "non-independent data" that is often referred to in mixed modeling? Is it the same concept as "independency" defined by probability theory, which concerns how the observations are selected rather than how alike the observations look in the final sample?
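The probability-theory notion of independence used in the die example can be checked by direct enumeration. The following is a minimal sketch, assuming a fair six-sided die thrown twice; the helper `prob` and the variable names are made up for illustration. It verifies the product rule P(A and B) = P(A) P(B) for the events "first throw is a 6" and "second throw is a 6".

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of throwing a fair six-sided die twice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Exact probability of an event over the 36 equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


p_first_six = prob(lambda o: o[0] == 6)
p_second_six = prob(lambda o: o[1] == 6)
p_both_six = prob(lambda o: o[0] == 6 and o[1] == 6)

# Independence means P(A and B) == P(A) * P(B).
throws_independent = p_both_six == p_first_six * p_second_six

print(p_first_six, p_second_six, p_both_six, throws_independent)
```

Observing two identical values in the final sample does not by itself make the throws dependent. The product rule is a statement about how the outcomes are generated.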
(You say "questions" here, but there really seems to be only one question here.)

Yes, mixed modeling defines grouping variables based on experimental/observational design. That is, grouping variables are identifiers that are believed *a priori* to be associated with non-independence of observations with the same identifier values.

Ben Bolker
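One way to see what non-independence of observations sharing an identifier means in model terms is the covariance implied by a simple random-intercept model. This is a sketch under assumptions not stated in the thread: observation i in group j has a shared group effect u_j and its own residual, all mutually independent, and the symbols below are introduced here for illustration only.

```latex
y_{ij} = \beta_0 + u_j + \varepsilon_{ij}, \qquad
u_j \sim N(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)

\operatorname{Cov}(y_{ij}, y_{i'j}) = \sigma_u^2 \quad (i \neq i'), \qquad
\operatorname{Cov}(y_{ij}, y_{i'j'}) = 0 \quad (j \neq j')

\operatorname{Corr}(y_{ij}, y_{i'j}) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
```

Under these assumptions, two observations with the same identifier are correlated because they share u_j, while observations from different groups are not. The last line is the usual intraclass correlation.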
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Thanks,
John

John Poe
Doctoral Candidate
Department of Political Science
Research Methodologist
UK Center for Public Health Services & Systems Research
University of Kentucky
111 Washington Avenue, Room 203a
Lexington, KY 40536
www.johndavidpoe.com