Is it right to specify a random slope for dummy variables?
This is an interesting question in that it provides an opportunity for philosophical musings on fixed- and random-effects and on the way that categorical covariates are incorporated into model matrices. Allow me to expand a bit (well, maybe more than a bit) on Reinhold's remarks below. On Wed, Dec 31, 2008 at 5:35 AM, Reinhold Kliegl
<reinhold.kliegl at gmail.com> wrote:
By default, a categorical variable (factor) with n levels comes with n-1 treatment contrasts. Therefore, in the fixed-effect part of the model the intercept represents the reference level and the n-1 contrasts represent the mean differences between the other levels and the reference level, assuming a balanced design. You can check your specification with contrasts(factor). Of course, you should change from treatment to other contrasts as required by your hypotheses. See ?contrasts.
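To make the treatment-contrast coding concrete, here is a short base-R illustration; the factor and its levels ("a", "b", "c") are made up for the example:

```r
# A hypothetical 3-level factor; by default it gets treatment contrasts.
f <- factor(c("a", "b", "c"))
contrasts(f)
#   b c
# a 0 0
# b 1 0
# c 0 1
# In the fixed-effects model matrix the intercept column represents the
# reference level "a", and the two contrast columns code the differences
# b - a and c - a.
model.matrix(~ f)
```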
Now suppose you have the variable group as a random factor in the model and you also include the variable factor in the random-effects part:
lmer(y ~ factor + (factor|group))
Then, you can estimate the variance of the intercept (i.e., variance of reference level for groups), variances of the n-1 difference scores for group, and correlations between intercept and difference scores as random effects (i.e., you estimate varying intercepts and varying differences and the correlations between them).
Thus, with categorical variables you are mostly looking at the variance and correlation of difference scores between levels of a factor rather than variance and correlation of slopes (which are also a kind of difference score, of course).
Reinhold is exactly correct in stating that the coefficients associated with a categorical factor are usually expressed as an intercept and a set of k-1 "contrasts", where k is the number of levels in the factor. There are alternatives, however, and it can be interesting to explore them. For brevity let me write f for a factor whose levels are considered fixed and repeatable, g for a factor whose levels represent a sample from a, possibly infinite, set of potential levels, x for a continuous covariate and y for the response. The number of levels for factor f is k.

As most R users are (or should be) aware, a model formula, which is used to generate a model matrix from a data set, includes an implicit intercept term. Thus the formula

    y ~ x

is equivalent to the formula

    y ~ 1 + x

In fact, many people prefer to use the second form because it more clearly illustrates that there will be two coefficients estimated, the first associated with a constant term and the second associated with the numeric values of x. If we wish to suppress the intercept we can replace the '1' by a '0' to obtain the formula

    y ~ 0 + x

When we have only continuous covariates in the model formula, the underlying models represented by 1 + x and 0 + x are different but the parameterization is the same. That statement may seem opaque but bear with me and I will try to explain what I mean.

The "underlying model" is the set of all possible fitted values from the model. One of the characteristics of linear models is that the model is defined by the set of all possible predictions, which must be a linear subspace of the response space. For those who know the linear algebra terminology, this subspace is the column span of the model matrix. The coefficients are associated with a particular parameterization but the model itself exists independently of the parameterization. (It is this concept that Don Watts and I ported over to nonlinear regression models to explain some of the effects of nonlinearity.)
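The distinction between the 1 + x and 0 + x parameterizations can be seen directly by building the two model matrices; x here is an arbitrary numeric covariate invented for the example:

```r
x <- c(1.2, 3.4, 5.6, 7.8)     # an arbitrary numeric covariate

mm1 <- model.matrix(~ 1 + x)   # two columns: (Intercept) and x
mm0 <- model.matrix(~ 0 + x)   # one column: x only

ncol(mm1)  # 2
ncol(mm0)  # 1
# The column spans differ (a 2-dimensional versus a 1-dimensional subspace),
# but the coefficient attached to x has the same meaning in both models.
```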
So what I was saying about 1 + x versus 0 + x is that the column spans of the model matrices are different; the first is two dimensional and the second is one dimensional, but the parameterization for the column spans is the same.

This is not the case for a categorical factor. The model formula

    y ~ 1 + f

and the model formula

    y ~ 0 + f

generate the same model. Each generates a model matrix whose column span is the k-dimensional span of the indicator columns for the levels of f. For the second formula the parameterization is derived from the indicator columns. We can think of the parameterization for the first model formula as resulting from the constant column followed by the complete set of indicator columns generating a model matrix with k + 1 columns and rank k. To obtain a full-rank matrix we must drop some linear combination of the columns. The default method is to drop the indicator of the first level of the factor but any one will do. That is, we have a well-defined column span but with an arbitrary parameterization.

There is an interesting point here regarding the difference between fixed-effects and random-effects terms. The coefficients in fixed-effects terms are estimated by least squares, which I think of as a rigid criterion. The model matrix for the fixed-effects terms must have full column rank. If it doesn't then the coefficient estimates are undetermined. Thus, to obtain a well-defined and unique set of coefficient estimates we must check the rank of the model matrix and adjust the form of the matrix (and hence the parameterization of the model) if we detect a rank-deficient matrix. Much of the checking and adjustment can be and is done at the symbolic level, which is why the code for model.matrix is much more subtle than most would suspect (and also why just about any formula about analysis of variance given in an introductory book is not really the way that analysis of variance results are calculated).
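The equal-span, different-parameterization point can be checked numerically with base R; the factor below is made up for the example (k = 3):

```r
# A made-up 3-level factor, so k = 3.
f <- factor(rep(c("a", "b", "c"), each = 2))

mm1 <- model.matrix(~ 1 + f)   # intercept plus k - 1 contrast columns
mm0 <- model.matrix(~ 0 + f)   # the k indicator columns, no intercept

qr(mm1)$rank  # 3
qr(mm0)$rank  # 3
# Both matrices have rank k and the same column span; only the
# parameterization differs. Appending the full set of indicators to a
# constant column gives k + 1 columns but still rank k, i.e. a
# rank-deficient matrix:
qr(cbind(1, mm0))$rank  # still 3, from 4 columns
```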
The model formula -> model frame -> model matrix path in the S language is almost never recognized for the great accomplishment that it is. However, even the symbolic analysis doesn't account for all cases, which is why there is a secondary check on the numerical rank of the model matrix - something that is not nearly as simple as it sounds. As anyone who has taken a linear algebra class knows, the rank of a matrix is a well-defined property that is easily evaluated. As anyone who has taken a numerical linear algebra class knows, the rank of a matrix is essentially undefined and impossible to evaluate reliably.

One side note: it is exactly the need to assign a rank to a model matrix and to take corrective action for rank-deficient cases that keeps us using Linpack code for the QR decomposition instead of the numerically superior and more efficient Lapack code.

Anyway, back to the fixed-effects versus random-effects computational issues. The coefficients for a random-effects term are "estimated" by penalized least squares, which I think of as a flexible criterion. (I would embark on analogies about least squares being the oak and penalized least squares being the willow in the wind storm but that is probably a little too much.) The penalty term takes care of any problems with rank deficiencies. We say that it "regularizes" the calculation in the sense that the conditional means of the random effects are "shrunken" toward zero relative to the least squares estimates.

It is exactly this regularization that allows the models to be expressed in the natural parameterization. That is, although the model matrix for the formula

    y ~ 1 + f

cannot be expressed as a constant and the complete set of indicator columns because it would be rank deficient, the model matrix for the formula

    y ~ 1 + (1 | g)

is expressed as a constant and the complete set of indicator columns for the levels of g. Furthermore all of these coefficients are estimable.
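The point that numerical rank depends on a tolerance can be illustrated with qr(); the matrix below is contrived for the example:

```r
x <- 1:10
# In exact arithmetic the third column is a linear combination of the
# first two (col3 = col2 + 1e-10 * col1), so the true rank is 2.
M <- cbind(1, x, x + 1e-10)

qr(M)$rank              # 2: with the default tolerance the tiny third
                        # pivot is treated as zero and the matrix is
                        # declared rank deficient
qr(M, tol = 1e-20)$rank # with an extremely small tolerance, floating-point
                        # rounding in the third column can make the matrix
                        # appear to have full rank
```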
Finally I get to the point of Zhijie's question about incorporating coefficients for a factor f in the random effects associated with the levels of g. Reinhold explained what the formula that could be written as

1) y ~ 1 + f + (1 + f | g)

would generate. This model is equivalent to

2) y ~ 1 + f + (0 + f | g)

or

3) y ~ 0 + f + (0 + f | g)

and I would recommend one of these forms for interpretability. The parameterization of the fixed effects is unimportant so either 2) or 3) will do. The parameterization of the random effects is similarly unimportant in that formulas 1), 2) and 3) all generate the same model fits, but I feel there is an advantage in using the indicator column representation from 2) and 3). All three of these formulas will result in k variances and k(k-1)/2 covariances (or correlations) being estimated and up to k random effects for each level of g. In 2) and 3) the variances reported are the variances of the effect of level i of factor f in level j of factor g (or vice versa).

The disadvantage of all three formulations is that when k is moderate the number of variance/covariance parameters being estimated, k(k+1)/2, can be large. An alternative model, which I prefer as a starting point, is

4) y ~ 0 + f + (1 | g) + (1 | f:g)

Model 4) allows for an additive shift for each level of g and an additive shift for each (observed) combination of levels of f and g. There are actually more random effects in 4) than in 1), 2) or 3) but many fewer variance/covariance parameters. Model 4) has only two variance parameters to be estimated and no covariances. Model 4) corresponds to model 3) subject to the constraint that the variance-covariance matrix for the random effects have a compound symmetry pattern. Some of these issues are illustrated in the section "Interactions of grouping factors and other covariates" in the slides at http://www.stat.wisc.edu/~bates/UseR2008/WorkshopD.pdf (slides 84 to 95).
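The four candidate specifications can be written down as plain R formula objects (y, f and g are placeholders; nothing is fitted until such a formula is passed to lmer from the lme4 package along with data):

```r
# Placeholders: y = response, f = fixed/repeatable factor with k levels,
# g = grouping factor whose levels are a sample from a population.
m1 <- y ~ 1 + f + (1 + f | g)   # 1) intercept/contrast parameterization
m2 <- y ~ 1 + f + (0 + f | g)   # 2) indicator columns in the random term
m3 <- y ~ 0 + f + (0 + f | g)   # 3) indicator columns throughout
# 1)-3) give equivalent fits: k variances and k(k-1)/2 covariances.

m4 <- y ~ 0 + f + (1 | g) + (1 | f:g)  # 4) compound-symmetry alternative:
                                       # only two variance parameters,
                                       # no covariances
```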
On Wed, Dec 31, 2008 at 9:45 AM, zhijie zhang <epistat at gmail.com> wrote:
Dear all,

Today I was thinking about the following question. We know that variables may be classified into continuous, ordinal, and categorical variables. I was confused about how to handle categorical variables in multi-level models. For fixed effects, categorical variables are always treated as dummy variables. My questions are:

1. Could a random slope be specified for categorical variables, which are always changed into the form of dummy variables?
2. If a random slope can be specified for categorical variables, how should it be interpreted? It seems a little different from the continuous-variable case.

I tried the GLIMMIX procedure in SAS. It seems that SAS treats categorical variables as continuous variables, while in MLwiN it seems that a random slope can be specified for the dummy variables. Any ideas on it are greatly appreciated.