Dear all,

In macro-social science, it's become fairly conventional to analyse repeated cross-sectional survey data using three-level models. Individual survey respondents (level-1) are nested in state-years (level-2), which are in turn nested within states (level-3). One big pay-off is the ability to examine how time-constant or time-varying state-level variables affect level-1 outcomes.

A co-author and I recently had a reviewer question whether this approach is adequate, however. He/she suggested that it could generate very misleading results if the data are nonstationary. (We just included a linear time effect in our models.) So I'm thinking about how to proceed (and I'm not particularly knowledgeable about time series analysis). Any advice would be much appreciated.

We used lme4 to fit the models in our paper, and we have several tens of thousands of respondents nested in 48 states, each state observed about 15 or 16 times over about a 30-year period.

(1) Is the reviewer's query reasonable? Is he/she right to question this approach?

(2) How might we test for nonstationarity? The reviewer mentioned differencing the outcome variable, but in a multilevel context I'm not sure how to do that... Perhaps we could calculate an *aggregate* value for every state-year, and check the aggregated data for autocorrelation? My understanding is that autocorrelation across multiple lags is a strong indicator of nonstationarity (while, conversely, the absence of multiple-lag autocorrelation is almost a guarantee of stationarity). I believe this can be done with nlme, as a two-level model, with state-years nested within states.

(3) However, that approach would seem to throw away a lot of level-1 information (about individual respondents), and I'm not sure about the implications for any significance tests. An alternative approach would seem to be "multilevel time series", where autocorrelation at the *group* rather than individual/first level is specifically allowed for in the model. However, I can't find any references to R packages (or other software) that allow for the specification of, for example, AR1 processes at anything other than level-1 in multilevel models.

In short, I'd be curious to hear what people think... (especially if anyone out there happens to be a whiz at both multilevel and time series analysis). I hope I've been clear about the problem, but I'm happy to elaborate. Thanks in advance for any help.

Cheers,
Malcolm

Dr Malcolm Fairbrother
Lecturer
School of Geographical Sciences
University of Bristol
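For readers who want to try the aggregate-and-check idea from point (2), here is a minimal sketch in R on simulated data. Everything in it is an assumption made for illustration: the variable names (`state`, `year`, `y`), the effect sizes, and the AR(1) process used to generate the state-year effects. It is not the model from the paper, only one way the steps described above might look with nlme.

```r
library(nlme)

set.seed(42)
n_state <- 48; n_year <- 15; n_resp <- 30   # hypothetical sizes

# Hypothetical state-year effects: a state intercept plus AR(1) noise within state
sy <- expand.grid(year = seq_len(n_year), state = seq_len(n_state))
sy$u <- unlist(lapply(seq_len(n_state), function(s) {
  rnorm(1, sd = 0.5) +
    as.numeric(arima.sim(list(ar = 0.5), n = n_year, sd = 0.3))
}))

# Respondent-level data: each respondent appears once (repeated cross-section)
d <- sy[rep(seq_len(nrow(sy)), each = n_resp), ]
d$y <- 0.02 * d$year + d$u + rnorm(nrow(d))

# Step 1: collapse to one mean per state-year
agg <- aggregate(y ~ state + year, data = d, FUN = mean)
agg <- agg[order(agg$state, agg$year), ]
agg$state <- factor(agg$state)

# Step 2: two-level model (state-years within states), then inspect the
# empirical autocorrelation of the within-state residuals
fit0 <- lme(y ~ year, random = ~ 1 | state, data = agg)
acf0 <- ACF(fit0, maxLag = 5)
print(acf0)

# Step 3: same model with an AR(1) residual structure within state
fit1 <- lme(y ~ year, random = ~ 1 | state,
            correlation = corAR1(form = ~ year | state), data = agg)
rho <- coef(fit1$modelStruct$corStruct, unconstrained = FALSE)
print(rho)
print(anova(fit0, fit1))
```

Note that `corAR1` expects an integer-valued time covariate, so with unevenly spaced survey years nlme's continuous-time `corCAR1` may be the more natural choice. As the original message says, aggregating this way discards the respondent-level information.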
multilevel time series?
3 messages · Malcolm Fairbrother, ONKELINX, Thierry, Douglas Bates
Dear Malcolm,

Your design, IMHO, requires crossed rather than nested random effects. Individual is clearly crossed with year: each individual can be surveyed in more than one year, and vice versa. If they were nested, all data from a specific individual would come from only one specific year. The same goes for state and year; they are crossed rather than nested.

Fitting year as a crossed random effect will take nonstationarity over time into account. The size of the variance of this random effect will indicate how strong this nonstationarity is.

HTH,
Thierry

ir. Thierry Onkelinx
Research Institute for Nature and Forest (Instituut voor natuur- en bosonderzoek, INBO)
team Biometrics & Quality Assurance
Gaverstraat 4, 9500 Geraardsbergen, Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
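A minimal sketch of how the nested and crossed specifications might be written in lme4, using simulated data. The variable names (`state`, `year_f`, `state_year`, `y`), the sizes, and the effect magnitudes are all hypothetical. The simulation assumes each respondent appears only once, as in a repeated cross-section, so there is no respondent-level random effect.

```r
library(lme4)

set.seed(1)
n_state <- 48; n_year <- 15; n_resp <- 20   # hypothetical sizes

# One row per respondent; respondents are not followed over time
d <- expand.grid(resp = seq_len(n_resp),
                 year = seq_len(n_year),
                 state = seq_len(n_state))
d$state_year <- interaction(d$state, d$year, drop = TRUE)

# Hypothetical effects: state, year (shared by all states), and state-year
state_eff <- rnorm(n_state, sd = 0.5)
year_eff  <- rnorm(n_year, sd = 0.3)
sy_eff    <- rnorm(nlevels(d$state_year), sd = 0.2)
d$y <- 0.02 * d$year + state_eff[d$state] + year_eff[d$year] +
  sy_eff[as.integer(d$state_year)] + rnorm(nrow(d))

d$state  <- factor(d$state)
d$year_f <- factor(d$year)

# Nested specification: state-years within states, linear time trend
m_nested <- lmer(y ~ year + (1 | state) + (1 | state_year), data = d)

# Crossed specification: adds a year effect common to all states
m_crossed <- lmer(y ~ year + (1 | state) + (1 | year_f) + (1 | state_year),
                  data = d)

print(VarCorr(m_crossed))
```

The estimated variance for `year_f` is the quantity the message above refers to: a year-to-year shift shared across all states, over and above the linear trend.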
-----Original message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of Malcolm Fairbrother
Sent: Sunday, 26 September 2010 21:18
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] multilevel time series?
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
3 days later
On Mon, Sep 27, 2010 at 3:34 AM, ONKELINX, Thierry <Thierry.ONKELINX at inbo.be> wrote:
> Dear Malcolm,
>
> Your design requires IMHO crossed random effects instead of nested random effects. Individual is clearly crossed with year. Each individual can be surveyed in more than one year and vice versa. If they were nested, all data from a specific individual would come from only one specific year. The same goes for state and year, they are rather crossed than nested.
Malcolm's original description mentions modeling a linear trend in time, which would make sense to me. Even allowing for the fact that a person can move from one state to another (so the person and state factors are not strictly nested), such data can still be analyzed using lme4. Before doing so I would want to plot response versus time for several individuals, just to see if a linear trend looks adequate. Having 15 to 20 different time points per subject would allow you to model more than a linear trend within subject.

Sometimes people will approach such a case using time series methods, even though the series are rather short. Simple relationships like an AR1 (first-order autoregressive) model generate marginal covariance patterns that are very similar to those generated by a model with per-subject random effects for the intercept and the slope with respect to time. This is why I don't usually combine these terms: it is hard to separate out the effect of each.

Your suggestion is somewhat different. It is more like a panel-data type of model and could definitely be appropriate if the effect of a particular year was more or less common across subjects. This type of model is applied to data like the quarterly profits of several companies. Macro-economic forces can (and did) have industry-wide effects on the Q1 results in 2009, so it makes sense to regard each time period as distinct. If, on the other hand, you had time trends within individuals that were not synchronized across time periods, then I would set up a model for the within-subject time trends and try to incorporate random effects in that model, as Malcolm seems to indicate they have done.
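The comparison between AR1 errors and random intercept-and-slope models made above can be written out explicitly. The notation below is an assumption introduced for illustration, not something from the thread: subject i is observed at times s and t, the random intercept and slope have variances \tau_0^2 and \tau_1^2 with covariance \tau_{01}, and \sigma^2 is the residual variance.

```latex
% Random intercept and slope:
%   y_{it} = \beta_0 + \beta_1 t + b_{0i} + b_{1i} t + \varepsilon_{it}
\begin{align}
\operatorname{Cov}(y_{is}, y_{it})
  &= \tau_0^2 + \tau_{01}\,(s + t) + \tau_1^2\, s\, t
     + \sigma^2\, \mathbf{1}[s = t]
\end{align}

% Stationary AR1 errors, no random slope:
%   y_{it} = \beta_0 + \beta_1 t + e_{it}, \qquad
%   e_{it} = \rho\, e_{i,t-1} + \eta_{it}, \quad |\rho| < 1
\begin{align}
\operatorname{Cov}(y_{is}, y_{it})
  &= \sigma_e^2\, \rho^{|s - t|},
  \qquad \sigma_e^2 = \frac{\sigma_\eta^2}{1 - \rho^2}
\end{align}
```

In both cases the covariance between two observations on the same subject depends on when they were taken, not just on the fact that they share a subject. With short series, a model containing both a random slope and an AR1 term has two ways to account for much the same pattern, which is one way to read the identifiability concern raised above.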
> Fitting year as a crossed random effect will take nonstationarity over time into account. The size of the variance of this random effect will indicate how strong this nonstationarity is.