Level 2 outcome and 'Downdated VtV' error

5 messages · Matthew Boden, Thierry Onkelinx, Wolfgang Viechtbauer +1 more

#
Good afternoon,

I am looking for advice regarding a multi-level model I am trying to
implement using lme4. My two-level random-effects model won't run, perhaps
due to one or two issues.

Background: Level 1 is patients, which are clustered in healthcare
facilities ("Station"). The outcome is a continuous variable ("PopCov")
that is calculated at the facility-level, and is thus a Level 2 variable
that does not vary at the patient level.

The aim of this analysis is to examine whether PopCov is predicted by (a)
patient-level (e.g., race/ethnicity, age, symptom severity), and (b)
facility-level variables (e.g., overall racial/ethnic composition, average
age). It is important to examine factors such as race/ethnicity at both
patient and facility-levels because patients with different racial/ethnic
backgrounds tend to differ in terms of age, symptom severity, etc.

Each record/row in my data is a patient, with facility-level variables
(including PopCov) having identical values among patients within a given
facility.

An error is thrown when I run a basic model.

A1 <- lmer(PopCov ~ (1 | Station), data = DISP)

Error in fn(nM$xeval()) : Downdated VtV is not positive definite

I obtain the same error when I add to the model either a patient-level or
facility level predictor.

An internet search suggested that I have complete separation of my data
and/or poorly scaled variables.

I assume this issue has to do with the fact that the outcome is a level 2
variable. Perhaps compounding the issue is the large and unbalanced nature
of the data. I have ~6 million patients clustered in ~1000 healthcare
facilities. Individual facilities have anywhere from 100 to 30000 patients
clustered in them.

I could use some advice regarding how to specify the model to predict a
facility-level variable (level 2) from both patient (level 1) and
facility-level (level 2) variables with these data.

Thank you in advance.

Matt
#
Dear Matthew,

I recommend aggregating the data into one record per healthcare facility,
as you did when calculating the outcome variable. Because the outcome is
already aggregated, it has no variability at the patient level. Given the
huge dataset, that forces the residual error term close to zero.

Another option is to use an outcome variable at the patient level.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be


On Tue, Jul 7, 2020 at 00:19, Matthew Boden <matthew.t.boden at gmail.com> wrote:

  
  
#
Hi Matt,

What you are trying to do (i.e., use a level 2 variable as the outcome) cannot and should not be done. The outcome in a multilevel model needs to be measured at the lowest level.

In your model (A1), we know a priori that there is 0 within-station variability. Hence, the ICC is exactly equal to 1 in that model, but trying to fit such a model pushes the optimization routines into a situation that leads to degeneracies.
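
A quick way to confirm the zero within-station variability before fitting anything is sketched below. This is only an illustration on toy data: the names PopCov, Station and DISP come from the original post, while the values and the helper names (within_range, all_constant) are made up for the example.

```r
# Toy stand-in for DISP: PopCov is constant within each Station,
# as it would be for a facility-level outcome (values are hypothetical).
set.seed(1)
n_station <- 5
station_size <- c(3, 4, 2, 5, 3)
popcov_by_station <- round(runif(n_station), 2)

DISP <- data.frame(
  Station = factor(rep(seq_len(n_station), times = station_size)),
  PopCov  = rep(popcov_by_station, times = station_size)
)

# Spread of the outcome within each station (max minus min).
# A level 2 outcome gives exactly 0 for every station.
within_range <- tapply(DISP$PopCov, DISP$Station,
                       function(x) diff(range(x)))
all_constant <- all(within_range == 0)

print(within_range)
print(all_constant)
```

If all_constant is TRUE, there is no residual (level 1) variance left to estimate, which is the situation described above.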

The only way to get around this is to aggregate the data to the level of the outcome (i.e., use PopCov as the outcome and aggregate all other level 1 predictors to level 2 means).
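
A minimal sketch of that aggregation in base R follows, assuming simulated data. PopCov, Station and DISP are the names from the original post; Age, Severity, the simulated values and the plain lm() fit are assumptions for illustration only, not the poster's actual variables or a recommended final model.

```r
# Simulated patient-level data (hypothetical), one row per patient.
set.seed(42)
n_station <- 50
size <- sample(20:60, n_station, replace = TRUE)
station <- rep(seq_len(n_station), times = size)
station_shift <- rnorm(n_station, sd = 5)

DISP <- data.frame(
  Station  = factor(station),
  Age      = rnorm(length(station), mean = 50 + station_shift[station], sd = 10),
  Severity = rnorm(length(station))
)

# Facility-level outcome: one value per station, repeated for its patients.
station_mean_age <- as.numeric(tapply(DISP$Age, DISP$Station, mean))
popcov_station <- 0.5 + 0.01 * station_mean_age + rnorm(n_station, sd = 0.05)
DISP$PopCov <- popcov_station[station]

# Aggregate to one record per facility: level 1 predictors become
# facility means; PopCov is unchanged because it is constant within station.
facility <- aggregate(cbind(PopCov, Age, Severity) ~ Station,
                      data = DISP, FUN = mean)

# Ordinary regression at the facility level.
fit <- lm(PopCov ~ Age + Severity, data = facility)
print(coef(fit))
```

After aggregation there is a single level left, so an ordinary regression (or another single-level model) replaces the lmer() call.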

Best,
Wolfgang
#
Agreed with the others. Chiming in only because I've recently been
doing research on such aggregation and I can say the consensus seems
to be it doesn't introduce bias (with the possible exception of very
small clusters, which you don't have).

On Tue, Jul 7, 2020 at 6:40 AM Viechtbauer, Wolfgang (SP)
<wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:

  
    
1 day later
#
Thank you for these responses. I figured this was the case (that you
shouldn't predict a Level 2 variable in a mixed model), but followed
contrary advice from a colleague.  Appreciate the help.

Matt

On Tue, Jul 7, 2020 at 6:16 AM Patrick (Malone Quantitative) <
malone at malonequantitative.com> wrote: