Skip to content

Does corSymm() require balanced data?

7 messages · Tip But, Thierry Onkelinx, Ben Bolker +1 more

#
Dear Members,

In my longitudinal data below, the first couple of subjects were measured 4
times but the rest of the subjects were measured 3 times (see data below).

We intend to use an unstructured residual correlation matrix in
`nlme::lme()`. But our model fails to converge.

Question: Given our data is unbalanced with respect to our grouping
variable (i.e., `id`), can we use ` corSymm()`? And if we do, what would be
the dimensions of the resultant unstructured residual correlation matrix
for our data; a 3x3 or a 4x4 matrix?

Thank you for your expertise,
Joe

# Data and R Code
dat <- read.csv("https://raw.githubusercontent.com/hkil/m/master/un.csv")

library(nlme)

fit <- lme(opp~time*ccog, random = ~1|id, correlation=corSymm(form = ~ 1 |
id),
           data=dat)

Error:
  nlminb problem, convergence error code = 1
  message = false convergence (8)
#
Dear Joe,

You have too few subjects with 4 observations. Either drop those fourth
observations. Or use a different correlation structure. E.g. an AR1

fit <- lme(
  opp ~ time * ccog, random = ~1 | id,
  correlation = corSymm(), data = dat, subset = time < 3
)

fit_alt <- lme(
  opp ~ time * ccog, random = ~1 | id,
  correlation = corAR1(form = ~ time), data = dat
)
Best regards,


ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>


Op ma 15 mrt. 2021 om 03:27 schreef Tip But <fswfswt at gmail.com>:

  
  
#
Dear Thierry,

Thank you so much for your insightful comments. May I follow-up on them
below in-line:


***"You have too few subjects with 4 observations. Either drop those fourth
observations."
matrix, the unique number of measurements (e.g., 3 times, 4 times etc.)
must have relatively equal sizes (e.g., 9 subjects with 3 times, 7 subjects
with 4 times)?

***"Or use a different correlation structure. E.g. an AR1:

fit_alt <- lme(opp ~ time * ccog, random = ~1 | id,
  correlation = corAR1(form = ~ time), data = dat)
"
It seems `corAR1(form = ~1 | id)` gives the same result?

On Mon, Mar 15, 2021 at 2:37 AM Thierry Onkelinx <thierry.onkelinx at inbo.be>
wrote:

  
  
#
Dear Joe,

CorSymm() needs n * (n - 1) / 2 parameters with n the number of groups
(subjects). n = 4 implies 6 parameters for the correlation alone. So you'll
need plenty of data to fit such a model. I'd recommend that the data should
contain at least 20 subjects for every combination of time points. So 20
subjects with measurements for time 0 and time 1, ... In a balanced case
you'll need at least 20 subjects measured at every time point. If some
combinations are missing in subjects, you'll need extra subjects with those
combinations. 9 and 7 subjects in your data is simply not enough for such a
complex correlation structure.

corAR1(form = ~time) is equivalent to corAR1(form = ~time | id) if random =
~1|id.
I think that corAR1(form = ~1| id) will use the order of the data. So if,
and only if, your data is ordered along time, then it is
equivalent to corAR1(form = ~time | id). I recommend to use the
verbose corAR1(form = ~time | id), which is more clear about the structure
what you want.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>


Op ma 15 mrt. 2021 om 15:56 schreef Tip But <fswfswt at gmail.com>:

  
  
#
On 3/15/21 10:56 AM, Tip But wrote:
Balance is probably less important than the total number with 4 
observations.  If you had 100 subjects with 3 times and 20 subjects with 
4 times you'd probably be fine.
I believe that form = ~1|id uses the order of the observations in the 
data set as the time index, and the grouping variable from the random 
effect as the grouping variable, so these should indeed be equivalent (I 
think the documentation should state this, but I haven't checked)

   If you **really** want an answer you can tell R to return it anyway: 
use control=lmeControl(returnObject=TRUE), but I wouldn't trust it.

   It's hard to find another mixed-model package in R that can handle 
this case (unstructured correlation, homogeneous variance).
#
Dear Joe,

At the risk of revealing something that could be misused (because I agree with Thierry that you are pushing things by trying to fit this model with these data), you can get the model to converge by switching to a different optimizer (i.e., BFGS):

fit <- lme(opp ~ time * ccog, random = ~ 1 | id, correlation = corSymm(form = ~ 1 | id), data = dat, control = list(opt = "optim"))

Whether this converges to the global maximum I have not attempted to check.

Maybe this is still useful to know because it might allow you to make a more informed decision about the use of a simpler model. For example:

fit2 <- lme(opp ~ time*ccog, random = ~ 1 | id, correlation = corAR1(form = ~ time), data = dat, control=list(opt="optim"))
anova(fit, fit2)

shows that the corSymm() model does not fit significantly better than the AR1 model.

Best,
Wolfgang
#
Dear Thierry, Ben and Wolfgang,

Thank you all so much! My sense of `corSymm()` is that there needs to be
"enough" observations with respect to each unique time point index so that
the model can converge and/or be trusted.

However, we can't exactly say what constitutes "enough" observations (maybe
power analysis helps) other than by some kind of trial and error.

The other thing that I noticed was that for `corSymm()` we can NOT use
`corSymm(form = ~ time | id)`. Rather, it must be `corSymm(form = ~1| id)`
even though our goal is to allow observations under all unique `time` point
indices to be correlated with each other.

Is there a specific reason `corSymm(form = ~ time | id)` can not be used
(while `corAR1(form = ~ time | id)` works fine)?

Thanks a million,
Joe

On Mon, Mar 15, 2021 at 12:04 PM Thierry Onkelinx <thierry.onkelinx at inbo.be>
wrote: