
correlation of two tests treating items of both tests as random effects?

3 messages · Jonathan Baron, Jake Westfall, David Duffy

#
I thought I had a solution to this problem, but I don't. The problem
is very simple to state. It is to find whether one test correlates
with another, when each test has several items sampled from a larger
population of potential items.

I give two psychological tests to a group of subjects. Each test can
be seen as a sample of items from a population. Vocabulary tests and
arithmetic problems are examples.[1] Usually researchers just get a
total score on each test and look at the Pearson correlation. And
usually this is fine because the correlation is high enough that its
existence is not in doubt, and the magnitude of the correlation is of
primary interest.

But sometimes some theoretical question hinges on whether the tests
correlate at all. They could correlate spuriously because of the
particular sample of items used in each test. So one way to handle
this is to think of items as random effects.

It is easy to do this with lmer() when ONE of the two tests is treated
as a random effect. Each observation is the subject's score on one
item of that test (test 1), and the summary score of the other test
(test 2) is the predictor. The model has crossed random effects for
subjects and test 1 items. The number of rows in the data frame is
(number of subjects) times (number of items in test 1).
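In lmer() syntax that model might look like the following sketch. The data and all variable names (score1, total2, item1) are illustrative, not from the actual study:

```r
library(lme4)

set.seed(1)
n_subj <- 40; n_item1 <- 10

# Toy long-format data: one row per subject x test-1 item.
d <- expand.grid(subject = factor(1:n_subj), item1 = factor(1:n_item1))
ability  <- rnorm(n_subj)                     # latent subject ability
total2   <- ability + rnorm(n_subj, sd = 0.5) # summary score on test 2
item_eff <- rnorm(n_item1, sd = 0.3)          # test-1 item effects
d$total2 <- total2[d$subject]
d$score1 <- ability[d$subject] + item_eff[d$item1] + rnorm(nrow(d))

# Crossed random effects for subjects and test-1 items:
fit <- lmer(score1 ~ total2 + (1 | subject) + (1 | item1), data = d)
```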

I thought it might be possible to extend this idea by making each row
consist of a subject's score on one item of test 1 and her score on
one item of test 2. The total number of rows would be (number of
subjects) times (number of items in test 1) times (number of items in
test 2). And I would include crossed random effects for subjects, test
1 items, and test 2 items. But then what? Do I just predict one test
from the other, as before? (The direction may matter, but that is the
least of my worries.)
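For concreteness, the fully crossed data frame described above could be assembled as follows (sizes and names are made up):

```r
# One row per (subject, test-1 item, test-2 item) triple.
n_subj <- 30; n_item1 <- 8; n_item2 <- 12
d <- expand.grid(subject = factor(1:n_subj),
                 item1   = factor(1:n_item1),
                 item2   = factor(1:n_item2))
nrow(d)   # 30 * 8 * 12 = 2880 rows
# Each row would carry the subject's score on that test-1 item and on
# that test-2 item, with (1 | subject) + (1 | item1) + (1 | item2) as
# the crossed random effects; what the fixed part should be is exactly
# the open question here.
```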

I'm stuck. And this may be a blind alley.

Jon

Note:

[1] Not all psychological tests are like this. Some are designed to
represent a balance between different items so that only the test as a
whole, not each item, measures the trait of interest correctly.
#
Hi Jon,

It's an interesting problem. I just put up a GitHub gist where I write a function to simulate data with the structure you describe, then fit what I think are the appropriate models (estimating OR ignoring the random item variance) with lmer() and do the model comparisons:

https://gist.github.com/jake-westfall/3b9b4aee0c980a279acb

I ran the simulation 1000 times with the true correlation at 0.5, and 1000 times with it at 0. The true values are recovered quite well, so the approach seems reasonable. However, at least for the parameter values I tried, adding random item variance made essentially no difference to the estimates/tests of the subject-level correlation in test scores. There's a little difference, but it's hardly worth mentioning.

Basically I set up the data frame so that, if there are m items on each test, then each subject has 2*m rows in the data frame, m for each test. Then the model consists of two dummy variables indicating the test (no intercept/constant term), and these dummies vary randomly across subjects and items. You'll see in the model syntax that it's a bit hackish, but it seems to work.

Two other things to note about the approach I used: (a) the items from both tests are counted as a single random factor, although their variances are allowed to be different for each test; (b) the residual variance is constrained to be equal for observations from both tests, which is just an lmer() thing. In my sim I set parameter values as if the two tests are two different IQ tests, so it's fine. But this might be problematic for your actual data if the two tests are really different. You may need to scale items/observations before fitting the model or something.
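A sketch of a stacked model of this shape, on simulated data (the actual gist syntax may differ, and all names here are illustrative; the double-bar gives the two item variances separate but uncorrelated estimates):

```r
library(lme4)

set.seed(2)
n_subj <- 50; m <- 8                          # m items per test
u1 <- rnorm(n_subj)                           # subject effect, test 1
u2 <- 0.5 * u1 + sqrt(0.75) * rnorm(n_subj)   # corr(u1, u2) = 0.5

# Stacked frame: 2*m rows per subject; items of both tests in one factor.
d <- expand.grid(subject = factor(1:n_subj), item = factor(1:(2 * m)))
d$t1 <- as.numeric(as.integer(d$item) <= m)   # dummy: test-1 row
d$t2 <- 1 - d$t1                              # dummy: test-2 row
item_eff <- rnorm(2 * m, sd = 0.3)
d$y <- d$t1 * u1[d$subject] + d$t2 * u2[d$subject] +
       item_eff[d$item] + rnorm(nrow(d))

# No intercept; dummies vary over subjects (correlated) and items
# (separate, uncorrelated variances via the double-bar):
fit <- lmer(y ~ 0 + t1 + t2 +
              (0 + t1 + t2 | subject) +
              (0 + t1 + t2 || item), data = d)
VarCorr(fit)   # the subject block holds the correlation of interest
```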

Happy to hear comments from anyone else who read this far.

Jake

#
On Fri, 6 Nov 2015, Jonathan Baron wrote:

Isn't this just the usual question about how to run a multivariate mixed 
model in lme4?  So you stack both tests with an appropriate indicator 
variable, and read off the correlations from the RE variances/covariances?
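Reading the correlation off the random-effect (co)variances comes down to cov2cor() on the subject block; a toy 2x2 matrix with made-up values shows the arithmetic:

```r
# Illustrative subject-level covariance matrix (made-up values):
vc <- matrix(c(1.00, 0.45,
               0.45, 0.81),
             2, 2, dimnames = list(c("t1", "t2"), c("t1", "t2")))
cov2cor(vc)[1, 2]   # 0.45 / sqrt(1.00 * 0.81) = 0.5
# With a fitted stacked model: cov2cor(VarCorr(fit)[["subject"]])[1, 2]
```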

| David Duffy (MBBS PhD)
| email: David.Duffy at qimrberghofer.edu.au  ph: INT+61+7+3362-0217 fax: -0101
| Genetic Epidemiology, QIMR Berghofer Institute of Medical Research
| 300 Herston Rd, Brisbane, Queensland 4006, Australia  GPG 4D0B994A