Hi, I have a model like this:
# for many y values/probes
formula = y ~ concordant + age.proband + age.sibling + sex.proband
+ sex.sibling
I run this model and get p-values with the formula:
model = lm(formula, data=df2)
s = summary(model)
p.cordant = s$coefficients["concordantT", "Pr(>|t|)"]
But, an proband can have multiple siblings, so I want to account for
family structure:
So, I use:
library(lme4a)
# for many y values.
model = lmer(y ~ concordant + age.proband + age.other +
sex.proband + sex.proband + sex.other + (1| family_id.proband),
data=df)
degrees.of.freedom = length(unique(df$family_id.proband)) - 1
p.from.t = function(t){
2*pt(-abs(t),df=degrees.of.freedom)
}
s = summary(model)
concordant.t.score = s$coefficients['concordantT', 't value']
pcordant = p.from.t(concordant.t.score)
Everything else between the 2 runs is the same. For the simple case, I
have unique 80 pairs (since I only use each proband once),
and for the latter, I have 98 pairs. I'm doing this test for millions
of probes and looking for regions of where the concordant
parameter is significant, I find much different regions between the 2
models--very little overlap.
Is this to be expected? Intuitively, I'd figure that using
the random effect via lme4a would just give more power. Are my p-value
calculations correct?
Thanks for any feedback,
-Brent
much different results for random effect vs simple lm.
2 messages · Brent Pedersen, David Duffy
On Mon, 20 Jun 2011, Brent Pedersen wrote:
Hi, I have a model like this: # for many y values/probes formula = y ~ concordant + age.proband + age.sibling + sex.proband + sex.sibling I run this model and get p-values with the formula: model = lm(formula, data=df2) s = summary(model) p.cordant = s$coefficients["concordantT", "Pr(>|t|)"] But, an proband can have multiple siblings, so I want to account for family structure: So, I use: library(lme4a) # for many y values. model = lmer(y ~ concordant + age.proband + age.other + sex.proband + sex.proband + sex.other + (1| family_id.proband), data=df) degrees.of.freedom = length(unique(df$family_id.proband)) - 1 Everything else between the 2 runs is the same. For the simple case, I have unique 80 pairs (since I only use each proband once), and for the latter, I have 98 pairs. I'm doing this test for millions of probes and looking for regions of where the concordant parameter is significant, I find much different regions between the 2 models--very little overlap. Is this to be expected? Intuitively, I'd figure that using the random effect via lme4a would just give more power. Are my p-value calculations correct?
You need to look at just a few probes in detail. Given you have such a small sample size (and how many concordant pairs?), you might expect a bit of shifting about. The other model you should check (in a subset) is your first model fitted to all 98 pairs, using your conservative degrees of freedom from model 2 (this would be pretty similar to a GEE, AIUI). Cheers, David Duffy.
| David Duffy (MBBS PhD) ,-_|\ | email: davidD at qimr.edu.au ph: INT+61+7+3362-0217 fax: -0101 / * | Epidemiology Unit, Queensland Institute of Medical Research \_,-._/ | 300 Herston Rd, Brisbane, Queensland 4029, Australia GPG 4D0B994A v