Hi
I have data from 12 subjects. The measurement is log(expression) of a
particular gene and can be assumed to be normally distributed. The 12
subjects are divided into the following groups:
Infected, Vaccinated, Lesions - 3 measurements
Infected, Vaccinated, No Lesions - 2 measurements
Infected, Not Vaccinated, Lesions - 4 measurements
Uninfected, Not Vaccinated, No Lesions - 3 measurements
Although presence/absence of lesions could be considered a phenotype,
here I would like to use it as a factor. This explains some of the
imbalance in the design (i.e. we could not control how many subjects,
if any, in each group would get lesions).
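The layout described above can be tabulated without any expression values. This is a minimal sketch built only from the group sizes listed in the post; the object names (design, tab, n_empty) are made up for illustration.

```r
# Design only: one row per subject, built from the four group sizes above.
# No expression values are used here.
design <- data.frame(
  Infected   = rep(c("Yes", "Yes", "Yes", "No"), times = c(3, 2, 4, 3)),
  Vaccinated = rep(c("Yes", "Yes", "No",  "No"), times = c(3, 2, 4, 3)),
  Lesions    = rep(c("Yes", "No",  "Yes", "No"), times = c(3, 2, 4, 3)),
  stringsAsFactors = TRUE
)

# Subjects per cell of the 2 x 2 x 2 layout
tab <- xtabs(~ Infected + Vaccinated + Lesions, data = design)
print(ftable(tab))

# Count the cells that contain no subjects at all
n_empty <- sum(tab == 0)
print(n_empty)
```

Only four of the eight possible cells are occupied, so the design is not just unbalanced but has empty cells, which limits what any model can estimate from it.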
First impressions - the data looks like we would expect. Gene
expression is lowest in the infected/not vaccinated group, then next
lowest is the infected/vaccinated group and finally comes the
uninfected/not vaccinated group. So the working hypothesis is that
infection lowers expression of this gene, and that the vaccine partly
alleviates the effect, though not to the level seen in a totally
uninfected subject. We *might* have access to data for an
uninfected/vaccinated group; my pet scientist is digging for this as we
speak.
As for lesions, none of the uninfected subjects have them, all of the
infected/not vaccinated subjects have them, and the infected/vaccinated
subjects are split: some have them, some don't. This makes for a very
sensible hypothesis if we treat presence/absence of lesions as a
phenotype. In addition, I want to know whether gene expression is linked
to presence/absence of lesions, but only one group of subjects
(infected/vaccinated) contains both. Eye-balling this group,
presence/absence of lesions and gene expression do not appear to be
linked.
So I have this as a data.frame in R, and I wanted to run an analysis of
variance. I did:
# store the fit under a name that does not shadow the aov() function
fit <- aov(IL.4 ~ Infected + Vaccinated + Lesions, data = data)
summary(fit)
And got:
            Df  Sum Sq Mean Sq F value    Pr(>F)
Infected     1 29.8482 29.8482 66.7037 3.761e-05 ***
Vaccinated   1 13.5078 13.5078 30.1868 0.0005777 ***
Lesions      1  0.0393  0.0393  0.0878 0.7746009
Residuals    8  3.5798  0.4475
---
This tells me that Infected and Vaccinated are highly significant,
whereas lesions are not.
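As a reading aid, the F values and p-values in the table can be recomputed from its sums of squares. This sketch uses only numbers copied from the output above; the variable names are made up. Note that summary() on an aov fit reports sequential sums of squares, so with unbalanced data each line is adjusted only for the terms listed before it.

```r
# Values copied from the summary() table above
sum_sq  <- c(Infected = 29.8482, Vaccinated = 13.5078, Lesions = 0.0393)
df_term <- 1        # each factor has two levels, hence 1 degree of freedom
res_ss  <- 3.5798   # residual sum of squares
res_df  <- 8        # residual degrees of freedom

res_ms <- res_ss / res_df                # residual mean square
f_val  <- (sum_sq / df_term) / res_ms    # F = term mean square / residual mean square
p_val  <- pf(f_val, df_term, res_df, lower.tail = FALSE)

print(round(f_val, 2))
print(signif(p_val, 3))
```

Small differences from the printed table in the last digit are expected, because the inputs here are already rounded.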
So, what I want to know is:
1) Given my unbalanced experimental design, is it valid to use aov?
2) Have I used aov() correctly? If so, how do I get access to results
for interactions?
3) Is there some other, more relevant way of analysing this? What I am
really interested in is the gene expression, and whether it can be shown
to be statistically related to one or more of the factors involved
(Infected, Vaccinated, Lesions) or interactions between those factors.
Many thanks in advance
Mick
Help with three-way anova
3 messages · michael watson (IAH-C), Federico Calboli, John Fox
On Tue, 2005-04-05 at 15:51 +0100, michael watson (IAH-C) wrote:
> So, what I want to know is:
> 1) Given my unbalanced experimental design, is it valid to use aov?

I'd say no. Use lm() instead, save your analysis in an object and then possibly use drop1() to check the analysis.

> 2) Have I used aov() correctly? If so, how do I get access to results for interactions?

The use of aov() per se seems fine, but you did not put any interaction in the model... for that use factor * factor.

HTH,
F
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG
Tel +44 (0)20 7594 1602
Fax (+44) 020 7594 3193
f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com
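The lm() / drop1() suggestion and the factor * factor syntax might look roughly like this. It is a sketch under stated assumptions: the design is rebuilt from the group sizes in the original post, but the response is random noise standing in for the real IL.4 values, so the printed statistics mean nothing. The names dat, fit_add and fit_int are made up.

```r
set.seed(1)

# Design taken from the group sizes in the original post
dat <- data.frame(
  Infected   = rep(c("Yes", "Yes", "Yes", "No"), times = c(3, 2, 4, 3)),
  Vaccinated = rep(c("Yes", "Yes", "No",  "No"), times = c(3, 2, 4, 3)),
  Lesions    = rep(c("Yes", "No",  "Yes", "No"), times = c(3, 2, 4, 3)),
  stringsAsFactors = TRUE
)
# ASSUMPTION: placeholder response, random noise only; NOT the real IL.4 data
dat$IL.4 <- rnorm(nrow(dat))

# Main-effects model, then an F test for dropping each term in turn
fit_add <- lm(IL.4 ~ Infected + Vaccinated + Lesions, data = dat)
print(drop1(fit_add, test = "F"))

# '*' expands to the main effects plus all their interactions
fit_int <- lm(IL.4 ~ Infected * Vaccinated * Lesions, data = dat)

# Coefficients that this design cannot estimate come back as NA
print(coef(fit_int))
n_aliased <- sum(is.na(coef(fit_int)))
print(n_aliased)
```

Because only four of the eight factor combinations were observed, the three main effects already use up every distinguishable group mean. All interaction coefficients therefore come back as NA, whatever the response values are, until data for further combinations (such as uninfected/vaccinated) become available.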
Dear Michael,

For unbalanced data, you might want to take a look at the Anova() function in the car package. As well, it probably makes sense to read something about how linear models are expressed in R. ?lm and ?formula both have some information about model formulas; the Introduction to R manual that comes with R has a chapter on statistical models; and books on R typically take up the subject at greater length.

I hope this helps,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
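A call to Anova() from the car package, as mentioned in the reply above, might look like the following. This is only a sketch: it assumes car is installed (it is not part of base R), rebuilds the design from the group sizes in the original post, and again uses random noise in place of the real IL.4 values. The names dat, fit and type2 are made up.

```r
set.seed(1)

# Design taken from the group sizes in the original post
dat <- data.frame(
  Infected   = rep(c("Yes", "Yes", "Yes", "No"), times = c(3, 2, 4, 3)),
  Vaccinated = rep(c("Yes", "Yes", "No",  "No"), times = c(3, 2, 4, 3)),
  Lesions    = rep(c("Yes", "No",  "Yes", "No"), times = c(3, 2, 4, 3)),
  stringsAsFactors = TRUE
)
# ASSUMPTION: placeholder response, random noise only; NOT the real IL.4 data
dat$IL.4 <- rnorm(nrow(dat))

fit <- lm(IL.4 ~ Infected + Vaccinated + Lesions, data = dat)

# Type II tests: each term is adjusted for the other main effects,
# so the result does not depend on the order of terms in the formula
type2 <- NULL
if (requireNamespace("car", quietly = TRUE)) {
  type2 <- car::Anova(fit, type = "II")
  print(type2)
} else {
  message("Package 'car' is not installed; skipping the Type II table")
}
```

For a main-effects-only model like this one, the Type II F tests should agree with those from drop1(fit, test = "F"); the two differ once interaction terms are included.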