Help with analysis of variance
Andrew, I plotted your data first. Then I made id, spell, year into factors and did the ANOVA.
From the graph, it looks like the two ids who had more than one spell
show variability in rate.
The ANOVA table agrees by showing high significance for the id:spell
interaction.
The numbers in this ANOVA table are not similar to your numbers.
Rich
mobility$id <- factor(mobility$id)
mobility$spell <- factor(mobility$spell)
mobility$year <- factor(mobility$year)
xyplot(rate ~ year | id, group=spell, data=mobility,
pch=19, layout=c(1,5), scales=list(alternating=FALSE),
between=list(y=1),
strip=FALSE, strip.left=strip.custom(strip.names=c(TRUE,TRUE)))
mobility.aov <- aov(rate ~ year + id/spell, data=mobility)
anova(mobility.aov)
On Fri, Aug 22, 2014 at 7:51 AM, McCulloch, Andrew
<A.Mcculloch at leedsmet.ac.uk> wrote:
Hi,
I have an observational dataset which consists of eight annual observations (year) for children (id) recording the rate of unemployment in the neighbourhood in which they lived (rate). I know if children move home so the data also has an identifier for spells in the same neighbourhood (spell). I want to decompose the overall variation in children's experience of area unemployment, given by the sum of (rate - mean rate)^2, into a) the component within a residential spell, sum of (rate - spell mean of rate)^2, b) the component between spells, sum of (spell mean), and c) the component between children, sum of (rate - mean rate for child). I think I can do this longhand using the calculations below:
mobility <- structure(list(year = c(2002L, 2003L, 2004L, 2005L, 2006L, 2007L,2008L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2002L,
2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2002L, 2003L, 2004L,
2005L, 2006L, 2007L, 2008L, 2002L, 2003L, 2004L, 2005L, 2006L,
2007L, 2008L), rate = c(13.08962, 14.27165, 4.496403, 3.89839,
4.60199, 5.138746, 5.251025, 4.874652, 5.880996, 5.813953, 6.204044,
6.93802, 6.866853, 7.614808, 4.405841, 4.826733, 4.760742, 3.762136,
4.60199, 5.138746, 5.251025, 4.405841, 4.826733, 4.760742, 3.762136,
4.60199, 5.138746, 5.251025, 4.405841, 5.789474, 5.889423, 4.61211,
4.642526, 6.838906, 9.683488), spell = c(1L, 2L, 2L, 3L, 3L,
3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), id = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L,
5L, 5L)), .Names = c("year", "rate", "spell", "id"), row.names = c(NA,
-35L), class = "data.frame")
mobility$id <- factor(mobility$id)
mobility$spell <- factor(mobility$spell)
mobility$spellmean <- ave(mobility$rate,mobility$id,mobility$spell,FUN=mean)
mobility$personmean <- ave(mobility$rate,mobility$id,FUN=mean)
mobility$totalmean <- mean(mobility$rate,na.rm=TRUE)
N <- dim(mobility)[1]
# observation deviation from overall mean
sum(((mobility$rate-mobility$totalmean)^2)/N)
5.159846
# observation deviation from spell mean
sum(((mobility$rate-mobility$spellmean)^2)/N)
2.039461
# deviation of spell mean from person mean
sum(((mobility$spellmean-mobility$personmean)^2)/N)
2.13787
# deviation of person mean from overall mean
sum(((mobility$personmean-mobility$totalmean)^2)/N)
0.982515
I think this is correct because the sum of the three components of variation sums to the total:
2.039461+2.13787+0.982515 = 5.159846
Can someone show me how to use the analysis of variance functions in R to get the same result. Thanks.
Andrew McCulloch
Leeds Metropolitan University
From 22 September 2014 Leeds Metropolitan University will become Leeds Beckett University.
Find out more at http://www.leedsbeckett.ac.uk To view the terms under which this email is distributed, please go to:- http://www.leedsmet.ac.uk/email-disclaimer.htm [[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.