-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Sarah Goslee
Sent: Thursday, December 06, 2012 2:04 PM
To: Christofer Bogaso
Cc: r-help
Subject: Re: [R] Can somebody help me with following data manipulation?
If I understand what you want correctly, aggregate() should do it.
aggregate(V3 ~ V1 + V2, data = dat, FUN = mean)
   V1 V2        V3
1   C  0 0.5000000
2   G  0 1.0000000
3   I  0 0.3333333
4   O  0 1.0000000
5   R  0 0.0000000
6   T  0 0.8333333
7   I  1 0.4285714
8   O  1 0.0000000
9   R  1 0.6666667
10  T  1 0.5000000
That returns only the combinations that actually occur in the data.
Converting V1 and V2 to factors sets the possible levels, but aggregate()
still drops the empty combinations, so the result is the same ten rows:
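The reason the mean answers the question is that V3 holds only 0s and 1s, so each group mean is the share of 1s in that group. A minimal sketch on made-up data (the data frame `toy` and its values are hypothetical, not the poster's data):

```r
# Hypothetical toy data, not the data from the original post
toy <- data.frame(
  V1 = c("I", "I", "I", "C", "C", "G"),
  V2 = c(0L, 0L, 1L, 0L, 0L, 0L),
  V3 = c(1L, 0L, 1L, 1L, 0L, 1L)
)

# The mean of a 0/1 vector is the proportion of 1s in each V1/V2 group
res <- aggregate(V3 ~ V1 + V2, data = toy, FUN = mean)
print(res)
```

Only the four V1/V2 pairings present in `toy` appear in `res`.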
dat$V1 <- factor(dat$V1)
dat$V2 <- factor(dat$V2)
aggregate(V3 ~ V1 + V2, data = dat, FUN = mean)
   V1 V2        V3
1   C  0 0.5000000
2   G  0 1.0000000
3   I  0 0.3333333
4   O  0 1.0000000
5   R  0 0.0000000
6   T  0 0.8333333
7   I  1 0.4285714
8   O  1 0.0000000
9   R  1 0.6666667
10  T  1 0.5000000
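To get the unobserved combinations as explicit NA rows, as in the requested dat1, one possible route is to build the full grid of factor levels and merge the aggregated means onto it. This is only a sketch on made-up data; `toy`, `obs`, `grid` and `full` are illustrative names, not anything from the thread:

```r
# Hypothetical toy data, not the data from the original post
toy <- data.frame(
  V1 = factor(c("I", "I", "I", "C", "C", "G")),
  V2 = factor(c(0, 0, 1, 0, 0, 0)),
  V3 = c(1, 0, 1, 1, 0, 1)
)

# Means for the combinations that occur in the data
obs <- aggregate(V3 ~ V1 + V2, data = toy, FUN = mean)

# Every V1 x V2 pairing, built from the factor levels
grid <- expand.grid(V1 = levels(toy$V1), V2 = levels(toy$V2))

# Left join on V1 and V2: pairings absent from the data get NA in V3
full <- merge(grid, obs, all.x = TRUE)
print(full)
```

Here C and G never occur with V2 = 1, so those two rows carry NA.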
Sarah
On Thu, Dec 6, 2012 at 2:35 PM, Christofer Bogaso
<bogaso.christofer at gmail.com> wrote:
Dear all, let's say I have the following data:
dat <- structure(list(V1 = structure(c(1L, 4L, 5L, 3L, 3L, 5L, 6L,
4L, 3L, 5L, 6L, 5L, 5L, 4L, 4L, 6L, 2L, 3L, 4L, 3L, 3L, 2L, 5L,
3L, 6L, 3L, 3L, 6L, 3L, 6L, 1L, 6L, 5L, 2L, 2L), .Label = c("C",
"G", "I", "O", "R", "T"), class = "factor"), V2 = c(0L, 0L, 0L,
1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L,
1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L,
0L), V3 = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L,
0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L,
0L, 1L, 0L, 1L, 0L, 1L, 1L)), .Names = c("V1", "V2", "V3"), class =
"data.frame", row.names = c(NA,
-36L))
Now I want to get the following kind of data frame out of that:
dat1 <- structure(list(V1 = structure(c(3L, 3L, 1L, 1L, 2L, 2L),
.Label = c("C",
"G", "I"), class = "factor"), V2 = c(0L, 1L, 0L, 1L, 0L, 1L),
V3 = c(0.333333333, 0.428571429, 0.5, NA, 1, NA)), .Names = c("V1",
"V2", "V3"), class = "data.frame", row.names = c(NA, -6L))
Basically, in 'dat1' the 3rd column comes from: for 'V1 = I' & 'V2 = 0',
what is the percentage of '1' in "V3", and so on.
Is there any R function to achieve that directly?
Thanks and regards,