Can somebody help me with following data manipulation? - R-help

Thu, Dec 6, 2012 11:35 AM #

Dear all, let's say I have the following data:

dat <- structure(list(V1 = structure(c(1L, 4L, 5L, 3L, 3L, 5L, 6L, 6L,
4L, 3L, 5L, 6L, 5L, 5L, 4L, 4L, 6L, 2L, 3L, 4L, 3L, 3L, 2L, 5L,
3L, 6L, 3L, 3L, 6L, 3L, 6L, 1L, 6L, 5L, 2L, 2L), .Label = c("C",
"G", "I", "O", "R", "T"), class = "factor"), V2 = c(0L, 0L, 0L,
1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L,
1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L,
0L), V3 = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L,
0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L,
0L, 1L, 0L, 1L, 0L, 1L, 1L)), .Names = c("V1", "V2", "V3"),
class = "data.frame", row.names = c(NA, -36L))

Now I want to get the following kind of data frame out of it:

dat1 <- structure(list(V1 = structure(c(3L, 3L, 1L, 1L, 2L, 2L),
    .Label = c("C", "G", "I"), class = "factor"),
    V2 = c(0L, 1L, 0L, 1L, 0L, 1L),
    V3 = c(0.333333333, 0.428571429, 0.5, NA, 1, NA)),
    .Names = c("V1", "V2", "V3"), class = "data.frame",
    row.names = c(NA, -6L))

Basically, the 3rd column of 'dat1' comes from this: for 'V1 = I' & 'V2 = 0',
what fraction of the "V3" values are '1', and so on for the other combinations.

Is there any R function to achieve that directly?

Thanks and regards,

Sarah Goslee

Thu, Dec 6, 2012 12:03 PM #

If I understand what you want correctly, aggregate() should do it.

aggregate(V3 ~ V1 + V2, "mean", data=dat)

   V1 V2        V3
1   C  0 0.5000000
2   G  0 1.0000000
3   I  0 0.3333333
4   O  0 1.0000000
5   R  0 0.0000000
6   T  0 0.8333333
7   I  1 0.4285714
8   O  1 0.0000000
9   R  1 0.6666667
10  T  1 0.5000000

That returns the combinations that actually exist.

If you convert V1 and V2 to factors, thus setting the possible levels,
all combinations will be returned:

dat$V1 <- factor(dat$V1)
dat$V2 <- factor(dat$V2)
aggregate(V3 ~ V1 + V2, "mean", data=dat)

   V1 V2        V3
1   C  0 0.5000000
2   G  0 1.0000000
3   I  0 0.3333333
4   O  0 1.0000000
5   R  0 0.0000000
6   T  0 0.8333333
7   I  1 0.4285714
8   O  1 0.0000000
9   R  1 0.6666667
10  T  1 0.5000000

Sarah

On Thu, Dec 6, 2012 at 2:35 PM, Christofer Bogaso <bogaso.christofer at gmail.com> wrote:

Thomas Stewart

Thu, Dec 6, 2012 12:17 PM #

You can directly use the tapply function.
-tgs

tapply(dat[,3],dat[,-3],mean)

David L Carlson

Thu, Dec 6, 2012 12:42 PM #

Converting to factors does not get all combinations.

   V1 V2        V3
1   C  0 0.5000000
2   C  1        NA
3   G  0 1.0000000
4   G  1        NA
5   I  0 0.3333333
6   I  1 0.4285714
7   O  0 1.0000000
8   O  1 0.0000000
9   R  0 0.0000000
10  R  1 0.6666667
11  T  0 0.8333333
12  T  1 0.5000000

But the OP's dat1 contains only 6 observations.
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
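
One possible way to get both results discussed above (all 12 combinations, and then only the 6 rows in the OP's dat1) is sketched below in base R. This is only an illustration, not the code used in the post: the compact rebuild of dat, the helper names (lv, present, grid, full, wanted, res), and the choice of expand.grid() plus merge() are assumptions made for the example.

```r
# Data from the original post, rebuilt compactly (same values as dat;
# V2 and V3 are stored as plain numerics here rather than integers).
lv <- c("C", "G", "I", "O", "R", "T")
dat <- data.frame(
  V1 = factor(lv[c(1, 4, 5, 3, 3, 5, 6, 6, 4, 3, 5, 6, 5, 5, 4, 4, 6, 2,
                   3, 4, 3, 3, 2, 5, 3, 6, 3, 3, 6, 3, 6, 1, 6, 5, 2, 2)],
              levels = lv),
  V2 = c(0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0,
         1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0),
  V3 = c(1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1,
         1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1)
)

# Group means for the combinations that occur in the data.
present <- aggregate(V3 ~ V1 + V2, data = dat, FUN = mean)

# Every V1 x V2 combination, whether or not it occurs.
grid <- expand.grid(V1 = levels(dat$V1), V2 = sort(unique(dat$V2)))

# Left join: combinations with no observations get V3 = NA.
full <- merge(grid, present, by = c("V1", "V2"), all.x = TRUE)

# Keep only the three groups that appear in dat1, in dat1's order (I, C, G).
wanted <- c("I", "C", "G")
res <- full[full$V1 %in% wanted, ]
res$V1 <- factor(res$V1, levels = wanted)
res <- res[order(res$V1, res$V2), ]
rownames(res) <- NULL
res
```

Here full holds all 12 combinations and res holds the 6 rows for I, C and G. The subset step assumes the OP really wants just those three groups, which the post does not state explicitly.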

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Sarah Goslee
Sent: Thursday, December 06, 2012 2:04 PM
To: Christofer Bogaso
Cc: r-help
Subject: Re: [R] Can somebody help me with following data manipulation?

If I understand what you want correctly, aggregate() should do it.

aggregate(V3 ~ V1 + V2, "mean", data=dat)

   V1 V2        V3
1   C  0 0.5000000
2   G  0 1.0000000
3   I  0 0.3333333
4   O  0 1.0000000
5   R  0 0.0000000
6   T  0 0.8333333
7   I  1 0.4285714
8   O  1 0.0000000
9   R  1 0.6666667
10  T  1 0.5000000

That returns the combinations that actually exist.

If you convert V1 and V2 to factors, thus setting the possible levels,
all combinations will be returned:

dat$V1 <- factor(dat$V1)
dat$V2 <- factor(dat$V2)
aggregate(V3 ~ V1 + V2, "mean", data=dat)

   V1 V2        V3
1   C  0 0.5000000
2   G  0 1.0000000
3   I  0 0.3333333
4   O  0 1.0000000
5   R  0 0.0000000
6   T  0 0.8333333
7   I  1 0.4285714
8   O  1 0.0000000
9   R  1 0.6666667
10  T  1 0.5000000

Sarah

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

arun

Thu, Dec 6, 2012 12:47 PM #

Hi,

You can also use
library(plyr)
ddply(dat,.(V1,V2),summarise,V3=mean(V3),.drop=FALSE)
#   V1 V2        V3
#1   C  0 0.5000000
#2   C  1       NaN
#3   G  0 1.0000000
#4   G  1       NaN
#5   I  0 0.3333333
#6   I  1 0.4285714
#7   O  0 1.0000000
#8   O  1 0.0000000
#9   R  0 0.0000000
#10  R  1 0.6666667
#11  T  0 0.8333333
#12  T  1 0.5000000
A.K.
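
The empty combinations show NaN above, while the OP's dat1 has NA in those cells. A small base R sketch of why that happens and one way to convert it follows; the vector v3 is a hypothetical stand-in for the first four values of the V3 column, not the actual ddply result.

```r
# An empty group has no values, so its mean is NaN rather than NA.
empty_mean <- mean(numeric(0))

# Hypothetical V3 values shaped like the first four rows of the output above.
v3 <- c(0.5, empty_mean, 1, empty_mean)

# is.na() is TRUE for NaN as well, so is.nan() is used to target NaN only.
v3[is.nan(v3)] <- NA
v3
```

The same replacement could presumably be applied to the V3 column of the ddply() result if NA is preferred.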



----- Original Message -----
From: Thomas Stewart <tgs.public.mail at gmail.com>
To: 
Cc: r-help <r-help at r-project.org>
Sent: Thursday, December 6, 2012 3:17 PM
Subject: Re: [R] Can somebody help me with following data manipulation?

You can directly use the tapply function.
-tgs

tapply(dat[,3],dat[,-3],mean)
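
Note that tapply() returns a V1-by-V2 matrix rather than a data frame like dat1. A minimal sketch of one way to reshape it into long form is below; the compact rebuild of dat and the names lv, m and long are assumptions made for the example.

```r
# Data from the original post, rebuilt compactly (same values as dat;
# V2 and V3 are stored as plain numerics here rather than integers).
lv <- c("C", "G", "I", "O", "R", "T")
dat <- data.frame(
  V1 = factor(lv[c(1, 4, 5, 3, 3, 5, 6, 6, 4, 3, 5, 6, 5, 5, 4, 4, 6, 2,
                   3, 4, 3, 3, 2, 5, 3, 6, 3, 3, 6, 3, 6, 1, 6, 5, 2, 2)],
              levels = lv),
  V2 = c(0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0,
         1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0),
  V3 = c(1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1,
         1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1)
)

# 6 x 2 matrix of group means; combinations with no data are NA.
m <- tapply(dat[, 3], dat[, -3], mean)

# One row per V1 x V2 cell, with the mean in a column named V3.
# V2 becomes a factor with levels "0" and "1" in the result.
long <- as.data.frame(as.table(m), responseName = "V3")
long
```

Because the empty cells are already NA in the matrix, the long form keeps all 12 combinations without any extra step.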
