
average duplicated rows?

Hi,

My earlier solutions averaged the FL_EARLY values for duplicated "gene_ids", so the resulting data frame had unique rows. If you want to keep the duplicated rows and fill them with the average values instead, you can also try this:
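# Column 4 is FL_EARLY. split() returns groups in sorted gene_id order,
# so this assumes the rows of dat are already ordered by gene_id.
dat$FL_EARLY<-unlist(lapply(lapply(split(dat,dat$gene_id),`[`,4),function(x) rep(colMeans(x),each=nrow(x))),use.names=FALSE)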
head(dat)
#             gene_id sample_1 sample_2   FL_EARLY FL_LATE
#763938  Eucgr.A00054   fl_S1E   fl_S1L 13.1708000 22.2605
#763979  Eucgr.A00101   fl_S1E   fl_S1L  0.3622925 14.1202
#1273243 Eucgr.A00101    fl_S2   fl_S1L  0.3622925 14.1202
#764169  Eucgr.A00350   fl_S1E   fl_S1L  9.0277850 43.9275
#1273433 Eucgr.A00350    fl_S2   fl_S1L  9.0277850 43.9275
#1273669 Eucgr.A00650    fl_S2   fl_S1L 33.6691000 50.0169
A.K.
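
For comparison, here is a minimal sketch of the other approach mentioned above, where duplicated gene_ids are collapsed so that each gene appears only once. It uses aggregate() from base R on three rows taken from the example data. The name `uniq` and the choice to average FL_LATE as well are assumptions made for illustration, not necessarily what the earlier solutions did.

```r
# Three rows from the example data (sample columns left out for brevity)
dat <- data.frame(
  gene_id  = c("Eucgr.A00054", "Eucgr.A00101", "Eucgr.A00101"),
  FL_EARLY = c(13.170800, 0.367960, 0.356625),
  FL_LATE  = c(22.2605, 14.1202, 14.1202)
)

# One row per gene_id; FL_EARLY and FL_LATE are averaged within each gene
uniq <- aggregate(cbind(FL_EARLY, FL_LATE) ~ gene_id, data = dat, FUN = mean)
uniq
```

The result has one row per gene_id, sorted by gene_id, whereas the ave-style solutions keep every original row.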





----- Original Message -----
From: Rui Barradas <ruipbarradas at sapo.pt>
To: "Vining, Kelly" <Kelly.Vining at oregonstate.edu>
Cc: "r-help at r-project.org" <r-help at r-project.org>
Sent: Friday, October 12, 2012 1:10 PM
Subject: Re: [R] average duplicated rows?

Hello,

It could be a job for tapply, but I find ave better suited to it (see ?ave).


dat <- read.table(text = "
             gene_id sample_1 sample_2   FL_EARLY  FL_LATE
763938  Eucgr.A00054   fl_S1E   fl_S1L  13.170800  22.2605
763979  Eucgr.A00101   fl_S1E   fl_S1L   0.367960  14.1202
1273243 Eucgr.A00101    fl_S2   fl_S1L   0.356625  14.1202
764169  Eucgr.A00350   fl_S1E   fl_S1L   7.381070  43.9275
1273433 Eucgr.A00350    fl_S2   fl_S1L  10.674500  43.9275
1273669 Eucgr.A00650    fl_S2   fl_S1L  33.669100  50.0169
764480  Eucgr.A00744   fl_S1E   fl_S1L 132.429000 747.2770
1273744 Eucgr.A00744    fl_S2   fl_S1L 142.659000 747.2770
764595  Eucgr.A00890   fl_S1E   fl_S1L   2.937760  14.9647
764683  Eucgr.A00990   fl_S1E   fl_S1L   8.681250  48.5492
1273947 Eucgr.A00990    fl_S2   fl_S1L  10.553300  48.5492
764710  Eucgr.A01020   fl_S1E   fl_S1L   0.000000  57.9273
1273974 Eucgr.A01020    fl_S2   fl_S1L   0.000000  57.9273
764756  Eucgr.A01073   fl_S1E   fl_S1L   8.504710 101.1870
1274020 Eucgr.A01073    fl_S2   fl_S1L   5.400010 101.1870
764773  Eucgr.A01091   fl_S1E   fl_S1L   3.448910  15.7756
764826  Eucgr.A01152   fl_S1E   fl_S1L  69.565700 198.2320
764831  Eucgr.A01158   fl_S1E   fl_S1L   7.265640  30.9565
764845  Eucgr.A01172   fl_S1E   fl_S1L   3.248020  16.9127
764927  Eucgr.A01269   fl_S1E   fl_S1L  18.710200  76.6918
", header = TRUE)

av <- ave(dat$FL_EARLY, dat$gene_id)
dat$FL_EARLY <- av
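
To illustrate the difference between tapply and ave, here is a small sketch using three FL_EARLY values from the data above. The rows are deliberately rearranged so that the duplicated gene is not adjacent (an assumption for illustration only), and the names `per_gene` and `per_row` are made up for this example.

```r
gene_id  <- c("Eucgr.A00101", "Eucgr.A00054", "Eucgr.A00101")
FL_EARLY <- c(0.367960, 13.170800, 0.356625)

# tapply gives one mean per gene_id: a named vector with two elements
per_gene <- tapply(FL_EARLY, gene_id, mean)

# ave (its default FUN is mean) gives one value per input element,
# in the original order, so it can be assigned straight back to a column
per_row <- ave(FL_EARLY, gene_id)
```

Because ave returns a vector of the same length and order as its input, it does not require the data to be sorted by gene_id.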


Hope this helps,

Rui Barradas
On 12-10-2012 16:41, Vining, Kelly wrote: