Hi,
My earlier solutions averaged the FL_EARLY values for duplicated "gene_id" entries so that the resulting data frame had unique rows. But if you want to keep the duplicated rows, each carrying the average value, you can also try this:
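# Note: split() returns groups in sorted gene_id order, so this assumes dat is already ordered by gene_id
dat$FL_EARLY <- unlist(lapply(lapply(split(dat, dat$gene_id), `[`, 4), function(x) rep(colMeans(x), each = nrow(x))), use.names = FALSE)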
head(dat)
#             gene_id sample_1 sample_2   FL_EARLY FL_LATE
#763938  Eucgr.A00054   fl_S1E   fl_S1L 13.1708000 22.2605
#763979  Eucgr.A00101   fl_S1E   fl_S1L  0.3622925 14.1202
#1273243 Eucgr.A00101    fl_S2   fl_S1L  0.3622925 14.1202
#764169  Eucgr.A00350   fl_S1E   fl_S1L  9.0277850 43.9275
#1273433 Eucgr.A00350    fl_S2   fl_S1L  9.0277850 43.9275
#1273669 Eucgr.A00650    fl_S2   fl_S1L 33.6691000 50.0169
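As a rough sketch of the same idea that does not depend on row order, ave() returns the group mean aligned with each original row. The toy data frame and the column name FL_EARLY_mean below are made up for illustration and are not part of the original data:

```r
# Toy data (hypothetical), deliberately NOT sorted by gene_id
toy <- data.frame(
  gene_id  = c("g2", "g1", "g2", "g3"),
  FL_EARLY = c(10, 1, 20, 5),
  stringsAsFactors = FALSE
)

# ave() uses mean() by default and keeps the result in the original row order
toy$FL_EARLY_mean <- ave(toy$FL_EARLY, toy$gene_id)
toy
```

Both "g2" rows should end up with the mean of 10 and 20, while the single "g1" and "g3" rows keep their own values.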
A.K.
----- Original Message -----
From: Rui Barradas <ruipbarradas at sapo.pt>
To: "Vining, Kelly" <Kelly.Vining at oregonstate.edu>
Cc: "r-help at r-project.org" <r-help at r-project.org>
Sent: Friday, October 12, 2012 1:10 PM
Subject: Re: [R] average duplicated rows?
Hello,
It could be a job for tapply, but I find it more suited for ?ave.
dat <- read.table(text = "
        gene_id sample_1 sample_2   FL_EARLY  FL_LATE
763938  Eucgr.A00054   fl_S1E   fl_S1L  13.170800  22.2605
763979  Eucgr.A00101   fl_S1E   fl_S1L   0.367960  14.1202
1273243 Eucgr.A00101    fl_S2   fl_S1L   0.356625  14.1202
764169  Eucgr.A00350   fl_S1E   fl_S1L   7.381070  43.9275
1273433 Eucgr.A00350    fl_S2   fl_S1L  10.674500  43.9275
1273669 Eucgr.A00650    fl_S2   fl_S1L  33.669100  50.0169
764480  Eucgr.A00744   fl_S1E   fl_S1L 132.429000 747.2770
1273744 Eucgr.A00744    fl_S2   fl_S1L 142.659000 747.2770
764595  Eucgr.A00890   fl_S1E   fl_S1L   2.937760  14.9647
764683  Eucgr.A00990   fl_S1E   fl_S1L   8.681250  48.5492
1273947 Eucgr.A00990    fl_S2   fl_S1L  10.553300  48.5492
764710  Eucgr.A01020   fl_S1E   fl_S1L   0.000000  57.9273
1273974 Eucgr.A01020    fl_S2   fl_S1L   0.000000  57.9273
764756  Eucgr.A01073   fl_S1E   fl_S1L   8.504710 101.1870
1274020 Eucgr.A01073    fl_S2   fl_S1L   5.400010 101.1870
764773  Eucgr.A01091   fl_S1E   fl_S1L   3.448910  15.7756
764826  Eucgr.A01152   fl_S1E   fl_S1L  69.565700 198.2320
764831  Eucgr.A01158   fl_S1E   fl_S1L   7.265640  30.9565
764845  Eucgr.A01172   fl_S1E   fl_S1L   3.248020  16.9127
764927  Eucgr.A01269   fl_S1E   fl_S1L  18.710200  76.6918
", header = TRUE)
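# ave() returns the per-gene_id mean, aligned with the original rows
av <- ave(dat$FL_EARLY, dat$gene_id)
# overwrite FL_EARLY with the group averages
dat$FL_EARLY <- av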
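
If the goal is one row per gene_id, as the original question asks, one possible follow-up is to drop the repeated rows once the averages are in place. This is only a sketch on made-up toy data (the names ex and uniq are hypothetical); note that it keeps the sample_1 value of the first row in each group:

```r
# Toy data (hypothetical) with one duplicated gene_id
ex <- data.frame(
  gene_id  = c("g1", "g2", "g2", "g3"),
  sample_1 = c("fl_S1E", "fl_S1E", "fl_S2", "fl_S2"),
  sample_2 = "fl_S1L",
  FL_EARLY = c(13, 0.3, 0.5, 33),
  FL_LATE  = c(22, 14, 14, 50),
  stringsAsFactors = FALSE
)

# Replace FL_EARLY by its per-gene average, then keep the first row of each gene
ex$FL_EARLY <- ave(ex$FL_EARLY, ex$gene_id)
uniq <- ex[!duplicated(ex$gene_id), ]
uniq
```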
Hope this helps,
Rui Barradas
On 12-10-2012 16:41, Vining, Kelly wrote:
Dear useRs,
I have a slightly complicated data structure and am stuck trying to extract what I need. I'm pasting an example of this data below. In some cases, there are duplicates in the "gene_id" column because there are two different "sample_1" values for a given "sample_2" value. Where these duplicates exist, I need to average the corresponding "FL_EARLY" values, retain the "FL_LATE" value, and replace the two rows with a single row containing the "FL_EARLY" average, so that no "gene_id" duplicates remain.
Seems like this is a job for some version of the apply function, but searching and puzzling over this has not gotten me anywhere. Any help will be much appreciated!
Example data:
        gene_id sample_1 sample_2   FL_EARLY  FL_LATE
763938  Eucgr.A00054   fl_S1E   fl_S1L  13.170800  22.2605
763979  Eucgr.A00101   fl_S1E   fl_S1L   0.367960  14.1202
1273243 Eucgr.A00101    fl_S2   fl_S1L   0.356625  14.1202
764169  Eucgr.A00350   fl_S1E   fl_S1L   7.381070  43.9275
1273433 Eucgr.A00350    fl_S2   fl_S1L  10.674500  43.9275
1273669 Eucgr.A00650    fl_S2   fl_S1L  33.669100  50.0169
764480  Eucgr.A00744   fl_S1E   fl_S1L 132.429000 747.2770
1273744 Eucgr.A00744    fl_S2   fl_S1L 142.659000 747.2770
764595  Eucgr.A00890   fl_S1E   fl_S1L   2.937760  14.9647
764683  Eucgr.A00990   fl_S1E   fl_S1L   8.681250  48.5492
1273947 Eucgr.A00990    fl_S2   fl_S1L  10.553300  48.5492
764710  Eucgr.A01020   fl_S1E   fl_S1L   0.000000  57.9273
1273974 Eucgr.A01020    fl_S2   fl_S1L   0.000000  57.9273
764756  Eucgr.A01073   fl_S1E   fl_S1L   8.504710 101.1870
1274020 Eucgr.A01073    fl_S2   fl_S1L   5.400010 101.1870
764773  Eucgr.A01091   fl_S1E   fl_S1L   3.448910  15.7756
764826  Eucgr.A01152   fl_S1E   fl_S1L  69.565700 198.2320
764831  Eucgr.A01158   fl_S1E   fl_S1L   7.265640  30.9565
764845  Eucgr.A01172   fl_S1E   fl_S1L   3.248020  16.9127
764927  Eucgr.A01269   fl_S1E   fl_S1L  18.710200  76.6918
--Kelly V.