calculate row median of every three columns for a dataframe
Some comments on the contributions:
a) for Petr's suggestion, to return the desired structure modify the
statement to
t(aggregate(t(dfr), list(idx), median)[,-1])
Although less readable, this can also be written as a one-liner
by inlining the idx definition:
t(aggregate(t(dfr), list((0:(ncol(dfr)-1))%/%3), median)[,-1])
b) to DMcP: "# I'm sure the cognoscenti will have a much more elegant way"
+1 for elegance (in my view)
c) to Jim: I think your code is instructive. From a style viewpoint I would
recommend against naming a local variable 'stop' :-)
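
As a quick check of point a), here is a minimal, self-contained sketch. It reuses the example data from David's message quoted below; the names res and res2 are introduced here only for illustration.

```r
# Example data copied from David's message in this thread
dfr <- data.frame(a = c(2,3,4), b = c(3,5,1), c = c(1,3,6),
                  d = c(7,2,1), e = c(2,5,3), f = c(4,5,1))

# Group index: 0 for columns 1-3, 1 for columns 4-6
idx <- (0:(ncol(dfr)-1)) %/% 3

# Drop the Group.1 column and transpose back, so that each row of the
# result corresponds to a row of dfr and each column to a group
res <- t(aggregate(t(dfr), list(idx), median)[,-1])

# The same computation with the idx definition inlined
res2 <- t(aggregate(t(dfr), list((0:(ncol(dfr)-1)) %/% 3), median)[,-1])

# Row medians per group: (2, 4), (3, 5), (4, 1)
print(res)
```

The transpose is what restores the one-row-per-original-row layout; without it the groups run down the rows, as in Petr's output.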
Best,
Eric
On Fri, Apr 17, 2020 at 9:54 AM PIKAL Petr <petr.pikal at precheza.cz> wrote:
Hi

As usual in R, things could be done by different ways.

idx <- (0:(ncol(dfr)-1))%/%3
aggregate(t(dfr), list(idx), median)
  Group.1 V1 V2 V3
1       0  2  3  4
2       1  4  5  1

Results should be OK although its structure is different, performance is not tested.

Cheers
Petr
-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of David McPearson
Sent: Friday, April 17, 2020 7:50 AM
To: r-help at r-project.org
Cc: dcmcp at telstra.com
Subject: Re: [R] calculate row median of every three columns for a dataframe
Anna wrote:
Hi all,
I need to calculate a row median for every three columns of a dataframe. I made it work using the following script, but I am not happy with it. Is there a simpler way of doing this?
To which Jim L responded:
Hi Anna,
I can't think of a simple way, but this function may make you happier:
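step_median<-function(x,window) {
 x<-unlist(x)
 # last index at which a full block of "window" values can start
 stop<-length(x)-window+1
 xout<-NA
 nindx<-1
 # step through the non-overlapping blocks, one median per block
 for(i in seq(1,stop,by=window)) {
  xout[nindx]<-do.call("median",list(x[i:(i+window-1)]))
  nindx<-nindx+1
 }
 return(xout)
}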
apply(df,1,step_median,3)
This should return a matrix where the columns are the medians
calculated from blocks of "window" width on each row of "df". As Bert
noted, you may want to think about a "rolling" median where the
"windows" overlap. This can be done like so:
library(zoo)
apply(df,1,rollmedian,3)
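
To make the difference between block medians and overlapping ("rolling") medians concrete, here is a base-R sketch that needs no extra package. It is intended to mirror what the rollmedian call above computes for a window of 3, but it is not zoo's implementation; roll_median_base is a hypothetical helper name, and the data are the example values used elsewhere in this thread.

```r
# Same example data as used elsewhere in this thread
dfr <- data.frame(a = c(2,3,4), b = c(3,5,1), c = c(1,3,6),
                  d = c(7,2,1), e = c(2,5,3), f = c(4,5,1))

# roll_median_base is a hypothetical helper, not part of zoo.
# embed(x, window) puts every run of "window" consecutive values in one row
# (in reverse order, which does not matter for a median).
roll_median_base <- function(x, window) {
  apply(embed(as.numeric(unlist(x)), window), 1, median)
}

# One column per row of dfr, one row per window position
rolled <- apply(dfr, 1, roll_median_base, 3)
print(rolled)
```

With six columns and a window of 3 there are four overlapping windows per row, compared with two non-overlapping blocks from step_median.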
Jim
Another approach you might try is multiple calls to sapply/lapply. This
won't rid you of loops, but it will hide them:

# Example data. Some names changed to avoid collisions between
# R functions (collisions are in the gap between the headphones,
# not in R).
dfr <- data.frame(a = c(2,3,4), b = c(3,5,1), c = c(1,3,6),
                  d = c(7,2,1), e = c(2,5,3), f = c(4,5,1))

# Turn each of the three-column groups into their own element
# in a list.
# Note: the subsetting (probably) fails with an
# error if ncol(dfr) is not a multiple of 3
dlist <- lapply(seq(1, ncol(dfr), by = 3),
                function(enn) dfr[ , enn + 0:2])

# Then you can use sapply to calculate the row medians for each
# of the elements..
# Both of the following seem to work. I'm not sure which is
# more readable?
sapply(dlist, function(xx) apply(xx, 1, median))
sapply(dlist, apply, 1, median)

# I'm sure the cognoscenti will have a much more elegant way
# of doing this.
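
On the note above about ncol(dfr) not being a multiple of 3: one possible way around it, sketched here under the assumption that a shorter final group should simply get the median of whatever columns it has, is to split the column positions rather than index enn + 0:2. The names dfr7, col_groups and block_medians, and the extra column g, are made up for this illustration.

```r
# Example data with 7 columns, so the last group holds only one column
# (column g is invented for this sketch)
dfr7 <- data.frame(a = c(2,3,4), b = c(3,5,1), c = c(1,3,6),
                   d = c(7,2,1), e = c(2,5,3), f = c(4,5,1),
                   g = c(9,8,7))

# Column positions grouped in threes: (1,2,3), (4,5,6), (7)
col_groups <- split(seq_len(ncol(dfr7)), (seq_len(ncol(dfr7)) - 1) %/% 3)

# drop = FALSE keeps a one-column group as a data frame, so apply() still works
block_medians <- sapply(col_groups,
                        function(cols) apply(dfr7[, cols, drop = FALSE], 1, median))
print(block_medians)
```

Whether a partial group should be kept, dropped, or treated as an error depends on the data; this sketch keeps it.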
Cheers y'all, DMcP
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.