How to rewrite this without a loop?
Thomas Lumley wrote:
On Thu, 18 Nov 2004, Stijn Lievens wrote:
<code>
add.fun <- function(perf.data) {
  ss <- 0
  for (i in 0:29) {
    ss <- ss + cor(subset(perf.data, dataset == i)[3],
                   subset(perf.data, dataset == i)[7], method = "kendall")
  }
  ss
}
</code>
As one can see, this function uses a for-loop. Now chapter 9 of 'An
introduction to R' tells us that we should avoid for-loops as much as
possible.
You don't say whether `dataset' is the name of a column in `perf.data'.
Assuming it is, and assuming that 0:29 are all the values of `dataset'
<code>
sum(by(perf.data, list(perf.data$dataset),
       function(d) cor(d[, 3], d[, 7], method = "kendall")))
</code>
would work.
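For anyone who wants to check this before trusting it, here is a minimal sketch on made-up data. The data frame below is purely an assumption standing in for the real `perf.data': only its shape matters (a `dataset' column with values 0:29, and the two scores in columns 3 and 7). The column names `id', `score.a', `x4' to `x6' and `score.b' are hypothetical.

```r
# Hypothetical stand-in for perf.data: 30 datasets (0..29), 7 columns,
# with the two scores to correlate in columns 3 and 7.
set.seed(1)
n.per <- 10
perf.data <- data.frame(
  id      = seq_len(30 * n.per),
  dataset = rep(0:29, each = n.per),
  score.a = rnorm(30 * n.per),
  x4 = 0, x5 = 0, x6 = 0,
  score.b = rnorm(30 * n.per)
)

# The original loop version, unchanged in behaviour
add.fun <- function(perf.data) {
  ss <- 0
  for (i in 0:29) {
    ss <- ss + cor(subset(perf.data, dataset == i)[3],
                   subset(perf.data, dataset == i)[7], method = "kendall")
  }
  ss
}

# cor() on one-column data frames returns a 1x1 matrix, so flatten it
loop.sum <- as.numeric(add.fun(perf.data))

# The by() version: one Kendall correlation per dataset, then summed
by.sum <- sum(by(perf.data, list(perf.data$dataset),
                 function(d) cor(d[, 3], d[, 7], method = "kendall")))

# Compare the two results up to floating-point tolerance
all.equal(loop.sum, by.sum)
```

If `dataset' took values outside 0:29, the two versions would no longer agree, since the loop would silently skip the extra groups while by() would include them.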
Indeed, this works. The 'by' command is exactly what I was looking for. As far as I can tell, this useful command isn't mentioned in 'An introduction to R'.
If this is faster it will be because you don't call subset() twice per iteration, rather than because you are avoiding a loop. However, it has other benefits: it doesn't have the variable `i', it doesn't have to change the value of `ss', and it doesn't have the range of `dataset' hard-coded into it. These are all clarity optimisations.
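To make the distinction concrete, here is a rough sketch, again on made-up data (the data frame and the names `loop.once' and `split.sum' are hypothetical, not from the original post). The first version keeps the loop but subsets only once per group and reads the groups from the data. The second drops `i' and `ss' altogether using split() and sapply(), which is one possible alternative to by().

```r
# Hypothetical stand-in for perf.data: 6 datasets, scores in columns 3 and 7
set.seed(2)
perf.data <- data.frame(
  id      = 1:60,
  dataset = rep(0:5, each = 10),
  score.a = rnorm(60),
  x4 = 0, x5 = 0, x6 = 0,
  score.b = rnorm(60)
)

# Still a loop, but each group is extracted once and the range of
# `dataset' is taken from the data instead of being hard-coded
loop.once <- function(perf.data) {
  ss <- 0
  for (i in unique(perf.data$dataset)) {
    d <- perf.data[perf.data$dataset == i, ]
    ss <- ss + cor(d[, 3], d[, 7], method = "kendall")
  }
  ss
}

# Same computation without `i' or `ss': split into per-dataset data
# frames, compute one correlation each, then sum
split.sum <- sum(sapply(split(perf.data, perf.data$dataset),
                        function(d) cor(d[, 3], d[, 7], method = "kendall")))

# Compare the two results up to floating-point tolerance
all.equal(loop.once(perf.data), split.sum)
```

Any speed difference between these two is likely to be small; the second is mainly shorter to type and leaves no loop state to get wrong.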
In fact I don't care too much about speed at the moment, but a one-line statement is more convenient to type (and recall) in the command line interface than a multi-line statement. Your solution really does the trick for me. Thanks, Stijn.
-thomas