correct function formation in R

3 messages · Omphalodes Verna, Rui Barradas, Duncan Murdoch

#
Dear list!

I have a question about 'correct function formation'. Which function (fun1 or fun2; see below) is written more correctly: one that uses structure() to build its output, or one that creates an empty data.frame and then fills it in? (fun1 and fun2 are just for illustration.)

Thanks a lot, OV

code:
input <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))
fun1 <- function(x) {
    ID <- NULL; minimum <- NULL; maximum <- NULL
    for(i in seq_along(names(x))) {
        # collect one summary value per column of x
        ID[i]      <- names(x)[i]
        minimum[i] <- min(x[, names(x)[i]])
        maximum[i] <- max(x[, names(x)[i]])
    }
    # assemble the data frame by setting its attributes directly
    output <- structure(list(ID, minimum, maximum),
                        row.names = seq_along(names(x)),
                        .Names = c("ID", "minimum", "maximum"),
                        class = "data.frame")
    return(output)
}
fun2 <- function(x) {
    # start from an empty data frame and fill it cell by cell
    output <- data.frame(ID = character(), minimum = numeric(), maximum = numeric(), stringsAsFactors = FALSE)
    for(i in seq_along(names(x))) {
        output[i, "ID"]      <- names(x)[i]
        output[i, "minimum"] <- min(x[, names(x)[i]])
        output[i, "maximum"] <- max(x[, names(x)[i]])
    }
    return(output)
}

fun1(input)
fun2(input)
#
Hello,

I believe it's a matter of personal taste. I find fun2 more readable; 
others may not agree.

Rui Barradas
On 20-11-2012 17:39, Omphalodes Verna wrote:
#
On 20/11/2012 12:39 PM, Omphalodes Verna wrote:
fun1 relies on the internal implementation of the data.frame class. 
That's really unlikely to change, but you still shouldn't rely on it.
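To see what fun1 is relying on, one can inspect an ordinary data frame. The sketch below (the values are made up for illustration) shows that a data frame is a list of equal-length columns carrying a small set of attributes, which is exactly what fun1's structure() call writes by hand. How those attributes are stored (R may, for example, keep default row names in a compact internal form) is an implementation detail that data.frame() manages for you.

```r
# A small data frame built through the public constructor.
df <- data.frame(ID = c("x1", "x2"), minimum = c(-1.5, -2.0), maximum = c(2.0, 1.2),
                 stringsAsFactors = FALSE)

# The attributes fun1 sets directly via structure(): names, class, row.names.
names(attributes(df))

# Stripped of its class, the object is just a list of columns.
is.list(unclass(df))
```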
fun2 is going to be really slow, because it indexes into the output 
data frame so many times.
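A rough way to see the cost of indexing into a data frame is to time element-by-element assignment into a data frame column (as fun2 does) against the same assignments into a plain numeric vector. This is a sketch: the size n is an arbitrary choice and the timings will vary by machine and R version, so no particular numbers are claimed here.

```r
n <- 2000
vals <- runif(n)

# fun2-style: each assignment goes through the data frame replacement method.
df <- data.frame(v = numeric(n))
t_df <- system.time(for (i in seq_len(n)) df[i, "v"] <- vals[i])

# fun1-style: each assignment goes into a plain preallocated vector.
v <- numeric(n)
t_vec <- system.time(for (i in seq_len(n)) v[i] <- vals[i])

print(t_df["elapsed"])
print(t_vec["elapsed"])
```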

I would combine the approaches: assign to local variables in the loop 
the way fun1 does, then construct a data frame at the end. That is,

output <- data.frame(ID, minimum, maximum)
return(output)
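Put together, the combined approach might look like the sketch below. The name fun3 is hypothetical, and it also applies the preallocation advice (vectors created at their final size). stringsAsFactors = FALSE is added as an assumption so that ID stays a character column on R versions before 4.0, where the default was to convert strings to factors.

```r
fun3 <- function(x) {
    n <- ncol(x)
    # plain local vectors, preallocated at their final length
    ID      <- character(n)
    minimum <- numeric(n)
    maximum <- numeric(n)
    for (i in seq_len(n)) {
        ID[i]      <- names(x)[i]
        minimum[i] <- min(x[[i]])
        maximum[i] <- max(x[[i]])
    }
    # build the data frame once, through the public constructor
    data.frame(ID, minimum, maximum, stringsAsFactors = FALSE)
}

set.seed(1)
input <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))
fun3(input)
```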

One other change: don't initialize the local variables to NULL; 
initialize them to their final size, e.g.

ID <- character(ncol(x))
minimum <- numeric(ncol(x))
maximum <- numeric(ncol(x))
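The reason for preallocating is that assigning past the end of a vector forces R to enlarge it. The sketch below times a loop that grows a vector from NULL (as fun1 does) against one that fills a vector created at its final size. The functions grow and prealloc are hypothetical names, n is arbitrary, and recent R versions mitigate the cost of growing somewhat, so the size of the gap depends on the version and machine.

```r
n <- 1e5

grow <- function(n) {
    out <- NULL                      # starts empty, as in fun1
    for (i in seq_len(n)) out[i] <- i^2
    out
}

prealloc <- function(n) {
    out <- numeric(n)                # final size known up front
    for (i in seq_len(n)) out[i] <- i^2
    out
}

system.time(a <- grow(n))
system.time(b <- prealloc(n))
```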

(And if the contents are as simple as in the example, you don't need the 
loop, but I assume the real case is more complicated.)
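For the simple case in the example, a loop-free version could compute each summary over all columns at once. The sketch below uses vapply(), which applies a function to each column of a data frame and checks that every result is a single number. The function name summarise_columns is hypothetical; unname() drops the column names that vapply() attaches, so that data.frame() uses default row names.

```r
summarise_columns <- function(x) {
    data.frame(
        ID      = names(x),
        minimum = unname(vapply(x, min, numeric(1))),  # one minimum per column
        maximum = unname(vapply(x, max, numeric(1))),  # one maximum per column
        stringsAsFactors = FALSE
    )
}

set.seed(1)
input <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))
summarise_columns(input)
```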

Duncan Murdoch