Applying function to a TABLE and also "apply, tapply, sapply etc"

5 messages · Amelia Vettori, Liviu Andronic, Seeliger.Curt at epamail.epa.gov +1 more

#
On Wed, Dec 15, 2010 at 4:18 PM, Amelia Vettori
<amelia_vettori at yahoo.co.nz> wrote:
Say you have the following data frame
Var1 V2
1   10 20
2   40 30
3    3 11
'data.frame':	3 obs. of  2 variables:
 $ Var1: num  10 40 3
 $ V2  : num  20 30 11

Then the row sums and the column sums are:
1  2  3
30 70 14
Var1   V2
  53   61
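
The commands that produced the output above were lost from this archived copy. A minimal sketch that should reproduce the same sums, assuming the data frame was built by hand and summed with apply() (the object name 'df' is hypothetical, and the names attached to the results may differ from the printout above):

```r
# 'df' is a hypothetical name; the original object name was not preserved
df <- data.frame(Var1 = c(10, 40, 3), V2 = c(20, 30, 11))
str(df)

# MARGIN = 1 applies the function over rows
row_sums <- apply(df, 1, sum)
# MARGIN = 2 applies the function over columns
col_sums <- apply(df, 2, sum)

print(row_sums)
print(col_sums)
```

For plain sums, rowSums(df) and colSums(df) are the usual shortcuts; apply() is the general form that accepts any function.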
Here are only a few examples, the ones that I understand well.
##lapply(): apply a function to each element of a list (data frames are lists)
##here, compute sum() for each column
$Var1
[1] 53

$V2
[1] 61

##sapply() is a variation of lapply(); see the docs
Var1   V2
  53   61
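
A sketch of calls that would give the two results above, assuming the same three-row data frame (again under the hypothetical name 'df'):

```r
# 'df' is a hypothetical name for the data frame shown earlier
df <- data.frame(Var1 = c(10, 40, 3), V2 = c(20, 30, 11))

# lapply() always returns a list, one element per column
col_list <- lapply(df, sum)

# sapply() does the same but simplifies the result to a named vector
col_vec <- sapply(df, sum)

print(col_list)
print(col_vec)
```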

##using the 'iris' data frame and tapply(), compute mean() of the
##Sepal.Length column for each Species level
setosa versicolor  virginica
     5.006      5.936      6.588
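
The per-species means above are what a tapply() call on the built-in 'iris' data would return. A minimal sketch (the variable name 'species_means' is only for illustration):

```r
# tapply(values, grouping factor, function):
# split Sepal.Length by Species, then take the mean of each group
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)
print(species_means)
```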

##a friendlier interface is provided by by()
Species: setosa
[1] 5.006
------------------------------------------------------------
Species: versicolor
[1] 5.936
------------------------------------------------------------
Species: virginica
[1] 6.588
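
One call that prints in this form, assuming with() was used so that the grouping variable is labelled simply 'Species' (the exact original call is not preserved, and 'by_species' is a name made up here):

```r
# by(data, INDICES, FUN): apply FUN to the subset of data for each group
by_species <- with(iris, by(Sepal.Length, Species, mean))
print(by_species)
```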

##the same, now for four variables at the same time
iris$Species: setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       5.006        3.428        1.462        0.246
------------------------------------------------------------
iris$Species: versicolor
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       5.936        2.770        4.260        1.326
------------------------------------------------------------
iris$Species: virginica
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       6.588        2.974        5.552        2.026
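A sketch for the four-variable case. The original call is not preserved; it may well have used mean() directly on the data frame, which recent versions of R no longer support, so colMeans() is swapped in here as an assumption:

```r
# pass the four numeric columns as the data and Species as the grouping;
# colMeans() returns one mean per column for each group
by_species4 <- by(iris[, 1:4], iris$Species, colMeans)
print(by_species4)
```

Each element of the result is a named numeric vector, so the setosa means can be pulled out with by_species4[[1]].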


For an example of mapply see this recent post:
http://r.789695.n4.nabble.com/calculating-mean-of-list-components-tp3088986p3089057.html
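
Independently of that post, here is a minimal mapply() sketch on the small data frame from the start of this message. The names 'df' and 'add_pair' are hypothetical and exist only for this illustration:

```r
# 'df' and 'add_pair' are hypothetical names used only in this sketch
df <- data.frame(Var1 = c(10, 40, 3), V2 = c(20, 30, 11))
add_pair <- function(x, y) x + y

# mapply() calls add_pair(x[1], y[1]), add_pair(x[2], y[2]), and so on
pair_sums <- mapply(add_pair, df$Var1, df$V2)
print(pair_sums)

# "+" is already vectorized, so here df$Var1 + df$V2 gives the same values;
# mapply() earns its keep when the function only handles one pair at a time
```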

For more on vectorization, see sections 3 and 4 of the 'R inferno'
[1]. Also check 'Some Hints for the R Beginner' [2].
[1] http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
[2] http://www.burns-stat.com/pages/Tutor/hints_R_begin.html

Regards
Liviu

#
On 12/15/2010 7:18 AM, Amelia Vettori wrote:
# a slightly more complicated demonstration function, which
# gives a result that makes sense for writing to a CSV file.
fun <- function(X, Y) {
	data.frame(result=X + Y)
}

foo <- data.frame(variable_1=c(10,40,3), variable_2=c(20,30,11))

# using apply
# This only really works if the columns in foo are all the same
# type, because apply first coerces foo to a matrix (which holds
# a single type). Also, since the column names of the data.frame
# don't match the arguments of fun, the unname is needed.
# do.call is a somewhat advanced function that lets you call
# a function with arguments that are stored in a list.
apply(foo, 1, function(x) do.call("fun", as.list(unname(x))))

# version using apply, where foo has been transformed into
# something more like what apply would expect.
foo.m <- as.matrix(foo)
colnames(foo.m) <- c("X","Y")
apply(foo.m, 1, function(x){do.call("fun", as.list(x))})

# using lapply
# lapply takes a list (or vector), which for this looping purpose
# has to be the row indexes of foo. This version does not require
# the different arguments to be the same type.
lapply(seq_len(nrow(foo)), function(i) {fun(foo[i,1],foo[i,2])})

# using mapply
# This one is designed for the case where several arguments to a
# function vary in parallel.
mapply(fun, foo[,1], foo[,2])

# using Vectorize
# A different approach: instead of creating the looping
# structure, create a new function which is vectorized over its
# arguments.
fun.v <- Vectorize(fun)
fun.v(foo[,1], foo[,2])

# storing the results to disk
results <- mapply(fun, foo[,1], foo[,2])
# results is a list, each element of which is one of the returned
# sets of results corresponding to a row in the original data.frame
lapply(seq_along(results), function(r) {
	write.csv(results[r], file=paste("ans", r, ".csv", sep=""))
})

# if the function applied did not need a different file name for each
# result (the name depends on the position of the result in the list,
# not on anything in the result itself), the loop could be simpler.
lapply(results, summary)
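
A variation on the file-writing step, offered as a sketch rather than as part of the original answer. It assumes you want each result kept as a one-row data.frame (SIMPLIFY = FALSE) and written under a temporary directory; 'out_dir', 'out_files' and 'combined' are names invented for this example:

```r
fun <- function(X, Y) {
  data.frame(result = X + Y)
}
foo <- data.frame(variable_1 = c(10, 40, 3), variable_2 = c(20, 30, 11))

# SIMPLIFY = FALSE keeps each return value as a one-row data.frame
results <- mapply(fun, foo[, 1], foo[, 2], SIMPLIFY = FALSE)

# 'out_dir' is a hypothetical location chosen for this sketch
out_dir <- tempdir()
out_files <- file.path(out_dir, paste0("ans", seq_along(results), ".csv"))

# Map() pairs each result with its own file name
invisible(Map(function(res, path) write.csv(res, file = path, row.names = FALSE),
              results, out_files))

# alternatively, stack all results into a single data.frame
combined <- do.call(rbind, results)
print(combined)
```

Building the file names up front keeps the write step free of index arithmetic, and the combined data.frame can be written with a single write.csv() call if one file is enough.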