Excluding data with apply

4 messages · Christian Kamenik, Stavros Macrakis, David Winsemius

#
Dear all,

'Apply' is a great thing for running functions on rows or columns of a 
matrix:

X <- rnorm(20, mean = 0, sd = 1)
dim(X) <- c(5,4)
apply(X,2,sum)

Is there a way to use apply to exclude rows or columns from a matrix
and run a function on the remaining ones? I know I could do this with
a 'for' loop, but 'apply' would be much easier and quicker, and would
require less programming...

Cheers, Christian
#
Using indexing and putting a minus sign in front of a vector of column  
names that you want to exclude would be a typical approach:

df <- data.frame(a=LETTERS[1:4], b=rnorm(4), c=rnorm(4), d=letters[5:9])

apply(df[ , -c("a","d")], 2, sum)

(Pretty sure this will run properly, but I don't have R up and running
to test it.)
#
Well, testing would have been wise. The last variable in the sample
data frame had 5 elements instead of 4, and applying the unary minus
("negation") to a character vector is not valid.

 > df[ , -c("a","d")]
Error in -c("a", "d") : invalid argument to unary operator

So if you limit yourself to negative indexing of numeric references to  
columns, you should be OK.

df <- data.frame(a=LETTERS[1:4], b=rnorm(4), c=rnorm(4), d=letters[5:8])
df[ , -c(1,4)]

            b            c
1  0.6056003 -0.002843621
2  0.3949298  0.206188106
3 -0.5362161 -1.381615740
4  0.2826662  0.016430970

 > apply(df[,-c(1,4)] , 2, sum)
          b          c
  0.7469803 -1.1618403
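
The same negative indexing should carry over to the matrix in the original
question, for rows as well as columns. A minimal sketch follows; the seed and
the choice of which rows and columns to drop are arbitrary assumptions made
only for illustration, and the variable names are made up. One thing to watch:
if only a single column is left, the result of the indexing collapses to a
plain vector and apply() refuses it, so drop = FALSE is used in that case.

```r
# Illustrative matrix, same shape as in the original question (5 rows, 4 columns).
set.seed(1)                      # arbitrary seed, for reproducibility only
X <- matrix(rnorm(20), nrow = 5, ncol = 4)

# Column sums, leaving out column 2.
col_sums <- apply(X[, -2], 2, sum)

# Column sums, leaving out rows 1 and 3.
row_excl_sums <- apply(X[-c(1, 3), ], 2, sum)

# If only one column survives, X[, -(1:3)] would collapse to a plain vector
# and apply() would stop with an error; drop = FALSE keeps a one-column matrix.
one_col_sum <- apply(X[, -(1:3), drop = FALSE], 2, sum)
```

For plain sums, colSums(X[, -2]) gives the same numbers without apply; apply
is only needed when the function is something other than a sum or mean.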

The subset function might also be useful if one prefers to use
column names.

 > subset(df, select=c("b","c"))
            b            c
1  0.6056003 -0.002843621
2  0.3949298  0.206188106
3 -0.5362161 -1.381615740
4  0.2826662  0.016430970

 > apply(subset(df, select=c("b","c")), 2, sum)
          b          c
  0.7469803 -1.1618403