rowSums()
on 09/24/2008 09:06 AM Doran, Harold wrote:
Say I have the following data:

testDat <- data.frame(A = c(1,NA,3), B = c(NA, NA, 3))
testDat
   A  B
1  1 NA
2 NA NA
3  3  3

rowSums() with na.rm=TRUE generates the following, which is not desired:
rowSums(testDat[, c('A', 'B')], na.rm=T)
[1] 1 0 6

rowSums() with na.rm=F generates the following, which is also not desired:
rowSums(testDat[, c('A', 'B')], na.rm=F)
[1] NA NA  6

I see why this occurs, but what I hope to have returned would be:

[1]  1 NA  6

To get what I want I could do the following, but normally my ideas are bad ideas and there are codified and proper ways to do things.

rr <- numeric(nrow(testDat))
for(i in 1:nrow(testDat)) rr[i] <- if(all(is.na(testDat[i,]))) NA else sum(testDat[i,], na.rm=T)
rr
[1]  1 NA  6

Is there a "proper" way to do this? In my real data, nrow is over 100,000.

Thanks,
Harold
The behavior you observe is documented in ?rowSums in the Value section:

  If there are no values in a range to be summed over (after removing missing values with na.rm = TRUE), that component of the output is set to 0 (*Sums) or NA (*Means), consistent with sum and mean.

So:
sum(c(NA, NA), na.rm = TRUE)
[1] 0

As per the definition of the sum of an empty set being 0, which I got burned on myself a while back.

You could feasibly use:

Res <- rowSums(testDat, na.rm = TRUE)
is.na(Res) <- rowSums(is.na(testDat)) == ncol(testDat)

HTH,

Marc Schwartz
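
For reuse, the two-line approach above could be wrapped in a small function. This is only a minimal sketch: the name rowSumsAllNA is hypothetical (it is not part of base R), and the loop from the original question is repeated purely as a cross-check on the toy data. Because both steps are vectorized calls to rowSums(), it should cope with 100,000+ rows far better than the row-by-row loop, though that has not been timed here.

```r
# Hypothetical helper (not in base R): sum each row ignoring NAs,
# but return NA for any row in which every value is missing.
rowSumsAllNA <- function(x) {
  res <- rowSums(x, na.rm = TRUE)
  # rowSums(is.na(x)) counts the NAs per row; a count equal to ncol(x)
  # means the whole row is missing, so that result is set to NA.
  is.na(res) <- rowSums(is.na(x)) == ncol(x)
  res
}

testDat <- data.frame(A = c(1, NA, 3), B = c(NA, NA, 3))
res <- rowSumsAllNA(testDat)

# Cross-check against the explicit loop from the question.
rr <- numeric(nrow(testDat))
for (i in seq_len(nrow(testDat))) {
  rr[i] <- if (all(is.na(testDat[i, ]))) NA else sum(testDat[i, ], na.rm = TRUE)
}

# unname() drops any row names so only the values are compared.
print(unname(res))                        # values: 1, NA, 6
print(isTRUE(all.equal(unname(res), rr)))
```

The function accepts a numeric matrix as well as a data frame, since rowSums(), is.na() and ncol() all work on both.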