How to force aggregate to exclude NA?

The aggregate function does "almost" all that I need to summarize a dataset, except that I can't make it exclude NAs without a little bit of hassle.
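set.seed(143)
# note: R >= 3.6.0 changed sample()'s default algorithm, so newer R versions may not reproduce the data printed below
m <- data.frame(A = sample(LETTERS[1:5], 20, TRUE), B = sample(LETTERS[1:10], 20, TRUE),
                C = sample(c(NA, 1:4), 20, TRUE), D = sample(c(NA, 1:4), 20, TRUE))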
m
   A B  C  D
1  E I  1 NA
2  A C NA NA
3  D I NA  3
4  C I  2  4
5  A C  3  2
6  E J  1  2
7  D J  2  2
8  C G  4  1
9  C D NA  3
10 B G  3 NA
11 C B  4  2
12 A B NA NA
13 E A NA  4
14 B B  3  3
15 E I  4  1
16 E J  3  1
17 B J  4  4
18 B J  1  3
19 D D  4  2
20 B B  4  3
aggregate(m[,-c(1:2)], by=list(m[,1]), sum)
  Group.1  C  D
1       A NA NA
2       B 15 NA
3       C NA 10
4       D NA  7
5       E NA NA
aggregate(m[,-c(1:2)], by=list(m[,1]), length)
  Group.1 C D
1       A 3 3
2       B 5 5
3       C 4 4
4       D 3 3
5       E 5 5

My own versions of length and sum that exclude NAs:
mylength <- function(x) { sum(as.logical(x), na.rm = TRUE) }
mysum <- function(x) { sum(x, na.rm = TRUE) }
aggregate(m[,-c(1:2)], by=list(m[,1]), mysum)   # this computes correctly
  Group.1  C  D
1       A  3  2
2       B 15 13
3       C 10 10
4       D  6  7
5       E  9  8
aggregate(m[,-c(1:2)], by=list(m[,1]), mylength)   # this computes correctly
  Group.1 C D
1       A 1 1
2       B 5 4
3       C 3 4
4       D 2 3
5       E 4 4
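A caveat on mylength(): sum(as.logical(x), na.rm = TRUE) counts the non-NA values that are also nonzero, because as.logical() turns 0 into FALSE. It matches a plain non-NA count here only because the values are drawn from 1:4. The following is a minimal sketch using a made-up vector that contains a zero. count_non_na() is a hypothetical name and does not come from the thread.

```r
# The question's mylength(): as.logical() maps 0 to FALSE and NA to NA
mylength <- function(x) sum(as.logical(x), na.rm = TRUE)

# Hypothetical alternative that counts every non-missing value, zeros included
count_non_na <- function(x) sum(!is.na(x))

x <- c(0, 1, NA, 4)   # made-up example vector containing a zero
mylength(x)           # 2: the zero is dropped along with the NA
count_non_na(x)       # 3: only the NA is dropped
```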

There are other statistics I need to compute, e.g. var and sd, and it is a hassle to write a customized NA-excluding version of each one. Any alternative approaches?

Try

aggregate(m[, -(1:2)], m[1], sum, na.rm = TRUE)
aggregate(!is.na(m[, -(1:2)]), m[1], sum, na.rm = TRUE)

# or (this uses row names rather than a column for the group):

rowsum(m[, -(1:2)], m[,1], na.rm = TRUE)
rowsum(0+!is.na(m[, -(1:2)]), m[,1], na.rm = TRUE)
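These calls work because aggregate() forwards any arguments given after FUN (here na.rm = TRUE) to FUN through its ... argument. The same pattern should therefore cover var and sd, which also take na.rm. The following is a minimal sketch. R 3.6.0 changed the default sample() algorithm, so set.seed(143) may not reproduce the data, and the sketch instead retypes the data frame printed in the question by hand.

```r
# The 20-row data frame printed in the question, typed in by hand
m <- data.frame(
  A = strsplit("EADCAEDCCBCAEBEEBBDB", "")[[1]],
  B = strsplit("ICIICJJGDGBBABIJJJDB", "")[[1]],
  C = c(1, NA, NA, 2, 3, 1, 2, 4, NA, 3, 4, NA, NA, 3, 4, 3, 4, 1, 4, 4),
  D = c(NA, NA, 3, 4, 2, 2, 2, 1, 3, NA, 2, NA, 4, 3, 1, 1, 4, 3, 2, 3)
)

# Everything after FUN is passed on to FUN for each group and column
sums <- aggregate(m[, -(1:2)], m[1], sum, na.rm = TRUE)
vars <- aggregate(m[, -(1:2)], m[1], var, na.rm = TRUE)
sds  <- aggregate(m[, -(1:2)], m[1], sd,  na.rm = TRUE)
sums   # same totals as mysum in the question
sds    # group A is still NA: it has only one non-missing value per column
```

This approach relies on FUN accepting na.rm. length() does not accept it, so the non-missing counts are computed by summing !is.na() instead.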

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

There are other statistics I need to compute, e.g. var and sd, and it is a hassle to write a customized NA-excluding version of each one. Any alternative approaches?
How about writing a function to do the customisation for you?

na.rm <- function(f) {
  function(x, ...) f(x[!is.na(x)], ...)
}

aggregate(m[,-c(1:2)], by=list(m[,1]), na.rm(sum))
aggregate(m[,-c(1:2)], by=list(m[,1]), na.rm(length))

Hadley
http://had.co.nz/
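Hadley's na.rm() is a function factory. It takes a function f and returns a new function that drops the NAs from its first argument before calling f. Because the NAs are removed before f sees them, the wrapper also works for functions such as length() that have no na.rm argument of their own. The following is a minimal sketch that applies it to the var/sd case from the question. The data frame printed in the question is retyped by hand, because newer R versions may not reproduce it from set.seed(143).

```r
# Hadley's wrapper, as posted above
na.rm <- function(f) {
  function(x, ...) f(x[!is.na(x)], ...)
}

# The 20-row data frame printed in the question, typed in by hand
m <- data.frame(
  A = strsplit("EADCAEDCCBCAEBEEBBDB", "")[[1]],
  B = strsplit("ICIICJJGDGBBABIJJJDB", "")[[1]],
  C = c(1, NA, NA, 2, 3, 1, 2, 4, NA, 3, 4, NA, NA, 3, 4, 3, 4, 1, 4, 4),
  D = c(NA, NA, 3, 4, 2, 2, 2, 1, 3, NA, 2, NA, 4, 3, 1, 1, 4, 3, 2, 3)
)

counts <- aggregate(m[, -(1:2)], m[1], na.rm(length))  # non-NA counts per group and column
sds    <- aggregate(m[, -(1:2)], m[1], na.rm(sd))
vars   <- aggregate(m[, -(1:2)], m[1], na.rm(var))
counts
sds
```

Naming the wrapper na.rm does not break calls like sum(x, na.rm = TRUE). In those calls na.rm is matched as an argument name, not looked up as a function.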
Actually, the second aggregate and the second rowsum don't need na.rm = TRUE, since !is.na() never returns NA,
so we only need:

aggregate(!is.na(m[, -(1:2)]), m[1], sum)
rowsum(0+!is.na(m[, -(1:2)]), m[,1])

You might also want to look at summaryBy in the doBy package.
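Here is why na.rm = TRUE can be dropped there. is.na() always returns TRUE or FALSE and never NA, so !is.na(...) gives a logical matrix with nothing left to remove. Summing that matrix per group counts the non-missing values. The 0 + in the rowsum() call converts the logical matrix to numbers, because rowsum() requires numeric input. The following is a minimal sketch, with the question's printed data retyped by hand.

```r
# The 20-row data frame printed in the question, typed in by hand
m <- data.frame(
  A = strsplit("EADCAEDCCBCAEBEEBBDB", "")[[1]],
  B = strsplit("ICIICJJGDGBBABIJJJDB", "")[[1]],
  C = c(1, NA, NA, 2, 3, 1, 2, 4, NA, 3, 4, NA, NA, 3, 4, 3, 4, 1, 4, 4),
  D = c(NA, NA, 3, 4, 2, 2, 2, 1, 3, NA, 2, NA, 4, 3, 1, 1, 4, 3, 2, 3)
)

present <- !is.na(m[, -(1:2)])   # logical matrix, TRUE where a value is present
anyNA(present)                   # FALSE, so there is nothing for na.rm to remove

by_agg    <- aggregate(present, m[1], sum)   # counts in columns C and D, groups in column A
by_rowsum <- rowsum(0 + present, m[, 1])     # same counts, groups as row names
by_agg
by_rowsum
```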

On Sun, Dec 7, 2008 at 7:43 AM, Gabor Grothendieck
On Sun, Dec 7, 2008 at 7:06 AM, Daren Tan <daren76 at hotmail.com> wrote:
The aggregate function does "almost" all that I need to summarize a dataset, except that I can't make it exclude NAs without a little bit of hassle.

How do I use the na.rm function outside aggregate? I tried:

na.rm <- function(f) {
  function(x, ...) f(x[!is.na(x)], ...)
}
na.rm(sum(c(NA,1,2)))
function(x, ...) f(x[!is.na(x)], ...)
na.rm(sum, c(NA,1,2))
Error in na.rm(sum, c(NA, 1, 2)) : unused argument(s) (c(NA, 1, 2))
Date: Sun, 7 Dec 2008 07:45:14 -0600
From: h.wickham at gmail.com
To: daren76 at hotmail.com
Subject: Re: [R] How to force aggregate to exclude NA ?
CC: r-help at stat.math.ethz.ch

How about writing a function to do the customisation for you?

na.rm <- function(f) {
  function(x, ...) f(x[!is.na(x)], ...)
}

aggregate(m[,-c(1:2)], by=list(m[,1]), na.rm(sum))
aggregate(m[,-c(1:2)], by=list(m[,1]), na.rm(length))

Hadley

-- 
http://had.co.nz/
na.rm(sum, c(NA,1,2))
Error in na.rm(sum, c(NA, 1, 2)) : unused argument(s) (c(NA, 1, 2))
na.rm(sum)(c(NA, 1, 2))

Hadley
http://had.co.nz/
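Both earlier attempts fail because na.rm() takes exactly one argument, a function, and returns a new function. In na.rm(sum(c(NA,1,2))), the argument is the value of sum(c(NA,1,2)), i.e. NA, not the function sum. na.rm() simply hands back the inner function without calling it. In na.rm(sum, c(NA,1,2)), the second argument is one that na.rm() does not accept. The call na.rm(sum)(c(NA, 1, 2)) first builds the wrapped function and then calls it on the data. The following is a minimal sketch. It includes a hypothetical one-step helper, call_na_rm(), which does not come from the thread.

```r
# Hadley's wrapper: na.rm(f) returns a function; it does not compute anything itself
na.rm <- function(f) {
  function(x, ...) f(x[!is.na(x)], ...)
}

na.rm(sum)(c(NA, 1, 2))        # 3: wrap sum, then call the wrapped function
sum_no_na <- na.rm(sum)        # the wrapped function can also be stored and reused
sum_no_na(c(NA, 10, 20))       # 30

# Hypothetical one-step helper taking the function and the data together,
# which is closer to the na.rm(sum, c(NA,1,2)) form tried above
call_na_rm <- function(f, x, ...) f(x[!is.na(x)], ...)
call_na_rm(sum, c(NA, 1, 2))   # 3
call_na_rm(mean, c(NA, 1, 2))  # 1.5
```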