Message-ID: <43679E80-D068-4E31-B182-DC682753DDD2@comcast.net>
Date: 2011-01-11T23:56:13Z
From: David Winsemius
Subject: aggregate.formula implicitly removes rows containing NA
In-Reply-To: <3807574F-51A7-4170-BD5D-559B6B26A1BA@carnegielearning.com>

On Jan 11, 2011, at 5:41 PM, Dickison, Daniel wrote:

> The documentation for `aggregate` makes it sound like  
> aggregate.formula should behave identically to aggregate.data.frame  
> (apart from the way the parameters are passed).  But it looks like  
> aggregate.formula is quietly removing rows where any of the "output"  
> variables (those on the LHS of the formula) are NA.  This differs  
> from how aggregate.data.frame works.  Is this expected behavior?
>
> Here are a couple of examples:
>
>> d <- data.frame(a=rep(1:2, each=2),
> +                 b=c(1,2,NA,3))
>> aggregate(d["b"], d["a"], mean)
>  a   b
> 1 1 1.5
> 2 2  NA
>> aggregate(b ~ a, d, mean)
>  a   b
> 1 1 1.5
> 2 2 3.0
>
> It's removing whole rows even if just one of the columns is NA, i.e.:
>
>> d <- data.frame(a=rep(1:2, each=2),
> +                 b=c(1,2,NA,3),
> +                 c=c(NA,2,3,NA))
>> aggregate(cbind(b,c) ~ a, d, mean)
>  a b c
> 1 1 2 2
>

The help page for aggregate gives the calling defaults for  
aggregate.formula as:
  ## S3 method for class 'formula'
  aggregate(formula, data, FUN, ...,
            subset, na.action = na.omit)
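To see why whole rows vanish, one can mimic the step the formula method takes before grouping: build a model frame from the formula and apply `na.action` to it. This is a rough sketch, assuming the formula method goes through `model.frame()` in the usual way (internals may differ between R versions); `mf_omit` and `mf_pass` are just illustrative names.

```r
d <- data.frame(a = rep(1:2, each = 2),
                b = c(1, 2, NA, 3),
                c = c(NA, 2, 3, NA))

# Roughly what the formula method does first: build a model frame from
# the formula and apply na.action to it. With the default na.omit, any
# row holding an NA in any variable named in the formula is dropped.
mf_omit <- model.frame(cbind(b, c) ~ a, data = d, na.action = na.omit)
nrow(mf_omit)   # 1: only row 2 of d has no NA in a, b or c

# With na.pass the NAs are kept, so every row survives.
mf_pass <- model.frame(cbind(b, c) ~ a, data = d, na.action = na.pass)
nrow(mf_pass)   # 4
```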
So the behavior you describe appears to be exactly what I would have
expected (had I read the help page first).
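If the aggregate.data.frame behavior is what you want, overriding the default should do it. A minimal sketch, assuming you pass `na.action = na.pass` to the formula method; `r1` and `r2` are illustrative names, and `na.rm = TRUE` is simply forwarded through `...` to `mean()`.

```r
d <- data.frame(a = rep(1:2, each = 2),
                b = c(1, 2, NA, 3),
                c = c(NA, 2, 3, NA))

# na.pass keeps every row, so NAs reach FUN as they would with
# aggregate.data.frame: the mean of b in group a == 2 is NA.
r1 <- aggregate(b ~ a, data = d, FUN = mean, na.action = na.pass)

# Arguments after FUN are handed on to it, so mean() drops NAs column
# by column instead of whole rows being removed up front.
r2 <- aggregate(cbind(b, c) ~ a, data = d, FUN = mean,
                na.rm = TRUE, na.action = na.pass)
```

Here `r1` should match `aggregate(d["b"], d["a"], mean)`, and `r2` keeps both groups, with each mean computed from the non-missing values in that column.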
-- 
David Winsemius, MD
West Hartford, CT