using complete.cases() with nested factors
On Thu, Sep 4, 2008 at 4:19 PM, Ken Knoblauch <ken.knoblauch at inserm.fr> wrote:
Andrew Barr <wabarr <at> gmail.com> writes:
This may be a newbie question. I have a data frame that looks like the
sample at the bottom of the email. I have monthly precipitation data from
several sites over several years. For each site, I need to extract the
years that have a complete series of 12 monthly precipitation values,
while excluding that year for sites with incomplete data. I can't figure
out how to do this gracefully (i.e. without a silly for loop).
Any help will be appreciated,
thanks!

SiteID  year  month  precip(mm)
670090  1941  jan    2998
670090  1941  feb    1299
670090  1941  mar    1007
670090  1941  apr    354
670090  1941  may    88
670090  1941  jun    156
670090  1941  jul    8
670090  1941  aug    4
670090  1941  sep    8
670090  1941  oct    58
670090  1941  nov    397
670090  1941  dec    248
670090  1942  jan    NA
670090  1942  feb    380
670090  1942  mar    797
670090  1942  apr    142
670090  1942  may    43
670090  1942  jun    14
670090  1942  jul    70
670090  1942  aug    51
670090  1942  sep    0
670090  1942  oct    10
670090  1942  nov    235
670090  1942  dec    405
There are likely more elegant solutions but this seems to work.
If the data frame is in a variable named dd:

lapply(unique(dd$year), function(x) {
  s <- subset(dd, year == x)
  if (nrow(s) == 12) s
})
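I think this is slightly more elegant, and follows the
split-apply-combine strategy:

# split: one data frame per year
years <- split(dd, dd$year)
# apply: keep only the years that have 12 rows
full_years <- Filter(function(df) nrow(df) == 12, years)
# combine: stack the surviving pieces back into one data frame
do.call("rbind", full_years)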
Hadley
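
Both snippets above group by year only and count rows, so a year with an NA value (like jan 1942 in the sample) still has 12 rows and would be kept, and rows from different sites in the same year would be pooled together. A minimal sketch of one way to handle both, assuming the precipitation column is named precip (the sample header shows precip(mm)) and using made-up data for a second, hypothetical site 670091:

```r
# Hypothetical data: two sites, two years each, 12 months per year.
# The column name `precip` and site 670091 are assumptions for illustration.
dd <- data.frame(
  SiteID = rep(c(670090, 670091), each = 24),
  year   = rep(rep(c(1941, 1942), each = 12), times = 2),
  month  = rep(tolower(month.abb), times = 4),
  precip = seq_len(48)
)
dd$precip[13] <- NA   # site 670090, jan 1942 is missing a value
dd <- dd[-30, ]       # site 670091 has no row at all for jun 1941

# Count the non-missing precipitation values within each site-year group.
n_ok <- ave(as.integer(!is.na(dd$precip)), dd$SiteID, dd$year, FUN = sum)

# Keep only rows belonging to site-years with all 12 monthly values.
full <- dd[n_ok == 12, ]

unique(full[, c("SiteID", "year")])
```

ave() returns one count per row, so the result can be used directly as a row filter, with no explicit loop and no recombining step. With the made-up data above, only 670090/1941 and 670091/1942 survive.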