Message-ID: <87ip4ydote.fsf@sklar.v.cablecom.net>
Date: 2013-03-11T13:54:37Z
From: Marius Hofert
Subject: How to 'extend' a data.frame based on given variable combinations ?
In-Reply-To: <87mwuadrcj.fsf@sklar.v.cablecom.net> (Marius Hofert's message of "Mon, 11 Mar 2013 13:59:56 +0100")

... okay, I found a solution:

set.seed(1)
x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                year  = c(2001,      2003, 2004, 2005,
                                     2003, 2004, 2005),
                value = rexp(7))

## counts per observed (group, year) cell; empty cells such as (B, 2001) give NA
tply <- as.data.frame(as.table(tapply(x$value, list(x$group, x$year), FUN=length))) # => 2002 missing
names(tply) <- c("group", "year", "num")
grid <- expand.grid(group = LETTERS[1:2], year=2001:2005) # all variable combinations
tply <- merge(grid, tply, by=c("group", "year"), all=TRUE) # merge the two data.frames
tply$num[is.na(tply$num)] <- 0 # combinations without observations get count 0
tply
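A shorter base-R route (a sketch, not part of the original thread, assuming the full range of years 2001:2005 is known beforehand): declare group and year as factors carrying *all* wanted levels. table() then also counts combinations that never occur in x, including the year 2002, so no merge() step is needed.

```r
set.seed(1)
x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                year  = c(2001,      2003, 2004, 2005,
                                     2003, 2004, 2005),
                value = rexp(7))

## assumption: all wanted levels are known and listed explicitly,
## so table() keeps empty cells (e.g. year 2002) with count 0
cnt <- table(group = factor(x$group, levels = LETTERS[1:2]),
             year  = factor(x$year,  levels = 2001:2005))
num <- as.data.frame(cnt, responseName = "num") # long format: group, year, num
num
```

Here the row order follows expand.grid() (group varies fastest), and year comes back as a factor rather than an integer.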


Marius Hofert <> writes:

> Dear expeRts,
>
> I have a data.frame with certain covariate combinations ('group' and 'year')
> and corresponding values:
>
> set.seed(1)
> x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
>                 year  = c(2001,      2003, 2004, 2005,
>                                      2003, 2004, 2005),
>                 value = rexp(7))
>
> My goal is essentially to construct a data.frame which contains all (group, year)
> combinations with the corresponding number of values. This can easily be done with tapply():
>
> as.data.frame(as.table(tapply(x$value, list(x$group, x$year), FUN=length))) # => 2002 missing
>
> However, the tricky part is that I would like to have *all* years from 2001 to 2005.
> Although tapply() sees the missing year 2001 for group "B" (since group "A" has a value there),
> tapply() does not 'see' the missing year 2002 (no group has a value there).
>
> How can such a data.frame be constructed [ideally without using additional R packages]?
>
> Here is a straightforward way (hopelessly inefficient for the application I have in mind):
>
> num <- cbind(expand.grid(group = LETTERS[1:2], year=2001:2005), num=0)
> covar <- c("group", "year")
> for(i in seq_len(nrow(num)))
>     num[i,"num"] <- sum(apply(x[,covar], 1, function(z) all(z == num[i,covar])))
> num
>
> Cheers,
>
> Marius
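The row-by-row loop in the quoted question can also be vectorized without additional packages. A minimal sketch, assuming the same x and the same grid of all (group, year) combinations: build one string key per row, match() each observation to its grid row, and let tabulate() count the hits per grid row. The helper key() is hypothetical and only introduced for illustration.

```r
set.seed(1)
x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                year  = c(2001,      2003, 2004, 2005,
                                     2003, 2004, 2005),
                value = rexp(7))

## all (group, year) combinations, as in the question
num <- expand.grid(group = LETTERS[1:2], year = 2001:2005)
covar <- c("group", "year")

## hypothetical helper: one string key per row of the covariate columns;
## "\r" is used as a separator that is unlikely to occur in the data
key <- function(d) do.call(paste, c(unname(as.list(d[covar])), sep = "\r"))

## grid row of each observation, then the number of observations per grid row
idx <- match(key(x), key(num))
num$num <- tabulate(idx, nbins = nrow(num))
num
```

Unlike the apply() loop, which rescans all of x for every grid row, this processes each observation once, so it should scale much better as x and the grid grow.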