generalizing expand.table: table -> data.frame
on 01/20/2009 10:38 AM Michael Friendly wrote:
In http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3064.html a method was given for converting a frequency table to an expanded data frame representing each observation as a set of factors. A slightly modified version was later included in the NCStats package, only on http://rforge.net/ (and it has too many dependencies to be useful).

I've tried to make it more general, allowing an input data frame in frequency form, and where the frequency variable is not named "Freq". This is my working version:

__begin__ expand.table.R
expand.table <- function (x, var.names = NULL, freq="Freq", ...) {
  # allow: a table object, or a data frame in frequency form
  if(inherits(x,"table")) {
    x <- as.data.frame.table(x)
  }
  ## This fails:
  # df <- sapply(1:nrow(x), function(i) x[rep(i, each = x[i,freq]), ], simplify = FALSE)
  # df <- subset(do.call("rbind", df), select = -freq)
  # This works, when the frequency variable is named Freq
  df <- sapply(1:nrow(x), function(i) x[rep(i, each = x[i,"Freq"]), ], simplify = FALSE)
  df <- subset(do.call("rbind", df), select = -Freq)
  for (i in 1:ncol(df)) {
    df[[i]] <- type.convert(as.character(df[[i]]), ...)
  }
  rownames(df) <- NULL
  if (!is.null(var.names)) {
    if (length(var.names) < dim(df)[2]) stop("Too few var.names given.")
    else if (length(var.names) > dim(df)[2]) stop("Too many var.names given.")
    else names(df) <- var.names
  }
  df
}
__end__ expand.table.R

Thus for the following table

library(vcd)
art <- xtabs(~Treatment + Improved, data = Arthritis)
art
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21

expand.table (above) gives a data frame of sum(art)=84 observations, with factors Treatment and Improved.
artdf <- expand.table(art)
str(artdf)
'data.frame': 84 obs. of 2 variables:
 $ Treatment: Factor w/ 2 levels "Placebo","Treated": 1 1 1 1 1 1 1 1 1 1 ...
 $ Improved : Factor w/ 3 levels "Marked","None",..: 2 2 2 2 2 2 2 2 2 2 ...
I've generalized this so it works with data frames in frequency form,
as.data.frame(art)
  Treatment Improved Freq
1   Placebo     None   29
2   Treated     None   13
3   Placebo     Some    7
4   Treated     Some    7
5   Placebo   Marked    7
6   Treated   Marked   21
art.df2 <- expand.table(as.data.frame(art))
str(art.df2)
'data.frame': 84 obs. of 2 variables:
 $ Treatment: Factor w/ 2 levels "Placebo","Treated": 1 1 1 1 1 1 1 1 1 1 ...
 $ Improved : Factor w/ 3 levels "Marked","None",..: 2 2 2 2 2 2 2 2 2 2 ...
But here's the rub: when the frequency variable in a data frame is called something other than "Freq", as in this example,
GSS
     sex party count
1 female   dem   279
2   male   dem   165
3 female indep    73
4   male indep    47
5 female   rep   225
6   male   rep   191

all the changes I've tried, using the freq= argument in expand.table() fail in various ways. Can someone help?
Hi Michael,
I think that the following modifications to my original code, also
incorporating the changes made in the NCStats package, should work.
expand.dft <- function(x, var.names = NULL, freq = "Freq", ...)
{
  # allow: a table object, or a data frame in frequency form
  if(inherits(x, "table"))
    x <- as.data.frame.table(x, responseName = freq)

  freq.col <- which(colnames(x) == freq)

  if (length(freq.col) == 0)
    stop(paste(sQuote("freq"), "not found in column names"))

  DF <- sapply(1:nrow(x),
               function(i) x[rep(i, each = x[i, freq.col]), ],
               simplify = FALSE)

  DF <- do.call("rbind", DF)[, -freq.col]

  for (i in 1:ncol(DF))
  {
    DF[[i]] <- type.convert(as.character(DF[[i]]), ...)
  }

  rownames(DF) <- NULL

  if (!is.null(var.names))
  {
    if (length(var.names) < dim(DF)[2])
    {
      stop(paste("Too few", sQuote("var.names"), "given."))
    } else if (length(var.names) > dim(DF)[2]) {
      stop(paste("Too many", sQuote("var.names"), "given."))
    } else {
      names(DF) <- var.names
    }
  }

  DF
}
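
The main difference from the earlier version is that the frequency column is located by position and dropped with negative indexing instead of through subset(). A minimal sketch of why that matters, assuming a small made-up data frame (the names `d`, `res`, and `kept` are for illustration only and are not part of the posted code): inside subset(), `select = -freq` evaluates `freq` to the character string it holds, and negating a string is an error.

```r
# Toy frequency-form data frame; the values are invented for illustration.
d <- data.frame(g = c("a", "b"), count = c(2, 3))
freq <- "count"

# subset() ends up evaluating -"count", which is an error,
# so the handler's return value is stored instead.
res <- tryCatch(subset(d, select = -freq),
                error = function(e) "error")

# Looking up the column position first makes negative indexing valid.
freq.col <- which(colnames(d) == freq)
kept <- d[, -freq.col, drop = FALSE]   # drop = FALSE keeps a data frame
```

The `drop = FALSE` here is an addition for the sketch: with only one column left, plain `[, -freq.col]` would return a vector rather than a data frame.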
art
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21
head(expand.dft(art), 10)
   Treatment Improved
1    Placebo     None
2    Placebo     None
3    Placebo     None
4    Placebo     None
5    Placebo     None
6    Placebo     None
7    Placebo     None
8    Placebo     None
9    Placebo     None
10   Placebo     None

art.dft <- as.data.frame.table(art)
art.dft
  Treatment Improved Freq
1   Placebo     None   29
2   Treated     None   13
3   Placebo     Some    7
4   Treated     Some    7
5   Placebo   Marked    7
6   Treated   Marked   21

names(art.dft)[3] <- "count"
art.dft
  Treatment Improved count
1   Placebo     None    29
2   Treated     None    13
3   Placebo     Some     7
4   Treated     Some     7
5   Placebo   Marked     7
6   Treated   Marked    21
head(expand.dft(art.dft, freq = "count"), 10)
   Treatment Improved
1    Placebo     None
2    Placebo     None
3    Placebo     None
4    Placebo     None
5    Placebo     None
6    Placebo     None
7    Placebo     None
8    Placebo     None
9    Placebo     None
10   Placebo     None

HTH,

Marc Schwartz
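
For comparison, the same expansion can be sketched without the sapply()/rbind() loop by repeating row indices with rep(). This is an alternative idiom, not the function posted in this thread: the name `expand_freq` is hypothetical, and the data frame simply re-enters the GSS counts listed in the question. Because it skips type.convert(), this sketch keeps each column's existing type and factor level order.

```r
# Hypothetical helper: expand a frequency-form data frame to one row per case.
expand_freq <- function(x, freq = "Freq") {
  freq.col <- which(colnames(x) == freq)
  if (length(freq.col) != 1)
    stop("expected exactly one column named ", sQuote(freq))
  # Repeat each row index as many times as its frequency.
  idx <- rep(seq_len(nrow(x)), times = x[[freq.col]])
  out <- x[idx, -freq.col, drop = FALSE]
  rownames(out) <- NULL
  out
}

# GSS counts as listed in the question.
GSS <- data.frame(
  sex   = rep(c("female", "male"), times = 3),
  party = rep(c("dem", "indep", "rep"), each = 2),
  count = c(279, 165, 73, 47, 225, 191)
)

gss.long <- expand_freq(GSS, freq = "count")
nrow(gss.long)   # equals sum(GSS$count)
```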