generalizing expand.table: table -> data.frame

2 messages · Michael Friendly, Marc Schwartz

#
In
http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3064.html
a method was given for converting a frequency table to an expanded data
frame representing each observation as a set of factors.  A slightly
modified version was later included in the NCStats package, which is
available only on http://rforge.net/ (and it has too many dependencies
to be useful).

I've tried to make it more general, allowing an input data frame in
frequency form, including one where the frequency variable is not named
"Freq".  This is my working version:

__begin__ expand.table.R
expand.table <- function (x, var.names = NULL, freq="Freq", ...)
{
#  allow: a table object, or a data frame in frequency form
   if(inherits(x,"table")) {
     x <- as.data.frame.table(x)
   }
##  This fails:
#   df <- sapply(1:nrow(x), function(i) x[rep(i, each = x[i,freq]), ],
#                simplify = FALSE)
#   df <- subset(do.call("rbind", df), select = -freq)

#  This works, when the frequency variable is named Freq
   df <- sapply(1:nrow(x), function(i) x[rep(i, each = x[i,"Freq"]), ],
                simplify = FALSE)
   df <- subset(do.call("rbind", df), select = -Freq)

   for (i in 1:ncol(df)) {
       df[[i]] <- type.convert(as.character(df[[i]]), ...)
   }
   rownames(df) <- NULL
   if (!is.null(var.names)) {
       if (length(var.names) < dim(df)[2])
           stop("Too few var.names given.")
       else if (length(var.names) > dim(df)[2])
           stop("Too many var.names given.")
       else names(df) <- var.names
   }
   df
}
__end__   expand.table.R

Thus for the following table

library(vcd)
art <- xtabs(~Treatment + Improved, data = Arthritis)


 > art
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21

expand.table (above) gives a data frame of sum(art)=84 observations,
with factors Treatment and Improved.

 > artdf <- expand.table(art)
 > str(artdf)
'data.frame':   84 obs. of  2 variables:
 $ Treatment: Factor w/ 2 levels "Placebo","Treated": 1 1 1 1 1 1 1 1 1 1 ...
 $ Improved : Factor w/ 3 levels "Marked","None",..: 2 2 2 2 2 2 2 2 2 2 ...
 >

I've generalized this so it works with data frames in frequency form,

 > as.data.frame(art)
  Treatment Improved Freq
1   Placebo     None   29
2   Treated     None   13
3   Placebo     Some    7
4   Treated     Some    7
5   Placebo   Marked    7
6   Treated   Marked   21

 > art.df2 <- expand.table(as.data.frame(art))
 > str(art.df2)
'data.frame':   84 obs. of  2 variables:
 $ Treatment: Factor w/ 2 levels "Placebo","Treated": 1 1 1 1 1 1 1 1 1 1 ...
 $ Improved : Factor w/ 3 levels "Marked","None",..: 2 2 2 2 2 2 2 2 2 2 ...
 >

But here's the rub: when the frequency variable in a data frame is
called something other than "Freq", as in this example,

 > GSS
     sex party count
1 female   dem   279
2   male   dem   165
3 female indep    73
4   male indep    47
5 female   rep   225
6   male   rep   191

all the changes I've tried, using the freq= argument in expand.table(),
fail in various ways.

Can someone help?

-Michael
#
on 01/20/2009 10:38 AM Michael Friendly wrote:
Hi Michael,

I think that the following modifications to my original code, also
incorporating the changes made in the NCStats package, should work.


expand.dft <- function(x, var.names = NULL, freq = "Freq", ...)
{
  #  allow: a table object, or a data frame in frequency form
  if(inherits(x, "table"))
    x <- as.data.frame.table(x, responseName = freq)

  freq.col <- which(colnames(x) == freq)
  if (length(freq.col) == 0)
      stop(paste(sQuote("freq"), "not found in column names"))

  DF <- sapply(1:nrow(x),
               function(i) x[rep(i, each = x[i, freq.col]), ],
               simplify = FALSE)

  #  drop = FALSE keeps a data frame even if only one column remains
  DF <- do.call("rbind", DF)[, -freq.col, drop = FALSE]

  for (i in 1:ncol(DF))
  {
    DF[[i]] <- type.convert(as.character(DF[[i]]), ...)

  }

  rownames(DF) <- NULL

  if (!is.null(var.names))
  {
    if (length(var.names) < dim(DF)[2])
    {
      stop(paste("Too few", sQuote("var.names"), "given."))
    } else if (length(var.names) > dim(DF)[2]) {
      stop(paste("Too many", sQuote("var.names"), "given."))
    } else {
      names(DF) <- var.names
    }
  }

  DF
}
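
As a hedged aside on why the commented-out lines in the original
expand.table fail while expand.dft works: the sketch below assumes a
small frequency-form data frame (the name gss and the way it is built
are illustrative, not taken from the posts).  Row indexing with a
character column name is fine; the call that breaks is
subset(..., select = -freq), because freq resolves to the string
"count" and unary minus is not defined for character strings.  Looking
up the column position first, as expand.dft does with which(), yields
an integer that can be negated.

```r
# Illustrative frequency-form data, mirroring the GSS example (assumed)
gss <- data.frame(
  sex   = c("female", "male", "female", "male", "female", "male"),
  party = c("dem", "dem", "indep", "indep", "rep", "rep"),
  count = c(279, 165, 73, 47, 225, 191)
)
freq <- "count"

# Indexing by a character name works, so x[i, freq] is not the problem
first.count <- gss[1, freq]

# select = -freq tries to evaluate -"count", which signals an error
res <- tryCatch(subset(gss, select = -freq),
                error = function(e) "error")

# Resolving the name to a position makes negative indexing possible
freq.col <- which(names(gss) == freq)
kept <- gss[, -freq.col, drop = FALSE]
names(kept)
```

An equivalent way to drop the column by name, without computing a
position, would be gss[, setdiff(names(gss), freq), drop = FALSE].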
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21

   Treatment Improved
1    Placebo     None
2    Placebo     None
3    Placebo     None
4    Placebo     None
5    Placebo     None
6    Placebo     None
7    Placebo     None
8    Placebo     None
9    Placebo     None
10   Placebo     None



art.dft <- as.data.frame.table(art)
  Treatment Improved Freq
1   Placebo     None   29
2   Treated     None   13
3   Placebo     Some    7
4   Treated     Some    7
5   Placebo   Marked    7
6   Treated   Marked   21

names(art.dft)[3] <- "count"
  Treatment Improved count

1   Placebo     None    29
2   Treated     None    13
3   Placebo     Some     7
4   Treated     Some     7
5   Placebo   Marked     7
6   Treated   Marked    21
   Treatment Improved
1    Placebo     None
2    Placebo     None
3    Placebo     None
4    Placebo     None
5    Placebo     None
6    Placebo     None
7    Placebo     None
8    Placebo     None
9    Placebo     None
10   Placebo     None
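
For comparison, the same expansion can be written without the per-row
sapply()/rbind() step by repeating row indices directly.  This is only
a sketch under stated assumptions: the function name expand_freq, its
as.is argument, and the gss data frame are hypothetical and do not come
from the posts or from NCStats.  as.is is passed explicitly because the
type.convert() documentation asks callers to specify it.

```r
# Hypothetical helper: expand frequency-form data to one row per case
expand_freq <- function(x, freq = "Freq", as.is = FALSE) {
  # Accept a table object as well as a frequency-form data frame
  if (inherits(x, "table"))
    x <- as.data.frame(x, responseName = freq)
  if (!freq %in% names(x))
    stop(sQuote(freq), " not found in column names")

  # Repeat each row index as many times as its frequency
  idx <- rep(seq_len(nrow(x)), times = x[[freq]])
  out <- x[idx, names(x) != freq, drop = FALSE]

  # Re-derive column types, as the loop over type.convert() does
  out[] <- lapply(out, function(col)
    type.convert(as.character(col), as.is = as.is))
  rownames(out) <- NULL
  out
}

# Illustrative data with a frequency column named "count" (assumed)
gss <- data.frame(
  sex   = c("female", "male", "female", "male", "female", "male"),
  party = c("dem", "dem", "indep", "indep", "rep", "rep"),
  count = c(279, 165, 73, 47, 225, 191)
)
gss.long <- expand_freq(gss, freq = "count")
nrow(gss.long)   # 980, the sum of the counts
```

The vectorized form avoids building a list of small data frames, which
may matter for tables with many cells; the result should otherwise
match what expand.dft returns for the same input.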


HTH,

Marc Schwartz