I'm teaching a categorical data analysis course this term, and a minor
"problem" has resurfaced that I have often thought about before. This
applies equally to Splus I suppose, but my undergrads aren't using
Splus.
It seems natural to read/represent a contingency table as a data
frame, with one column representing the cell counts (as in the example
appended below, with data taken from Agresti, "An Introduction to
Categorical Data Analysis"). However, functions like ftable,
mantelhaen.test, chisq.test, fisher.test, etc. don't work naturally
with this representation, and instead require the user to first
manipulate the data, say by using tapply to convert the data into an
array. This is not difficult of course, but it's one of those things
that I'd rather not have to explain to students, who usually need to
be focusing on other things.
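
To make the preprocessing step concrete, here is a minimal sketch of the
conversion. The data frame `tab`, its column names, and the counts are
all made up for illustration (they are not the Agresti data). The sketch
uses tapply as described above, and also xtabs, which accepts a count
column on the left-hand side of its formula and may already cover part
of what is being asked for.

```r
# Hypothetical 2 x 2 x 2 table stored as a data frame, one row per cell.
# The counts are invented for illustration only.
tab <- expand.grid(
  treatment = c("drug", "placebo"),
  response  = c("success", "failure"),
  clinic    = c("A", "B")
)
tab$count <- c(12, 7, 8, 13, 15, 9, 5, 11)

# tapply sums the counts within each factor combination and returns an array
arr <- with(tab, tapply(count, list(treatment, response, clinic), sum))

# xtabs builds the same array from a formula with the counts on the left
xt <- xtabs(count ~ treatment + response + clinic, data = tab)

# Either array can then be passed to the usual functions
ftable(xt)
mantelhaen.test(xt)
chisq.test(margin.table(xt, c(1, 2)))  # collapse over clinic first
```

Both routes give the same cell counts; the difference is only in how
much the student has to type and understand.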
So, am I missing something obvious (not unlikely), or would it be a
good idea to extend the methods/arguments of these functions to
analyze/manipulate data represented in this way without any
preprocessing by the user? It seems that a "count" (or "weight" or
"freq" or whatever) argument would do it in most cases.
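
As a rough user-level sketch of what such a "count" argument could
amount to, the wrapper below builds the array internally and hands it to
whichever test is requested. The name `test_counts`, its arguments, and
the example data are hypothetical; nothing like this is claimed to exist
in R itself, and it assumes every column other than the count column is
a classifying factor.

```r
# Hypothetical helper, not part of R: cross-tabulate a data frame of
# cell counts and pass the resulting table to a test function.
test_counts <- function(test, data, count = "count", ...) {
  # Assumption: all columns except `count` are classifying factors
  factors <- setdiff(names(data), count)
  f <- reformulate(factors, response = count)  # e.g. count ~ a + b
  test(xtabs(f, data = data), ...)
}

# Invented 2 x 2 example, one row per cell
smoke <- data.frame(
  smoker  = c("yes", "yes", "no", "no"),
  disease = c("yes", "no", "yes", "no"),
  count   = c(30, 70, 15, 85)
)

res <- test_counts(chisq.test, smoke)
res
```

This sidesteps changing the test functions themselves, but it does not
answer whether a built-in argument would be the cleaner design.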
Funny, I can't help but wonder if the answer from those who have
thought about this more deeply than I have might be "it's a can of
worms".