contingency tables in R
Kurt Hornik <Kurt.Hornik at ci.tuwien.ac.at> writes:
Patrick Ball writes:
Dear List:
Most of the analysis I do involves contingency tables. I am migrating to R from Stata and I have a number of questions about using contingency tables in R. I suspect that most of the things I want to do are very short R scripts that people on this list probably have. I wonder if you would be willing to share them.
First, the presentation of tables by table() is not analysis-ready. Is there a way to output the table with the marginals, by cell, row or column proportions, with the test statistics (especially the chi^2 and the log-likelihood chi^2), residuals, cross product, and odds ratio?
Not in one monolithic function, I think, and I am not sure I would like to have such a thing, see below. But the pieces are all there:
* Use margin.table() and prop.table() to obtain margins and proportions, respectively.
* Use chisq.test() [in package ctest] for the chisq analysis (test statistic, p-value, chisq residuals).
* Use loglin() for the LR chisq and residuals.
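As a rough sketch of how these pieces combine, the following works through a small made-up 2 by 3 table (the counts and dimnames are invented for illustration). It assumes a current R release, where chisq.test() and loglin() are available without loading ctest; on an older installation a library(ctest) call may be needed first.

```r
# Hypothetical 2 x 3 table of counts (values invented for illustration)
tab <- as.table(matrix(c(20, 15, 30, 25, 10, 40), nrow = 2,
                       dimnames = list(sex = c("F", "M"),
                                       grp = c("a", "b", "c"))))

# Margins
row.tot <- margin.table(tab, 1)
col.tot <- margin.table(tab, 2)

# Cell, row and column proportions
cell.p <- prop.table(tab)
row.p  <- prop.table(tab, 1)
col.p  <- prop.table(tab, 2)

# Pearson chi^2: statistic, p-value and residuals
pearson <- chisq.test(tab)
pearson$statistic
pearson$p.value
pearson$residuals

# Likelihood-ratio chi^2 from the independence model (margins 1 and 2 only)
fit <- loglin(tab, margin = list(1, 2), fit = TRUE, print = FALSE)
fit$lrt
p.lrt <- pchisq(fit$lrt, fit$df, lower.tail = FALSE)

# Pearson residuals recomputed from the loglin() fitted values
lin.resid <- (tab - fit$fit) / sqrt(fit$fit)
```

loglin() reports both fit$lrt and fit$pearson, so the two statistics can be compared from a single fit.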
Most of this is in table() and chisq.test(). Here is an example from
RWeb:
twoWay <- function(x = NA, y = NA, userDefined = NA) {
  ## Tabulate x and y unless a ready-made table was supplied.
  ## identical() is used because is.na() on a matrix returns a matrix,
  ## which is not a valid if() condition.
  if (identical(userDefined, NA)) {
    result <- chisq.test(table(x, y))
  } else {
    result <- chisq.test(userDefined)
  }
  print(result)
  observed <- result$observed
  expected <- result$expected
  ## per-cell contributions to the chi^2 statistic
  chi.table <- ((observed - expected)^2) / expected
  row.sum <- apply(observed, 1, sum)
  col.sum <- apply(observed, 2, sum)
  N <- sum(observed)
  ## put in the marginals and names ... create fullArray
  fullArray <- cbind(observed, row.sum)
  fullArray <- rbind(fullArray, c(col.sum, N))
  rownames(fullArray) <- c(rownames(observed), "Total")
  colnames(fullArray) <- c(colnames(observed), "Total")
  ## make the tables of proportions
  proportion <- fullArray / N
  row.proportion <- fullArray / c(row.sum, N)
  col.proportion <- t(t(fullArray) / c(col.sum, N))
  return(list(fA = fullArray, e = expected, ct = chi.table, p = proportion,
              rp = row.proportion, cp = col.proportion))
}
This needs a nice print() method.
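One possible shape for such a print method, as a minimal sketch only. It assumes the list returned by twoWay() has been given a class attribute; the class name "twoWay" and the digits argument are hypothetical and not part of the RWeb code. To keep the example self-contained it is shown on a hand-built object with the same component names (fA, e, ct, p, rp, cp).

```r
# Hypothetical print method for a list with components fA, e, ct, p, rp, cp
print.twoWay <- function(x, digits = 3, ...) {
  cat("Observed counts with margins:\n")
  print(x$fA)
  cat("\nExpected counts:\n")
  print(round(x$e, digits))
  cat("\nCell contributions to chi^2:\n")
  print(round(x$ct, digits))
  cat("\nCell proportions:\n")
  print(round(x$p, digits))
  cat("\nRow proportions:\n")
  print(round(x$rp, digits))
  cat("\nColumn proportions:\n")
  print(round(x$cp, digits))
  invisible(x)
}

# Hand-built demo object (counts invented for illustration)
obs <- matrix(c(12, 8, 5, 15), nrow = 2,
              dimnames = list(c("r1", "r2"), c("c1", "c2")))
res <- chisq.test(obs)
full <- rbind(cbind(obs, Total = rowSums(obs)),
              Total = c(colSums(obs), sum(obs)))
demo <- structure(list(fA = full,
                       e  = res$expected,
                       ct = (obs - res$expected)^2 / res$expected,
                       p  = full / sum(obs),
                       rp = full / full[, "Total"],
                       cp = t(t(full) / full["Total", ])),
                  class = "twoWay")

out <- capture.output(print(demo))  # dispatches to print.twoWay()
```

With this in place, twoWay() would only need to wrap its return value in structure(..., class = "twoWay") for the method to be used automatically.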
* Not sure about which odds ratios you want. Function mantelhaen.test() in package ctest does exact conditional ones for 2 by 2 by K tables.
And fisher.test(). Have a look at my intro text (http://www.myatt.demon.co.uk) for examples of calculating RRs and ORs in tabulating functions.
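For a single 2 by 2 table, a minimal sketch of both routes: the sample (cross-product) odds ratio and relative risk computed by hand, and the conditional maximum likelihood estimate with its exact confidence interval from fisher.test(). The counts and dimnames below are invented for illustration.

```r
# Hypothetical 2 x 2 table: rows = exposure, columns = outcome
tab <- matrix(c(10, 5, 20, 40), nrow = 2,
              dimnames = list(exposed = c("yes", "no"),
                              disease = c("yes", "no")))

# Sample odds ratio from the cross product: (a * d) / (b * c)
or.sample <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# Relative risk of disease, exposed versus unexposed
rr <- (tab[1, 1] / sum(tab[1, ])) / (tab[2, 1] / sum(tab[2, ]))

# Exact test; the estimate is the conditional MLE of the odds ratio,
# which generally differs slightly from the sample odds ratio
ft <- fisher.test(tab)
ft$estimate
ft$conf.int
```

The hand-computed values are simple enough to fold into a tabulating function alongside the margins and proportions.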
It really depends on how your data is set up. If you have the raw
values in a data frame, I would actually recommend using xtabs() rather
than table(). Try e.g.
data(esoph)
x <- xtabs(cbind(ncases, ncontrols) ~ ., data = esoph)
x
summary(x)
The last one prints ``useful'' summary information.
To obtain pretty-printed output from multi-way tables, use ftable().
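A short sketch of what that looks like, using the esoph data again. The choice of ncases as the count and of agegp and alcgp as row variables is only an assumption for illustration; any split of the factors into rows and columns works the same way.

```r
data(esoph)

# Three-way table of case counts by age, alcohol and tobacco group
x3 <- xtabs(ncases ~ agegp + alcgp + tobgp, data = esoph)

# Flatten it: age and alcohol groups down the rows, tobacco across the columns
ft <- ftable(x3, row.vars = c("agegp", "alcgp"))
ft
```

The flattened result is a two-dimensional layout of the same counts, so nothing is lost relative to the multi-way table.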
I also like to make tables that have summary statistics of a given variable in the columns (mean, s.d., etc.) with each row being the value for a sub group of the data. How do you do this in R?
Use aggregate().
Or by().
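A minimal sketch of both, again on esoph. The variable choices (summarising ncases within each alcgp level) and the helper name summ are assumptions for illustration only.

```r
data(esoph)

# Hypothetical helper: the summary statistics wanted in the columns
summ <- function(v) c(mean = mean(v), sd = sd(v), n = length(v))

# aggregate(): one row per subgroup; the statistics come back as a matrix column
stats <- aggregate(ncases ~ alcgp, data = esoph, FUN = summ)
stats

# by(): the same statistics per subgroup, bound into a plain matrix
by.stats <- do.call(rbind, by(esoph$ncases, esoph$alcgp, summ))
by.stats
```

aggregate() returns a data frame, which is convenient for further processing, while by() accepts functions that return arbitrary objects.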
The most complicated piece of this is contingency tables done with sample data. The sampling involves several strata with different sampling weights. Calculating the cell (or row or column) probabilities is relatively easy, but the other statistics can be complicated (the design effect, the finite population correction, the various chi^2s, and the standard errors and confidence intervals). Also, I sometimes make these tables with summary statistics in place of counts or population proportions.
Is there any way to do this stuff in R without hacking it all myself?
The pieces are all there, I think, and it should be fairly simple to combine them to reflect your personal preferences for displaying categorical information etc.
Yes, the key bits are all there. It should not take too long to get a function that meets your own needs.
Mark
--
Mark Myatt