Crosstabbing multiple response data
--- John Kane <jrkrideau at yahoo.ca> wrote:
Thanks to everyone for this. I was looking at the same problem last night and was just going to write a posting to R-help when I saw this.

--- Michael Wexler <wexler at yahoo.com> wrote:
Thanks to Charles, Gabor, and a private message from Frank E Harrell with some good ideas and help. This crossprod approach was very clever, I would never have thought of it.

Best, Michael

----- Original Message ----
From: Charles C. Berry <cberry at tajo.ucsd.edu>
To: Michael Wexler <wexler at yahoo.com>
Cc: r-help at stat.math.ethz.ch
Sent: Thursday, February 22, 2007 1:17:44 PM
Subject: Re: [R] Crosstabbing multiple response data
res <- crossprod( as.matrix( ratings[ , -1] ) )
diag(res) <- ""
print(res, quote=F)

     att1 att2 att3
att1      2    1
att2 2         2
att3 1    2
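To see why this works: crossprod(X) is t(X) %*% X, so for 0/1 columns each off-diagonal cell sums the products of two columns, which is the number of rows where both are 1. A minimal check, assuming the ratings values shown in the table in the original question (counts and by_hand are illustrative names, not from the thread):

```r
# Data as shown in the table in the original question (assumed 0/1 coding).
ratings <- data.frame(id   = 1:4,
                      att1 = c(1, 1, 0, 1),
                      att2 = c(1, 0, 1, 1),
                      att3 = c(0, 0, 1, 1))

X <- as.matrix(ratings[, -1])

# crossprod(X) computes t(X) %*% X
counts <- crossprod(X)

# One cell worked out directly: rows where att1 and att2 are both 1
by_hand <- sum(ratings$att1 == 1 & ratings$att2 == 1)

print(counts)
print(by_hand)
```

The diagonal of counts holds the per-attribute totals, which the reply blanks out before printing.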
res2 <- crossprod(as.matrix( ratings[ , -1])) * 100 / nrow( ratings )
res2[] <- paste( res2, "%", sep="" )
diag(res2) <- ""
print(res2, quote=F)

     att1 att2 att3
att1      50%  25%
att2 50%       50%
att3 25%  50%
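The paste() step can also be written with sprintf(), which fixes the number of decimals when the percentages are not whole numbers. A hedged sketch, assuming the ratings values shown in the table in the original question; pct and out are illustrative names, not from the thread:

```r
# Data as shown in the table in the original question (assumed 0/1 coding).
ratings <- data.frame(id   = 1:4,
                      att1 = c(1, 1, 0, 1),
                      att2 = c(1, 0, 1, 1),
                      att3 = c(0, 0, 1, 1))

# Co-occurrence as a percentage of all respondents
pct <- crossprod(as.matrix(ratings[, -1])) * 100 / nrow(ratings)

# Copy pct and fill it in place with out[] so that the dimensions and
# dimnames are kept; "%.0f" rounds to whole percents, "%%" is a literal %.
out <- pct
out[] <- sprintf("%.0f%%", pct)
diag(out) <- ""
print(out, quote = FALSE)
```

Changing the format to "%.1f%%" would show one decimal place instead.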
Be sure to bone up on format and sprintf before taking this into production.

On Thu, 22 Feb 2007, Michael Wexler wrote:

Using R version 2.4.1 (2006-12-18) on Windows, I have a dataset which resembles this:
id att1 att2 att3
 1    1    1    0
 2    1    0    0
 3    0    1    1
 4    1    1    1

ratings <- data.frame(id = c(1,2,3,4), att1 = c(1,1,0,1), att2 = c(1,0,1,1), att3 = c(0,0,1,1))
I would like to get a cross tab of counts of co-occurrence, which might resemble this:

     att1 att2 att3
att1      2    1
att2 2         2
att3 1    2

with the hope of understanding, at least pairwise, what things "hang together". (Yes, there are much, much better ways to do this statistically, including clustering and binary corrected correlation, but the audience I am working with asked for this version for a specific reason.)
(Later on, I would also like to convert to percentages of the total unique pop, so the final version of the table would be

     att1 att2 att3
att1      50%  25%
att2 50%       50%
att3 25%  50%

But I can do this in Excel if I can get the first table out.)
I have tried the reshape library, but could not get anything resembling this (both on its own, as well as feeding in to table()). (I have also played with transposing and using some comments from this list from 2002 and 2004, but the questioners appear to assume more knowledge than I have in use of R; the example in the posting guide was also more complex than I was ready for, I'm afraid.)
Sample of some of my efforts:
library(reshape)
melt(ratings,id=c("id"))
ds1 <- melt(ratings,id=c("id"))
table(ds1$variable, ds1$variable) # returns only row counts, 4 along the diagonal
xtabs(formula = value ~ ds1$variable + ds1$variable, data=ds1) # returns only a single row of
# collapsed counts; appears not to allow one variable in multiple uses
I suspect I am close, so any nudges in the right direction would be helpful.
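A sketch of one way to stay with the long format (not from the original thread): table(ds1$variable, ds1$variable) crosses a column with itself, so only the diagonal can be non-zero. Pairing each respondent's selected attributes with merge() on id gives the co-occurrence counts instead. The example below is base R only and builds the long table by hand rather than with reshape's melt(); it assumes the ratings values shown in the table above, and long, picked, pairs and co are illustrative names.

```r
# Data as shown in the table above (assumed 0/1 coding).
ratings <- data.frame(id   = 1:4,
                      att1 = c(1, 1, 0, 1),
                      att2 = c(1, 0, 1, 1),
                      att3 = c(0, 0, 1, 1))

# Long format: one row per (id, attribute), similar in shape to melt() output
atts <- names(ratings)[-1]
long <- data.frame(id       = rep(ratings$id, times = length(atts)),
                   variable = rep(atts, each = nrow(ratings)),
                   value    = unlist(ratings[-1], use.names = FALSE))

# Keep only the attributes each respondent selected
picked <- long[long$value == 1, c("id", "variable")]

# Self-join on id: every pair of attributes chosen by the same respondent
pairs <- merge(picked, picked, by = "id")

# Cross-tabulate the two sides of the join
co <- table(pairs$variable.x, pairs$variable.y)
print(co)
```

The off-diagonal cells of co agree with crossprod(as.matrix(ratings[, -1])); the crossprod version is shorter when the data are already in wide 0/1 form.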
Thanks much,
Michael

PS: www.rseek.org is very impressive, I heartily encourage its use.
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Charles C. Berry                         (858) 534-2098
                                         Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0901