Hello there!
I am still struggling with a binomial response and a set of predictors
that are all categorical (some with 3 levels, most with 2 levels).
After initial struggles with GLMs (the struggle coming from the data,
not the actual analysis), I have decided to use contingency tables
instead. My data look like this:
response:
hunting.prev=c("success","fail","success","success","success","fail",...)
one of 21 surveyed variables:
groupsize=c("small","large","small","small","small","large"...)
...
now...
It seems intuitive to me that I will have to split up each variable by
its levels, thus creating 2 new variables for groupsize (as an
example): one holding the counts of small and large hunting parties
when hunting.prev was a success, and one holding the counts when it
was a failure. I could write a function to do that for me, but I would
rather not reinvent the wheel. I would like my data to look like this:
hunting.prev  groupsize-small  groupsize-large  dogs-yes  dogs-no  guns-yes  guns-no  ...
success                    12                2         4       14        23       12  ...
failure                     1                6        34        0        12        3  ...
Of course, hunting.prev would only be needed to create the index via
hunting.prev=="success"; it is shown here only to indicate what each
row means. My questions are:
a) How do I count the levels of each categorical variable split by the
response variable, how do I create a 2 x 20-something (contingency)
table from those counts, and would a prop.test() approach or a
chi-squared test be more appropriate to actually analyze the data?
b) How do you create R output so that it is formatted in nice columns
and rows?
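
To make (a) and (b) concrete, here is a rough sketch of the layout I am
after, built with table() and cbind() on made-up toy data. The data
frame name dat, the values in the dogs column, and the helper names
predictors, tabs and wide are all placeholders invented for
illustration, not my real survey, and I am not sure this is the
idiomatic way to do it:

```r
# Toy data, invented for illustration only (not the real survey).
dat <- data.frame(
  hunting.prev = c("success", "fail", "success", "success", "success", "fail"),
  groupsize    = c("small", "large", "small", "small", "small", "large"),
  dogs         = c("yes", "no", "no", "yes", "no", "no"),
  stringsAsFactors = TRUE
)

# Every column except the response is treated as a predictor.
predictors <- setdiff(names(dat), "hunting.prev")

# One response-by-level count table per predictor.
tabs <- lapply(dat[predictors], function(x) table(dat$hunting.prev, x))

# Prefix each column with its variable name, then bind the tables side by side.
wide <- do.call(cbind, lapply(predictors, function(v) {
  m <- unclass(tabs[[v]])
  colnames(m) <- paste(v, colnames(m), sep = "-")
  m
}))

# A matrix prints in aligned columns and rows.
print(wide)

# Tab-separated text, e.g. for pasting into a spreadsheet.
write.table(wide, sep = "\t", quote = FALSE, col.names = NA)
```

Each element of tabs is an ordinary two-way table, so it could
presumably be handed to chisq.test() or prop.test() one variable at a
time; whether that is the right analysis is exactly what I am asking.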
Any help would be appreciated.
Thanks!