I seem to be able to use expected values that are decimal (e.g., 1.33) when using chisq.test but not when using fisher.test. This happens when using an array/matrix as input. fisher.test returns: Error in sprintf(gettext(fmt, domain = domain), ...) : invalid format '%d'; use format %s for character objects. Thus, it appears fisher.test is looking for integers only. I tried putting the data in x and y factor objects, but that does not work either. Is there another way to use non-integer expected values with fisher.test or is that a limitation of fisher.test? If I must use integer expected values, I suppose one option would be to round the expected value down or up to an integer. But which? I tried that, but the two roundings produce different p values. Thanks for any help! -- View this message in context: http://r.789695.n4.nabble.com/fisher-test-can-I-use-non-integer-expected-values-tp4681976.html Sent from the R help mailing list archive at Nabble.com.
fisher.test - can I use non-integer expected values?
9 messages · bakerwl, David Winsemius, Peter Langfelder +1 more
On Dec 10, 2013, at 2:04 PM, bakerwl wrote:
I seem to be able to use expected values that are decimal (e.g., 1.33) when using chisq.test but not when using fisher.test.
There are no expected values in the input to fisher.test.
This happens when using an array/matrix as input. Fisher.test returns: Error in sprintf(gettext(fmt, domain = domain), ...) : invalid format '%d'; use format %s for character objects. Thus, it appears fisher.test is looking for integers only.
That would seem to be a very reasonable assumption.
I tried putting the data in x and y factor objects, but that does not work either. Is there another way to use non-integer expected values with fisher.test or is that a limitation of fisher.test?
If I must use integer expected values, I suppose one option would be to round the expected value down or up to an integer. But which? I tried that, but the two roundings produce different p values.
Well, of course. First, you tell us why you need `fisher.test` at all. It says very clearly it is for count data and you clearly want to do something with input that is not counts. `prop.test` will test a distribution of counts against expected proportions and `binom.test` will do an exact test of a Bernoulli experiment against (one) proportion.
Thanks for any help!
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius Alameda, CA, USA
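For illustration only, the two functions David mentions can be called as below. The counts (1 success in 20 trials) and the null proportion of 1/3 are made-up values, not data from this thread, and the variable names are arbitrary; this is a minimal sketch of the calls, not a recommendation of either test.

```r
# Hypothetical data: 1 "success" in 20 trials, null proportion 1/3.
successes <- 1
trials <- 20
null_p <- 1 / 3

# Exact binomial test of one observed proportion against null_p.
bt <- binom.test(successes, trials, p = null_p)

# Large-sample (chi-square based) test of the same hypothesis.
pt <- prop.test(successes, trials, p = null_p)

print(bt$p.value)
print(pt$p.value)
```

Both functions take observed counts plus a hypothesised proportion; neither takes expected counts as input.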
David, Thanks for your reply--I appreciate your thoughts. I will look at prop.test. The reason I chose fisher.test over chisq.test is that fisher.test is more appropriate when observed counts are not numerous--empty cells and cells with counts < 5 are less a problem. Expected values are needed to test a null hypothesis against observed counts, but if total observed counts are 20 for 3 categories, then a null hypothesis of a random effect would use expected values = 6.67 in each of the 3 categories (20/3). Yes, fisher.test is for count data and so is chisq.test, but chisq.test allows 6.67 to be input as expected values in each of 3 categories, while fisher.test does not seem to allow this? I don't think it is inherent in Fisher's exact test itself that expected values must be integers, but not sure.
On Dec 10, 2013, at 6:55 PM, bakerwl wrote:
David, Thanks for your reply--I appreciate your thoughts. I will look at prop.test. The reason I chose fisher.test over chisq.test is that fisher.test is more appropriate when observed counts are not numerous--empty cells and cells with counts < 5 are less a problem. Expected values are needed to test a null hypothesis against observed counts, but if total observed counts are 20 for 3 categories, then a null hypothesis of a random effect would use expected values = 6.67 in each of the 3 categories (20/3). Yes, fisher.test is for count data and so is chisq.test, but chisq.test allows 6.67 to be input as expected values in each of 3 categories, while fisher.test does not seem to allow this? I don't think it is inherent in Fisher's exact test itself that expected values must be integers, but not sure.
I see it differently, although I could be further educated on the subject and I've been wrong on Rhelp before. I think it _is_ inherent in Fisher's Exact Test. FET is essentially a permutation test built on the hypergeometric distribution (a discrete distribution) and it is unclear what to do with 1.33 of an entity under conditions of permutation. The "chi-square test" (one of many so-called chi-square tests) is a pretty good approximation to the discrete counterparts despite the fact that the chi-square distribution takes continuous arguments and generally holds well down to expected counts of 5. The link between the chi-square and binomial distributions is through there variances: npq vs sum(o-e)^2/n. You can develop arguments "in the limit" that converge fairly quickly.
David Winsemius Alameda, CA, USA
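The hypergeometric point can be checked numerically for a 2x2 table. The sketch below uses a made-up table of counts; the names `tab`, `support`, `p_manual` and the 1e-7 tie tolerance are illustrative choices, not anything taken from the thread. It rebuilds the two-sided p-value by summing hypergeometric probabilities over the integer values the top-left cell can take, which is why a fractional count has no place in the calculation.

```r
# Hypothetical 2x2 table of observed counts (integers).
tab <- matrix(c(3, 1, 1, 3), nrow = 2)

m <- sum(tab[, 1])   # column 1 total
n <- sum(tab[, 2])   # column 2 total
k <- sum(tab[1, ])   # row 1 total

# Every integer value the top-left cell can take with the margins fixed.
support <- max(0, k - n):min(k, m)
probs <- dhyper(support, m, n, k)

# Two-sided p-value: total probability of tables no more likely than the
# observed one (small relative tolerance to treat ties as ties).
p_obs <- dhyper(tab[1, 1], m, n, k)
p_manual <- sum(probs[probs <= p_obs * (1 + 1e-7)])

p_fisher <- fisher.test(tab)$p.value
print(all.equal(p_manual, p_fisher))
```

For this table the support is 0 to 4, so the test sums over only five possible tables.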
On Dec 10, 2013, at 8:21 PM, David Winsemius wrote:
I was careless there, both in the spelling of 'their' and in the connection of chi-square distributions to binomial. You should consult a more authoritative source for the mathematics of similarities in their large sample features.
David
On Tue, Dec 10, 2013 at 6:55 PM, bakerwl <bakerwl at uwyo.edu> wrote:
Expected values are needed to test a null hypothesis against observed counts, but if total observed counts are 20 for 3 categories, then a null hypothesis of a random effect would use expected values = 6.67 in each of the 3 categories (20/3). Yes, fisher.test is for count data and so is chisq.test, but chisq.test allows 6.67 to be input as expected values in each of 3 categories, while fisher.test does not seem to allow this?
To the best of my knowledge (which may be limited) you never put expected counts as input in Fisher Exact Test, you need to put actual observed counts. Fisher test tests the independence of two different random variables, each of which has a set of categorical outcomes.
From what you wrote it appears that you have only one random variable that can take 3 different values, and you want a statistical test for whether the frequencies are the same. You can use chisq.test for this by specifying the probabilities (argument p) and running it as a goodness-of-fit test. I am not aware of a goodness-of-fit way of using fisher.test.
If you actually have two different variables, one of which can take two values and the other one can take 3 values, you need the actual observed counts for each of the 6 combinations of the two variables. You put these counts into a 2x3 table and supply that to fisher.test or chisq.test.
I don't think it is inherent in Fisher's exact test itself that expected values must be integers, but not sure.
I think it is inherent in Fisher's Exact test. The test makes certain assumptions about the distribution of the numbers you put in. If you put in non-integers, you necessarily violate those assumptions and the test is then not applicable.
Peter
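The two situations Peter describes might look like this in R. All counts are illustrative (a total of 20 over 3 categories for the one-variable case, and a made-up 2x3 table for the two-variable case); the object names are arbitrary. Note that the null hypothesis enters chisq.test as probabilities through `p`, and the fractional expected counts are computed internally rather than supplied.

```r
# Illustrative observed counts for one variable with 3 categories (total 20).
observed <- c(1, 10, 9)

# Goodness-of-fit: supply null *probabilities* via p, not expected counts.
gof <- chisq.test(observed, p = c(1, 1, 1) / 3)
print(gof$expected)   # 20/3 in each category, computed by chisq.test itself
print(gof$p.value)

# Independence: hypothetical 2x3 table of observed counts for two variables.
counts <- matrix(c(1, 10, 9,
                   7,  6, 7), nrow = 2, byrow = TRUE)
indep <- fisher.test(counts)
print(indep$p.value)
```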
On 11 Dec 2013, at 06:37 , Peter Langfelder <peter.langfelder at gmail.com> wrote:
Expected values are needed to test a null hypothesis against observed counts, but if total observed counts are 20 for 3 categories, then a null hypothesis of a random effect would use expected values = 6.67 in each of the 3 categories (20/3). Yes, fisher.test is for count data and so is chisq.test, but chisq.test allows 6.67 to be input as expected values in each of 3 categories, while fisher.test does not seem to allow this?
To the best of my knowledge (which may be limited) you never put expected counts as input in Fisher Exact Test, you need to put actual observed counts. Fisher test tests the independence of two different random variables, each of which has a set of categorical outcomes.
From what you wrote it appears that you have only one random variable that can take 3 different values, and you want a statistical test for whether the frequencies are the same. You can use chisq.test for this by specifying the probabilities (argument p) and running it as a goodness-of-fit test. I am not aware of goodness-of-fit way of using fisher.test.
A couple of additional notes:
(a) If you think you can feed expected values like 6.67 to chisq.test anywhere, I think you are doing it wrong. It might give you an answer, but not likely a correct one.
(b) There is an exact test for equidistribution or goodness of fit in general, but that is not what fisher.test does. You can "cheat" and get an approximation by claiming that you are comparing your data to a much larger set of equidistributed data, e.g.:
fisher.test(cbind(c(1,10,9),c(10000,10000,10000)))
Fisher's Exact Test for Count Data
data: cbind(c(1, 10, 9), c(10000, 10000, 10000))
p-value = 0.01465
alternative hypothesis: two.sided
(c) It's not massively hard to generate the ~200 configurations of 20 items into 3 groups and use that to calculate the exact test exactly:
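# Probability of every split (i, j, 20 - i - j) of 20 items under equal proportions;
# cells with i + j > 20 are impossible and get probability 0.
tab <- outer(0:20, 0:20,
             Vectorize(function(i, j)
               if (i + j <= 20)
                 dmultinom(c(i, j, 20 - i - j), prob = c(1, 1, 1)/3)
               else 0
             ))
# Probability of the observed split, then the total probability of all
# splits that are no more likely than it.
pp <- dmultinom(c(1, 10, 9), prob = c(1, 1, 1)/3)
sum(tab[tab <= pp])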
## [1] 0.01468422
(d) Another option is to use the simulate.p.value option to chisq.test():
chisq.test(c(1, 10, 9), simulate=TRUE, B=10000)
Chi-squared test for given probabilities with simulated p-value (based on 10000 replicates)
data: c(1, 10, 9)
X-squared = 7.3, df = NA, p-value = 0.0252
(The p-values _will_ differ because chi-square critical regions are slightly different from those based on the point probabilities.)
Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Thank you David, Peter, and Peter, I understand now that I would be misusing fisher.test to use it for a goodness-of-fit test and that non-integer data are inappropriate since it is for testing two sets of observed counts. Peter D., it does not seem like a good idea for me to "cheat" fisher.test to produce a goodness-of-fit outcome by using a larger set of expected data or by generating all the configurations behind the fisher.test approach. Wouldn't it be statistically inappropriate to turn fisher.test into a goodness-of-fit test when it was not designed for this? Perhaps this is a statistical question, not an R question, though. What is the R function that does an exact test for goodness-of-fit for categorical data with > 2 categories?
I think that I can answer my own question, which was which R function is appropriate for the test I need. It looks like the EMT package and the exact multinomial test is appropriate for goodness-of-fit to test a null hypothesis of equal proportions, given at least 3 categories. Unless I am wrong, I think this can end this discussion. I appreciate the help!
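For readers who want to see what an exact multinomial goodness-of-fit test does without installing anything, here is a base-R sketch of the idea. It is not the EMT package's code: `compositions` and `exact_multinom_test` are hypothetical helper names, the 1e-7 tie tolerance is an assumption, and full enumeration is only practical for small totals and few categories.

```r
# Enumerate every way to split n items into k ordered, non-negative counts.
compositions <- function(n, k) {
  if (k == 1) return(matrix(n, nrow = 1))
  do.call(rbind, lapply(0:n, function(first) {
    cbind(first, compositions(n - first, k - 1))
  }))
}

# Exact test: total probability, under the null proportions, of all
# configurations that are no more likely than the observed one.
exact_multinom_test <- function(observed,
                                prob = rep(1 / length(observed), length(observed))) {
  stopifnot(all(observed == round(observed)), abs(sum(prob) - 1) < 1e-8)
  configs <- compositions(sum(observed), length(observed))
  probs <- apply(configs, 1, dmultinom, prob = prob)
  p_obs <- dmultinom(observed, prob = prob)
  # Small relative tolerance so that equally likely configurations count as ties.
  sum(probs[probs <= p_obs * (1 + 1e-7)])
}

n_configs <- nrow(compositions(20, 3))   # 231 configurations for 20 items, 3 groups
p_exact <- exact_multinom_test(c(1, 10, 9))
print(n_configs)
print(p_exact)                           # about 0.0147, as reported earlier in the thread
```

The input must be observed integer counts; the null hypothesis is supplied as proportions. As far as I know, the packaged equivalent in EMT is called as `multinomial.test(observed, prob)`, but check the package documentation before relying on that.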