
fisher.test - can I use non-integer expected values?

9 messages · bakerwl, David Winsemius, Peter Langfelder +1 more

#
I seem to be able to use expected values that are decimal (e.g., 1.33) when
using chisq.test but not when using fisher.test. This happens when using an
array/matrix as input. Fisher.test returns: Error in sprintf(gettext(fmt,
domain = domain), ...) : invalid format '%d'; use format %s for character
objects.

Thus, it appears fisher.test is looking for integers only.

I tried putting the data in x and y factor objects, but that does not work
either.

Is there another way to use non-integer expected values with fisher.test or
is that a limitation of fisher.test?

If I must use integer expected values, I suppose one option would be to
round the expected value down or up to an integer. But which? I tried both,
and they produce different p-values.

Thanks for any help!



--
View this message in context: http://r.789695.n4.nabble.com/fisher-test-can-I-use-non-integer-expected-values-tp4681976.html
Sent from the R help mailing list archive at Nabble.com.
#
On Dec 10, 2013, at 2:04 PM, bakerwl wrote:

            
There are no expected values in the input to fisher.test.
That would seem to be a very reasonable assumption.
Well, of course. First, you tell us why you need `fisher.test` at all. It says very clearly it is for count data and you clearly want to do something with input that is not counts. `prop.test` will test a distribution of counts against expected proportions and `binom.test` will do an exact test of a Bernoulli experiment against (one) proportion.
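As a rough illustration of the `binom.test` suggestion (a sketch only; the counts are borrowed from the c(1, 10, 9) example that appears later in this thread, and testing one category on its own is an assumption about how it might be applied here):

```r
# Exact binomial test: is 1 observation out of 20 in the first category
# consistent with a hypothesized proportion of 1/3?
# Note that the null is given as a proportion (p), not as an expected count.
res <- binom.test(x = 1, n = 20, p = 1/3)
print(res$p.value)
```

This only addresses one category against the rest, so it is not a joint test of all three categories.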
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
David,

Thanks for your reply--I appreciate your thoughts. I will look at prop.test.

The reason I chose fisher.test over chisq.test is that fisher.test is more
appropriate when observed counts are not numerous--empty cells and cells
with counts < 5 are less a problem. 

Expected values are needed to test a null hypothesis against observed
counts, but if total observed counts are 20 for 3 categories, then a null
hypothesis of a random effect would use expected values = 6.67 in each of
the 3 categories (20/3). 

Yes, fisher.test is for count data and so is chisq.test, but chisq.test
allows 6.67 to be input as expected values in each of 3 categories, while
fisher.test does not seem to allow this? 

I don't think it is inherent in Fisher's exact test itself that expected
values must be integers, but not sure.





#
On Dec 10, 2013, at 6:55 PM, bakerwl wrote:

            
I see it differently, although I could be further educated on the subject and I've been wrong on Rhelp before. I think it _is_ inherent in Fisher's Exact Test. FET is essentially a permutation test built on the hypergeometric distribution (a discrete distribution)  and it is unclear what to do with 1.33 of an entity under conditions of permutation.
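The hypergeometric connection can be checked directly for a 2x2 table. The following is a minimal sketch with made-up counts, assuming the usual two-sided rule of summing the probabilities of all tables that are no more likely than the observed one:

```r
# Hypothetical 2x2 table of observed counts (columns are c(3, 1) and c(1, 3))
tab <- matrix(c(3, 1, 1, 3), nrow = 2)

m <- sum(tab[, 1])   # first column total
n <- sum(tab[, 2])   # second column total
k <- sum(tab[1, ])   # first row total
x <- tab[1, 1]       # observed top-left cell

# With the margins fixed, the top-left cell follows a hypergeometric
# distribution over these integer values only
support <- max(0, k - n):min(k, m)
probs <- dhyper(support, m, n, k)

# Two-sided p-value: add up all outcomes no more probable than the observed one
# (the small relative tolerance guards against floating-point ties)
p_manual <- sum(probs[probs <= dhyper(x, m, n, k) * (1 + 1e-7)])
p_fisher <- fisher.test(tab)$p.value

print(c(manual = p_manual, fisher = p_fisher))   # both are 34/70, about 0.486
```

Because the support consists of whole numbers only, a cell value such as 1.33 has no probability assigned to it under this model.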

The "chi-square test" (one of many so-called chi-square tests) is a pretty good approximation to the discrete counterparts despite the fact that the chi-square distribution takes continuous arguments and generally holds well down to expected counts of 5. The link between the chi-square and binomial distributions is through there variances: npq vs sum(o-e)^2/n. You can develop arguments "in the limit" that converge fairly quickly.
David Winsemius
Alameda, CA, USA
#
On Dec 10, 2013, at 8:21 PM, David Winsemius wrote:

            
I was careless there, both in the spelling of 'their' and in the connection of chi-square distributions to the binomial. You should consult a more authoritative source for the mathematics of the similarities in their large-sample features.
#
On Tue, Dec 10, 2013 at 6:55 PM, bakerwl <bakerwl at uwyo.edu> wrote:

            
To the best of my knowledge (which may be limited) you never put
expected counts as input in Fisher Exact Test, you need to put actual
observed counts. Fisher test tests the independence of two different
random variables, each of which has a set of categorical outcomes.
It sounds as if you have a single variable
that can take 3 different values, and you want a statistical test for
whether the frequencies are the same. You can use chisq.test for this
by specifying the probabilities (argument p) and running it as a
goodness-of-fit test. I am not aware of a goodness-of-fit way of using
fisher.test.
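A minimal sketch of that goodness-of-fit usage, assuming the counts c(1, 10, 9) used later in this thread and a null hypothesis of equal proportions:

```r
# Observed counts in 3 categories (20 observations in total)
counts <- c(1, 10, 9)

# Goodness-of-fit test: the null is given as probabilities via 'p';
# chisq.test derives the expected counts itself
gof <- chisq.test(counts, p = rep(1/3, 3))

print(gof$expected)    # 20/3 = 6.67 in each category
print(gof$statistic)   # X-squared = 7.3
print(gof$p.value)
```

The non-integer expected values are computed internally here. Supplying them as a second column of a matrix would instead be read as a two-way contingency table, which is a different test.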

If you actually have two different variables, one of which can take
two values and the other one can take 3 values, you need the actual
observed counts for each of the 6 combinations of the two variables.
You put these counts into a 2x3 table and supply that to fisher.test
or chisq.test.
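For the two-variable case, a sketch with invented counts (the table values and the dimension labels are hypothetical, chosen only to show the shape of the input):

```r
# Hypothetical observed counts: a 2-level variable (rows) crossed with
# a 3-level variable (columns); every cell is an integer count
tab <- matrix(c(3, 1,
                4, 2,
                5, 5),
              nrow = 2,
              dimnames = list(group = c("A", "B"),
                              category = c("x", "y", "z")))

# Exact test of independence between the row and column variables
ft <- fisher.test(tab)
print(tab)
print(ft$p.value)
```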
I think it is inherent in Fisher's Exact test. The test makes certain
assumptions about the distribution of the numbers you put in. If you
put in non-integers, you necessarily  violate those assumptions and
the test is then not applicable.

Peter
#
On 11 Dec 2013, at 06:37 , Peter Langfelder <peter.langfelder at gmail.com> wrote:

            
A couple of additional notes: 

(a) If you think you can feed expected values like 6.67 to chisq.test anywhere, I think you are doing it wrong. It might give you an answer, but not likely a correct one.

(b) There is an exact test for equidistribution or goodness of fit in general, but that is not what fisher.test does. You can "cheat" and get an approximation by claiming that you are comparing your data to a much larger set of equidistributed data, e.g.:

fisher.test(cbind(c(1, 10, 9), c(10000, 10000, 10000)))

	Fisher's Exact Test for Count Data

data:  cbind(c(1, 10, 9), c(10000, 10000, 10000))
p-value = 0.01465
alternative hypothesis: two.sided

(c) It's not massively hard to generate the ~200 configurations of 20 items into 3 groups and use that to calculate the exact test exactly:

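# probability of every split (i, j, 20 - i - j) under equal proportions;
# cells with i + j > 20 are impossible and get probability 0
tab <- outer(0:20, 0:20,
             Vectorize(function(i, j)
               if (i + j <= 20)
                 dmultinom(c(i, j, 20 - i - j), prob = c(1, 1, 1)/3)
               else 0
             ))
# point probability of the observed counts
pp <- dmultinom(c(1, 10, 9), prob = c(1, 1, 1)/3)
# sum the probabilities of all splits no more likely than the observed one
sum(tab[tab <= pp])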

## [1] 0.01468422

(d) Another option is to use the simulate.p.value option to chisq.test():

chisq.test(c(1, 10, 9), simulate.p.value = TRUE, B = 10000)

	Chi-squared test for given probabilities with simulated p-value (based
	on 10000 replicates)

data:  c(1, 10, 9)
X-squared = 7.3, df = NA, p-value = 0.0252

(The p-values _will_ differ because chi-square critical regions are slightly different from those based on the point probabilities.)
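The difference between the two critical regions can be made concrete by enumerating all splits and ordering them once by point probability and once by the chi-square statistic. This is a sketch under the same assumptions as above (20 items, 3 equally likely categories); the variable names are illustrative only:

```r
obs <- c(1, 10, 9)
n <- sum(obs)
p0 <- rep(1/3, 3)

# Every way to split n items into 3 categories
grid <- expand.grid(i = 0:n, j = 0:n)
grid <- grid[grid$i + grid$j <= n, ]
configs <- cbind(grid$i, grid$j, n - grid$i - grid$j)

# Multinomial probability and chi-square statistic of each split
prob <- apply(configs, 1, dmultinom, prob = p0)
x2 <- function(o) sum((o - n * p0)^2 / (n * p0))
stat <- apply(configs, 1, x2)

# Region 1: splits no more probable than the observed one
p_by_prob <- sum(prob[prob <= dmultinom(obs, prob = p0) * (1 + 1e-7)])
# Region 2: splits with a chi-square statistic at least as large as observed
p_by_chisq <- sum(prob[stat >= x2(obs) - 1e-9])

print(c(by_probability = p_by_prob, by_chisq = p_by_chisq))
# by_probability is about 0.0147, by_chisq about 0.0254
```

The second value is what the simulated chisq.test p-value estimates, which is why it sits near 0.025 rather than 0.0147.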
#
Thank you David, Peter, and Peter,

I understand now that I would be misusing fisher.test to use it for a
goodness-of-fit test and that non-integer data are inappropriate since it is
for testing two sets of observed counts.

Peter D., it does not seem like a good idea for me to "cheat" fisher.test to
produce a goodness-of-fit outcome by using a larger set of expected data or
by generating all the configurations behind the fisher.test approach.
Wouldn't it be statistically inappropriate to turn fisher.test into a
goodness-of-fit test when it was not designed for this? Perhaps this is a
statistical question, not an R question, though. 

What is the R function that does an exact test for goodness-of-fit for
categorical data with > 2 categories?





#
I think that I can answer my own question, which was which R function is
appropriate for the test I need. It looks like the EMT package and the exact
multinomial test is appropriate for goodness-of-fit to test a null
hypothesis of equal proportions, given at least 3 categories. Unless I am
wrong, I think this can end this discussion. I appreciate the help!
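For reference, a hedged sketch of what such a call might look like, assuming the EMT package is installed and that its `multinomial.test()` takes the observed counts and the hypothesized probabilities as its first two arguments (the counts are the c(1, 10, 9) example from earlier in the thread):

```r
observed <- c(1, 10, 9)
prob <- rep(1/3, 3)   # null hypothesis: equal proportions

# EMT is a contributed package, not part of base R, so guard the call
if (requireNamespace("EMT", quietly = TRUE)) {
  res <- EMT::multinomial.test(observed, prob)
} else {
  message("Package 'EMT' is not installed; try install.packages(\"EMT\").")
}
```

The result should be comparable to the exact value of about 0.0147 obtained from the dmultinom enumeration shown earlier.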


