-----Original Message-----
From: bioc-devel-bounces at stat.math.ethz.ch [mailto:bioc-devel-bounces at stat.math.ethz.ch] On Behalf Of Wolfgang Huber
Sent: Thursday, July 16, 2009 3:10 AM
To: Bioconductor Developers
Subject: [Bioc-devel] rfc - rowttests in genefilter package
Hi,
I noticed that in this function (which I wrote), if the number of samples
in each group is large (more than, say, 1000), floating-point errors
become significant, to the point of invalidating the results.
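Essentially, the reason is that I compute the within-group variances
via
ss - s * s / n
where ss is the sum of squared values, s is the sum of values, and n is
the sample size [1]. The two terms ss and s * s / n are large and nearly
equal whenever the mean is large relative to the standard deviation, so
their difference loses most of its significant digits, and the rounding
error accumulated in ss and s grows with n.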
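To make the effect concrete, here is a minimal C sketch, assuming
double-precision values stored contiguously for one group. It is not the
code in rowttests.c, and the function names ssd_onepass and ssd_twopass
are hypothetical. The first function uses the one-pass formula above; the
second computes the same sum of squared deviations in two passes, first
the mean and then the deviations from it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sum of squared deviations via the one-pass
   textbook formula ss - s * s / n. Both terms grow with the data and can
   be nearly equal, so the subtraction may cancel most significant digits. */
double ssd_onepass(const double *x, size_t n)
{
    assert(n > 0);
    double s = 0.0, ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += x[i];          /* running sum of values */
        ss += x[i] * x[i];  /* running sum of squared values */
    }
    return ss - s * s / (double) n;
}

/* Hypothetical helper: the same quantity in two passes. Subtracting the
   mean before squaring keeps the summands small, avoiding the cancellation. */
double ssd_twopass(const double *x, size_t n)
{
    assert(n > 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    double mean = s / (double) n;

    double ssd = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean;  /* deviation from the group mean */
        ssd += d * d;
    }
    return ssd;
}
```

For example, with the values 1e9 + 4, 1e9 + 7, 1e9 + 13 and 1e9 + 16
(true sum of squared deviations 90), ssd_twopass returns exactly 90 on an
IEEE 754 double platform, while ssd_onepass cannot: both of its terms lie
near 4e18, where adjacent doubles are 512 apart, so its result is a
multiple of 512. Without the 1e9 offset both functions return 90.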
I've added a warning to the man page asking people to use the function
only when the number of samples is in the range of dozens to a few
hundred. I can think of a few obvious ways to make the code less
vulnerable to the finite precision of floating-point arithmetic, but I am
sure this problem has been solved many times before, and I would like to
ask for pointers or suggestions.
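One widely used remedy is Welford's updating algorithm, which keeps a
running mean and a running sum of squared deviations in a single pass over
the data. Below is a minimal C sketch under the same assumptions
(contiguous doubles for one group). The function name ssd_welford is
hypothetical, and this is only an illustration of the technique, not what
genefilter does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: single-pass sum of squared deviations using
   Welford's recurrences. The mean is updated incrementally, and m2
   accumulates products of deviations instead of raw squares. */
double ssd_welford(const double *x, size_t n)
{
    assert(n > 0);
    double mean = 0.0, m2 = 0.0;
    for (size_t i = 0; i < n; i++) {
        double delta = x[i] - mean;        /* deviation from the old mean */
        mean += delta / (double) (i + 1);  /* update the running mean */
        m2 += delta * (x[i] - mean);       /* old deviation times new deviation */
    }
    return m2;
}
```

Dividing the result by n - 1 gives the sample variance. The updates cost
one division per element, more than the textbook formula, but they never
subtract two large accumulated sums. When the data can be read twice, a
two-pass computation (first the mean, then the squared deviations from
it) is the other common choice.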
Best wishes
Wolfgang
[1] https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/genefilter/src/rowttests.c
-------------------------------------------------------
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber