"CV" for log normal data

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120221/da78b79c/attachment.pl>
Hi, I have a microarray dataset from Agilent chips. The data are log ratios between test samples and a universal reference RNA. Because the mean of a log ratio is very close to 0, the coefficient of variation (CV) doesn't really apply to this kind of data. What measures do people use for dispersion, so that I can compare across genes on the chip to find stably expressed genes? Something similar to CV would be easy to interpret.

You may want to ask this question in the bioconductor list since it
isn't really an R question.

Do you also have some sort of an expression p-value? If you only have
expression itself, you could simply look at variance and hope that
non-expressed genes have expression values determined chiefly by noise
which varies quite a bit, so they would have a higher variance than
genes with stable expression higher than the typical noise.

HTH,

Peter
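
A minimal sketch of the variance ranking suggested above, assuming the log ratios sit in a genes-by-samples matrix. The name `logratio`, the simulated values, and the per-gene noise levels in `gene_sd` are illustrative assumptions, not the poster's data.

```r
# Simulated stand-in for Agilent log-ratio data: rows are genes, columns are samples.
set.seed(1)
n_genes <- 200
n_samples <- 12

# Hypothetical per-gene noise levels; real data would replace `logratio`.
gene_sd <- runif(n_genes, min = 0.05, max = 1)
logratio <- matrix(
  rnorm(n_genes * n_samples, mean = 0, sd = rep(gene_sd, n_samples)),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), NULL)
)

# Per-gene variance across samples; missing spots are ignored.
gene_var <- apply(logratio, 1, var, na.rm = TRUE)

# Smallest variance first: candidates for "stably expressed" genes.
stable_first <- sort(gene_var)
head(stable_first, 10)
```

Because the data are already on the log scale, the variance needs no division by the mean. A low variance can still come from a gene that is simply not expressed, which is the caveat raised in the reply above.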
Inline below.

On Tue, Feb 21, 2012 at 2:07 PM, Peter Langfelder wrote:

You may want to ask this question in the bioconductor list since it
isn't really an R question.
Good advice. But perhaps ?mad, or some other robust plain old
measure of spread?
-- Bert
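
A rough sketch of the robust alternative, comparing `sd`, `mad` and `IQR` per gene. The matrix `logratio` and the planted outlier are simulated assumptions for illustration only.

```r
# Simulated genes-by-samples matrix of log ratios (hypothetical data).
set.seed(2)
logratio <- matrix(rnorm(50 * 10, sd = 0.2), nrow = 50,
                   dimnames = list(paste0("gene", 1:50), NULL))
logratio["gene1", 1] <- 5   # one artificial outlier in an otherwise quiet gene

# Per-gene spread: classical SD versus two robust measures.
gene_sd  <- apply(logratio, 1, sd,  na.rm = TRUE)
gene_mad <- apply(logratio, 1, mad, na.rm = TRUE)  # default constant 1.4826
gene_iqr <- apply(logratio, 1, IQR, na.rm = TRUE)

# The outlier inflates the SD of gene1 far more than its MAD or IQR.
round(cbind(sd = gene_sd, mad = gene_mad, iqr = gene_iqr)["gene1", ], 3)
```

With its default constant, `mad` is scaled to estimate the SD for normal data, so the two columns are directly comparable for genes without outliers.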

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Bert Gunter
Genentech Nonclinical Biostatistics

Good advice. But perhaps ?mad, or some other robust plain old
measure of spread?
The problem is not (lack of) robustness to outliers, the problem is to
find genes whose expression variation is small compared to (mean)
expression. Trouble is, Agilent throws the mean expression information
away, so you have to find heuristic workarounds. I have encountered
the same issue before and haven't really found a good solution.

Peter

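What's wrong with the SD of log(X)? That's pretty much equivalent to the CV, at least for CVs less than 50%:
x <- rlnorm(1000, 5, .5)   # log-normal sample: meanlog = 5, sdlog = 0.5
sd(x)/mean(x)              # CV on the original scale
[1] 0.5252718
sd(log(x))                 # SD on the log scale
[1] 0.5037995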
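
For an exactly log-normal X with log-scale SD s, the CV is sqrt(exp(s^2) - 1), which is close to s when s is small. The sketch below tabulates that relation and checks it by simulation. The grid of `sdlog` values, the seed, and the sample size are arbitrary choices for illustration.

```r
# Theoretical CV of a log-normal variable as a function of the log-scale SD.
sdlog <- c(0.05, 0.1, 0.2, 0.3, 0.5, 0.75, 1)
cv_theory <- sqrt(exp(sdlog^2) - 1)

# The ratio cv / sdlog shows how quickly the approximation degrades.
data.frame(sdlog = sdlog,
           cv    = round(cv_theory, 3),
           ratio = round(cv_theory / sdlog, 3))

# Simulation check at sdlog = 0.5 with a large sample.
set.seed(3)
x <- rlnorm(1e5, meanlog = 5, sdlog = 0.5)
c(cv        = sd(x) / mean(x),
  sd_log    = sd(log(x)),
  cv_theory = sqrt(exp(0.5^2) - 1))
```

At sdlog = 0.5 the theoretical CV is about 0.533, in line with the sample values quoted above.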

Looking for a relative measure of precision _after_ taking logs strikes me as very odd. If you scale your original observations by a constant factor, the log of that factor is _added_ to the log-transformed data, without affecting their variation at all.
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
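
A small numerical check of the scaling argument, under assumed values: the factor `k`, the seed, and the simulated sample are arbitrary.

```r
# Rescaling the raw data shifts the logs by log(k) but leaves their spread alone.
set.seed(4)
x <- rlnorm(1000, meanlog = 5, sdlog = 0.5)
k <- 10   # arbitrary constant scale factor

# The mean of the logs moves by exactly log(k) (up to floating point).
c(mean_shift = mean(log(k * x)) - mean(log(x)), log_k = log(k))

# The SD of the logs is unchanged ...
all.equal(sd(log(k * x)), sd(log(x)))

# ... and so is the CV on the original scale.
all.equal(sd(k * x) / mean(k * x), sd(x) / mean(x))
```

So the SD of the log data already behaves like a relative measure: dividing it by the mean of the logs would reintroduce a dependence on the arbitrary scale.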