"CV" for log normal data
6 messages · Bert Gunter, Peter Langfelder, Peter Dalgaard +1 more
On Tue, Feb 21, 2012 at 1:44 PM, array chip <arrayprofile at yahoo.com> wrote:
Hi, I have a microarray dataset from Agilent chips. The data are really log ratios between test samples and a universal reference RNA. Because of the nature of log ratios, the coefficient of variation (CV) doesn't really apply to this kind of data, since the mean of a log ratio is very close to 0. What kind of measure would people use for dispersion, so that I can compare genes on the chip and find stably expressed ones? Something similar to CV would be easily interpreted.
You may want to ask this question on the Bioconductor mailing list, since it isn't really an R question. Do you also have some sort of expression p-value? If you only have the expression values themselves, you could simply look at the variance and hope that non-expressed genes have values determined chiefly by noise, which varies quite a bit, so that they have a higher variance than genes whose stable expression lies above the typical noise level. HTH, Peter
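The variance heuristic above could be sketched in R roughly as follows. This is only an illustration: it assumes the log ratios sit in a genes x arrays numeric matrix, here called `logratios` (a hypothetical name), and the data are simulated, not taken from the thread. The first 20 genes are given extra noise to mimic non-expressed genes.

```r
# Sketch of the "look at the variance" heuristic.
# Assumption: `logratios` is a genes x arrays matrix; the values below are
# simulated purely for illustration (hypothetical data).
set.seed(1)
n_genes  <- 200
n_arrays <- 12
logratios <- matrix(rnorm(n_genes * n_arrays, mean = 0, sd = 0.2),
                    nrow = n_genes,
                    dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Make the first 20 genes noisier, mimicking non-expressed genes whose
# values are dominated by measurement noise.
logratios[1:20, ] <- rnorm(20 * n_arrays, mean = 0, sd = 1)

# Per-gene variance across arrays (rows are genes).
gene_var <- apply(logratios, 1, var, na.rm = TRUE)

# Smallest variance first: candidate "stably expressed" genes.
ranked <- sort(gene_var)
head(ranked)
```

As the message says, this only works if noise-dominated genes really are the high-variance ones; it is a heuristic ranking, not a test.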
Inline below.

On Tue, Feb 21, 2012 at 2:07 PM, Peter Langfelder <peter.langfelder at gmail.com> wrote:
You may want to ask this question in the bioconductor list since it isn't really an R question.
Good advice. But perhaps ?mad, or some other (perhaps robust) plain old measure of spread? -- Bert
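A small sketch of what `?mad` buys over `sd()` on log ratios, using made-up values for one gene with a single outlying array (hypothetical data). `mad()` from the stats package uses the constant 1.4826 by default, which makes it comparable to an SD for normal data. The row-wise call at the end uses a tiny hypothetical two-gene matrix in place of a real dataset.

```r
# Hypothetical log ratios for one gene across 10 arrays; the last value
# is a single outlying array.
gene_lr <- c(0.10, -0.05, 0.02, 0.08, -0.03, 0.04, -0.07, 0.01, 0.05, 3.00)

sd_lr  <- sd(gene_lr)   # inflated by the single outlier
mad_lr <- mad(gene_lr)  # median absolute deviation, scaled by 1.4826 by default

# The same call works row-wise on a genes x arrays matrix; a tiny
# hypothetical two-gene matrix stands in for real data here.
logratios <- rbind(geneA = gene_lr, geneB = gene_lr / 2)
gene_mad  <- apply(logratios, 1, mad, na.rm = TRUE)

print(c(sd = sd_lr, mad = mad_lr))
print(gene_mad)
```

This only addresses outliers; it does not restore the relation between spread and expression level that a CV captures.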
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Good advice. But perhaps ?mad, or some other (perhaps robust) plain old measure of spread?
The problem is not (lack of) robustness to outliers; the problem is to find genes whose expression variation is small compared to their (mean) expression. Trouble is, Agilent throws the mean expression information away, so you have to find heuristic workarounds. I have encountered the same issue before and haven't really found a good solution. Peter
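Why a log ratio loses the mean expression level can be shown with two hypothetical genes at very different absolute intensities but the same test/reference fold change. The intensities are invented, and the sketch assumes a two-channel setup where both channel signals would be known; whether they can be recovered from a given Agilent export is not discussed in the thread. The `A` value is the average log intensity used in MA plots.

```r
# Hypothetical channel intensities: two genes with very different absolute
# signal but the same test/reference fold change.
test      <- c(geneLow = 50, geneHigh = 5000)
reference <- c(geneLow = 25, geneHigh = 2500)

# Log ratio (M): identical for both genes, so the absolute level is gone.
M <- log2(test / reference)

# Average log intensity (A, as in an MA plot): this is the level information
# that a log-ratio-only dataset no longer carries.
A <- 0.5 * log2(test * reference)

print(M)
print(A)
```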
On Feb 21, 2012, at 22:44, array chip wrote:
What's wrong with the SD of log(X)?? That's pretty much equivalent to the CV, at least for CVs less than 50%:
x <- rlnorm(1000, 5, .5)
sd(x)/mean(x)
[1] 0.5252718
sd(log(x))
[1] 0.5037995

Looking for a relative measure of precision _after_ taking logs strikes me as very odd. If you scale your original observations by a constant factor, the log of that factor is _added_ to the log-transformed data, without affecting their variation at all.
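Both points can be checked in R. For a log-normal variable whose natural-log SD is sigma, the exact CV is sqrt(exp(sigma^2) - 1), which is close to sigma for small sigma and drifts away as sigma approaches 1. For sigma = 0.5 this gives about 0.533, in line with the sample values above. The second part verifies scale invariance on simulated data; the seed and the factor `k` are arbitrary choices for illustration.

```r
# Exact CV of a log-normal variable as a function of sigma, the SD on the
# natural-log scale: CV = sqrt(exp(sigma^2) - 1).
sigma     <- c(0.1, 0.25, 0.5, 1)
cv_theory <- sqrt(exp(sigma^2) - 1)
print(round(cbind(sigma, cv_theory), 3))

# Scale invariance: multiplying the data by a constant k adds log(k) on
# the log scale and leaves sd(log(x)) unchanged.
set.seed(42)
x <- rlnorm(1000, meanlog = 5, sdlog = 0.5)
k <- 10
print(all.equal(sd(log(k * x)), sd(log(x))))
```

The formula is for natural logs. If the log ratios are base 2 or base 10, their SD would need to be multiplied by log(2) or log(10) before reading it as an approximate CV.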
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com