[Bioc-devel] Illumina Methylation annotations
Hi Tim, Thanks for your reply! Here are some thoughts of mine.
Thank *you* for your thoughtful replies. I'd forgotten how control probe data is stored in LumiBatch objects. That seems like the most consistent way to handle it for MethyLumiM objects, now that you mention it; using addControlData2lumi(), but with both channels represented, such as in a list with $Red and $Grn data.frames. Using getControlData with a "MethyLumiSet" type would do the trick; I can easily write this, in fact. I'll send a patch.
Considering there are $Red and $Grn channels of the control data, I may use an AssayData-class object instead of a simple data.frame to hold the control data. Anyway, I need to see what the real control data looks like. Our genomics core provides only the summary information of the control data.
As for multiple mappings, I am not sure how Illumina reports them for the 450k. For easier maintenance in the long run, we can simply follow the same convention Illumina uses. Illumina has improved their annotation maintenance. They make regular updates of their annotations now.
The most recent manifest is available via iCom, but partly because Sean had to do all the heavy lifting last time around, I'm planning to push out at least a SQLite package of probe NuID/channel/chemistry annotations as illuminaHumanMethylation450kProbes.db. In the Illumina annotations, the accession numbers are concatenated with semicolons, with as many as 6 separate accession numbers provided per probe. I don't think anyone had this scenario in mind when the AnnotationDbi package and its mappings were designed :-)
I've downloaded the manifest file for the 450K Infinium chip. It does have lots of multiple mappings from probes to genes. As I remember, the current AnnotationDbi package can handle multiple mappings from probes to genes. But multiple mappings will make follow-up analyses, like GO analysis, more challenging.
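To make the multiple-mapping issue concrete, here is a minimal sketch of expanding the semicolon-concatenated accession field into one row per probe/accession pair and then flagging the probes that map to more than one accession. It is written in Python purely for illustration and is not taken from any Bioconductor package. The probe IDs, accession values, and function names are all hypothetical, and the sketch assumes each manifest row has already been reduced to a (probe ID, joined accession string) pair.

```python
from collections import defaultdict


def split_accessions(rows, sep=";"):
    """Expand (probe_id, 'ACC1;ACC2') rows into (probe_id, accession) pairs.

    Empty fields are skipped, and an accession repeated within one
    probe's field is kept only once (first occurrence wins).
    """
    pairs = []
    for probe_id, joined in rows:
        seen = set()
        for acc in (joined or "").split(sep):
            acc = acc.strip()
            if acc and acc not in seen:
                seen.add(acc)
                pairs.append((probe_id, acc))
    return pairs


def multi_mapped(pairs):
    """Return {probe_id: [accessions]} for probes with more than one accession."""
    by_probe = defaultdict(list)
    for probe_id, acc in pairs:
        by_probe[probe_id].append(acc)
    return {p: accs for p, accs in by_probe.items() if len(accs) > 1}


# Hypothetical rows; real manifest values will differ.
manifest_rows = [
    ("probe_A", "NM_000001;NM_000002;NM_000001"),
    ("probe_B", "NM_000003"),
    ("probe_C", ""),
]

pairs = split_accessions(manifest_rows)
ambiguous = multi_mapped(pairs)
```

A long table like `pairs` is the shape a one-to-many SQLite mapping would take, while `ambiguous` lists the probes that a downstream gene-level analysis (GO, for instance) would need to handle explicitly, e.g. by dropping them or by counting them once per gene.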
Will do. I've been working on a package so that Infinium methylation chips can be handled the way expression or SNP chips are (in 'beadarray'/'lumi' or in 'crlmm'). I plan to put up a vignette showing a mixture experiment on the 450k arrays and the 27k arrays, comparing the various preprocessing options and their effects on each platform, but if there are no objections from the investigators, I'll see if I can't just post the control probe data this week.
What package are you developing? I cannot find anything similar on the Bioconductor developer website.
If you can send me some example control data, I can play with it and update the MethyLumiM class by the end of this year. If possible, please also send me one or two samples of 450K data with annotation information.
I'll see if I can't get it turned around this week. A typical 450k array is (as you might imagine) rather larger than the corresponding 27k array (~15MB vs ~1.5MB of IDAT files) but the controls are just a few K (and no mapping or normalization is required for them). So I don't imagine it would be much trouble to post some examples on GitHub.
Just sending me the control data is fine if the entire data file is too big. Thanks! Pan