[Bioc-devel] lumi annotations

7 messages · Antti Honkela, Martin Morgan, Gang Feng +2 more

#
Hi all,

I am developing a package with the intention of using data from several
microarray platforms and the related annotations in a portable manner.

Given an ExpressionSet "eset", I have been using constructs like
 > library(annotate)
 > m <- getAnnMap('SYMBOL', annotation(eset))
 > s <- get(featureNames(eset)[1], m)
which seems portable and works fine on the Affymetrix data I have used so
far.
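
A minimal sketch of the same lookup applied to every probe at once. A plain environment with made-up probe IDs and symbols stands in for the map returned by getAnnMap(), so every name below is illustrative only. Base R's mget() with ifnotfound = NA returns NA for an unmapped probe, where get() would stop with an error.

```r
# Toy stand-in for an annotation map: a plain environment keyed by probe ID.
# The probe IDs and symbols are invented for illustration.
m <- new.env()
assign("probe_1", "GENE_A", envir = m)
assign("probe_2", "GENE_B", envir = m)

# Stand-in for featureNames(eset); "probe_3" has no entry in the map.
ids <- c("probe_1", "probe_2", "probe_3")

# get() fails on a missing key; mget() with ifnotfound = NA does not.
syms <- unlist(mget(ids, envir = m, ifnotfound = NA))
print(syms)
```

The same call pattern, mget(featureNames(eset), m, ifnotfound = NA), is expected to carry over to a real map from a .db annotation package, but that is an assumption and is not tested here.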

Turning to Illumina and the lumi package, the same no longer works:
------------------------------------------------------------
 > library(lumi)
 > data(example.lumi)
 > m <- getAnnMap('SYMBOL', annotation(example.lumi))
Error: getAnnMap: package lumiHumanV1 not available
 > biocLite('lumiHumanV1.db')
Using R version 2.12.0, biocinstall version 2.7.4.
Installing Bioconductor version 2.7 packages:
[1] "lumiHumanV1.db"
Please wait...

Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
   package 'lumiHumanV1.db' is not available
------------------------------------------------------------

Is this just a bug in the example.lumi object, or is it simply wrong  
to assume that the same mechanism should work with lumi at all?


Antti
#
On 12/17/2010 01:06 AM, Antti Honkela wrote:
Hi Antti --

This is the right approach; both the annotation package and 'map' have
to exist; from

http://bioconductor.org/help/bioc-views/release/data/annotation/

I think the correct annotation package name is 'illuminaHumanv1.db'.
Most maps are common across chip / organism, but for instance there are
'ORF' maps in yeast-centric packages such as yeast2.db but not elsewhere.
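
A rough sketch of checking both conditions before calling getAnnMap(). The helper name annMapAvailable is hypothetical, and it assumes the usual convention that a chip's package is named <chip>.db and its maps are objects named <chip><map> in that package's namespace.

```r
# Hypothetical helper: TRUE only if the chip's ".db" package is installed
# and it contains an object named <chip><map>, e.g. "illuminaHumanv1SYMBOL".
annMapAvailable <- function(chip, map) {
  pkg <- paste0(chip, ".db")
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  exists(paste0(chip, map), envir = asNamespace(pkg), inherits = FALSE)
}

# FALSE unless illuminaHumanv1.db is installed and provides a SYMBOL map.
annMapAvailable("illuminaHumanv1", "SYMBOL")
```

Returning FALSE instead of raising an error lets calling code fall back to another annotation source when a platform's package or map is missing.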

Martin

  
    
#
Hi, Antti

example.lumi is a LumiBatch object. For annotation, you can use lumiHumanAll.db
to get the mapping.

Best

Gilbert

  
    
#
Hi Tim

Thanks for your long and good comments! It took me a while to finish reading
your email. :) Following are my comments.
As you know, the benefit of nuID is that we can directly know the probe
sequence without checking any table. But the Illumina Probe ID, as the
manufacturer ID, is the most widely used in public. So I think one
alternative is just to add an additional Bimap table of IlluminaID and
nuID to the current Infinium methylation library. As an option, I will add a
mapping function to convert data between Illumina ID and nuID. But by
default, data will be identified by IlluminaID.

As for multiple mappings, I am not sure how the Illumina 450k reports them. For
easier maintenance in the long run, we can just do it the same way
Illumina does. Illumina has improved their annotation maintenance. They make
regular updates to their annotations now.
Can you send me some example control probe data? One option is to store the
control data information the same way LumiBatch-class does.
This sounds good. The probe sequences can be kept in nuID format to save
storage space. But again, long-term maintenance is an issue.
Thanks for your comments on the lumi methylation codebase. Lots of improvement
work still needs to be done. In the long run, using GenomicFeatures objects
will definitely help integration with other data, like NGS sequence
data, but it is not compatible with other microarray data at the current stage.
So maybe it can be a long-term plan for the future.
If you can send me some example control data, I can play with it and update
the MethyLumiM class at the end of this year. If possible, please also send
me one or two samples of 450K data with annotation information.

Thanks for all the comments and support!

Happy holidays!


Pan
1 day later
#
Hi Tim

Thanks for your reply! Following are some thoughts of mine.
Considering there are $Red and $Grn channels of the control data, I may use
AssayData-class instead of a simple data.frame to keep the control data.
Anyway, I need to see what the real control data looks like. Our Genomic
core provides only the summary information of the control data.
I've downloaded the manifest file of the 450K Infinium chip. It does have lots
of multiple mappings from probe to genes. I remember the current
AnnotationDbi package can handle multiple mappings from probe to genes. But
multiple mappings will make follow-up analysis, like GO analysis,
more challenging.
What package are you developing? I cannot find any similar one on
the Bioconductor developer website.
Just sending me the control data is fine if the entire data file is too big.

Thanks!


Pan