
[Bioc-devel] Common workflow to build a microarray annotation package, like hgu133a.db

I should have phrased this differently:

"Don't create new .db0 packages _just to map symbols or sequences_."

The .db0 infrastructure is marvelous for oligonucleotide arrays designed to
measure transcription, but in some respects it "suffers" from the BioC
release cycle.  For example, suppose I have a bunch of hgu133plus2 and
HuGeneST 1.1 arrays, and I find that the probe sequences, when aligned to
a more recent reference transcriptome than the one the arrays were designed
against, actually pick up noncoding RNAs better than the mRNA targets they
were originally designed for (which end up discarded due to mismapping).
Du et al. (2013, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3702647/)
show one of several ways in which this information can be used.
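A minimal sketch of the remapping step described above, written in Python rather than the R/Bioconductor tooling this thread is about. It assumes you already have, for each probe, the set of transcripts it aligns to in the newer reference (the alignment itself is not shown). The function `remap_probesets`, the `min_probes` threshold, and all probe, probeset, and transcript IDs below are hypothetical. Probes that hit zero or several transcripts are dropped, and a probeset is reassigned to whichever target collects the most uniquely mapping probes.

```python
from collections import defaultdict


def remap_probesets(probe_hits, probeset_of, min_probes=3):
    """Reassign probesets to transcripts using a fresh probe alignment.

    probe_hits: dict probe_id -> set of transcript IDs the probe aligns to
        (hypothetical output of aligning probes to a newer transcriptome).
    probeset_of: dict probe_id -> probeset ID from the original array design.
    min_probes: hypothetical threshold; a target is kept only if at least
        this many uniquely mapping probes from the probeset hit it.
    Returns dict probeset_id -> transcript ID; probesets without a
    well-supported target are left out.
    """
    # counts[probeset][transcript] = number of uniquely mapping probes
    counts = defaultdict(lambda: defaultdict(int))
    for probe, hits in probe_hits.items():
        if len(hits) != 1:
            continue  # unmapped or multi-mapping probe: discard it
        (target,) = hits
        counts[probeset_of[probe]][target] += 1

    result = {}
    for probeset, per_target in counts.items():
        # pick the transcript supported by the most probes in this probeset
        best, n = max(per_target.items(), key=lambda kv: kv[1])
        if n >= min_probes:
            result[probeset] = best
    return result


# Hypothetical toy data: three probes of ps_1 now hit a noncoding RNA.
probe_hits = {
    "p1": {"ncRNA_X"}, "p2": {"ncRNA_X"}, "p3": {"ncRNA_X"},
    "p4": {"mRNA_A", "mRNA_B"},  # multi-mapping, dropped
    "p5": {"mRNA_C"},
}
probeset_of = {"p1": "ps_1", "p2": "ps_1", "p3": "ps_1",
               "p4": "ps_1", "p5": "ps_2"}
print(remap_probesets(probe_hits, probeset_of))  # {'ps_1': 'ncRNA_X'}
```

Requiring a minimum number of unique probes is one simple way to avoid trusting a probeset on the strength of a single hit; published custom-CDF approaches use their own rules.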

HOWEVER!  With the new mappings of probes to genes/symbols/transcripts, we
have a bit of a conundrum, especially in situations where RNAseq data is
also available.  mapToIds() and mapToRanges() certainly help, although a
helper function that does the same thing based on lifted transcriptomic
coordinates might do as well as the latter, and the former sometimes won't
find the correct IDs (again with the release cycle issues).  So if I map a
number of symbols to, say, Ensembl build 83 plus some other stuff (for
example, a number of recently documented non-coding RNAs), it's going to be
rough going to get things mapped back to where I want them.  And then of
course it would be nice to normalize everything in a sensible fashion.
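The release-cycle problem with symbol lookups can be illustrated with a small sketch: look each symbol up in the target release first, fall back to an alias/previous-symbol table, and report anything missing or ambiguous instead of guessing. This is Python, not `mapToIds()` or any Bioconductor API; the function name `resolve_symbols`, both lookup tables, and every symbol and ID in the example are hypothetical stand-ins for whatever annotation source you are mapping against.

```python
def resolve_symbols(symbols, current_ids, alias_to_current):
    """Map gene symbols to IDs in a newer annotation release.

    current_ids: dict symbol -> ID in the target release (hypothetical).
    alias_to_current: dict old/alias symbol -> set of current symbols
        (hypothetical, e.g. built from a previous-symbol/alias list).
    Returns (mapped, unresolved): mapped is symbol -> ID; unresolved
    lists symbols that had no match or an ambiguous alias.
    """
    mapped, unresolved = {}, []
    for sym in symbols:
        if sym in current_ids:
            mapped[sym] = current_ids[sym]  # direct hit in the new release
            continue
        # fall back to aliases, keeping only those present in the new release
        candidates = {s for s in alias_to_current.get(sym, ()) if s in current_ids}
        if len(candidates) == 1:
            (new_sym,) = candidates
            mapped[sym] = current_ids[new_sym]
        else:
            unresolved.append(sym)  # missing entirely, or alias is ambiguous
    return mapped, unresolved


current_ids = {"GENE_NEW": "ID_1", "GENE_B": "ID_2"}
alias_to_current = {"GENE_OLD": {"GENE_NEW"},
                    "GENE_AMBIG": {"GENE_NEW", "GENE_B"}}
mapped, unresolved = resolve_symbols(
    ["GENE_B", "GENE_OLD", "GENE_AMBIG", "GENE_GONE"],
    current_ids, alias_to_current)
print(mapped)      # {'GENE_B': 'ID_2', 'GENE_OLD': 'ID_1'}
print(unresolved)  # ['GENE_AMBIG', 'GENE_GONE']
```

Returning the unresolved symbols explicitly keeps the "rough going" cases visible so they can be curated by hand rather than silently dropped.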

My suggestion, given those final two stings in the tail, would be to look
into a platform design (pd) package for oligo, so that a person can use SCAN.UPC
to compare RNAseq and microarray quantifications of the same transcripts
across a larger number of samples.  That's just my opinion, but after
several years as maintainer of .db0 packages for platforms where the .db0
infrastructure might not have been the best fit (as may be obvious from the
excruciating level of detail above), I do think it may help others.
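As a rough sanity check alongside something like SCAN.UPC, one can ask how well array and RNA-seq quantifications of the same transcripts agree within a single sample. The sketch below is not the SCAN.UPC method; it is a plain Spearman rank correlation over the transcript IDs both platforms share, which sidesteps their different measurement scales. The function names and the example values are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return math.nan  # undefined if one platform is constant
    return cov / math.sqrt(vx * vy)


def cross_platform_agreement(array_expr, rnaseq_expr):
    """Compare one sample's array and RNA-seq values on shared transcripts.

    array_expr, rnaseq_expr: dict transcript_id -> expression value.
    Returns (number of shared transcripts, Spearman correlation).
    """
    shared = sorted(array_expr.keys() & rnaseq_expr.keys())
    if len(shared) < 2:
        return len(shared), math.nan
    x = [array_expr[t] for t in shared]
    y = [rnaseq_expr[t] for t in shared]
    return len(shared), spearman(x, y)


array_expr = {"t1": 8.1, "t2": 5.0, "t3": 11.2, "t4": 7.3}   # log2 intensities
rnaseq_expr = {"t1": 40, "t2": 3, "t3": 900, "t5": 10}        # counts
print(cross_platform_agreement(array_expr, rnaseq_expr))      # (3, 1.0)
```

Rank correlation only says whether the two platforms order transcripts similarly; it does not put them on a common scale, which is what a shared normalization would provide.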

Of course, I could always be wrong.  I've been wrong many times before.
Hopefully by documenting the various ways in which I've tried doing things
(right and wrong), there can be some benefit to others trying the same.

Best,

--t
On Tue, Jan 5, 2016 at 5:11 PM, James W. MacDonald <jmacdon at uw.edu> wrote:

Message-ID: <CAC+N9BXwe=CZKtp-xrgckwX+dEzR0-A7io-Kp0W9_y2yyZ6J=Q@mail.gmail.com>
In-Reply-To: <c63355f95e2e4a1dad767fb025bac60f@SN1PR0701MB1885.namprd07.prod.outlook.com>