[Bioc-devel] Changes in AnnotationDbi

In the last release, the warning message from select() telling people that
their results include one-to-many mappings was removed. While some may find
this warning annoying, I think silently returning something unexpected to
our users is dangerous.

In other words, for me it is a common practice to do something like this:

fit <- lmFit(eset, design)
fit2 <- eBayes(fit)
gns <- select(<chippackage>, featureNames(eset), c("ENTREZID","SYMBOL"))
gns <- gns[!duplicated(gns[,1]),]
fit2$genes <- gns

I add in the step where dups are removed because I already know they are
there. But a naive user might instead do

fit2$genes <- select(<chippackage>, featureNames(eset),
c("ENTREZID","SYMBOL"))

Which will work just fine, but then all the annotation (except for the
first few lines) will now be completely incorrect, and there wasn't a
warning to let the end user know that they may have made a mistake.

lmFit() will parse the featureData slot of an ExpressionSet and use those
data for annotation, so that gives some hypothetical protections, for those
who first put their annotation data into their ExpressionSet. However,
?eSet says:

 ?featureData?: Contains variables describing features (i.e., rows
          in ?assayData?) unique to this experiment. Use the
          ?annotation? slot to efficiently reference feature data
          common to the annotation package used in the experiment.
          Class: ?AnnotatedDataFrame-class?

Which to me indicates that the featureData slot isn't really intended to
contain annotation data, but instead some unique information that pertains
to a given experiment. But maybe I misunderstand.

Is the featureData slot actually intended for annotation data? If not, what
is the intended pipeline for annotating data in an ExpressionSet? Am I
alone in being concerned about this?

Best,

Jim