
[Bioc-devel] Behavior of select function in AnnotationDbi

Hi Jim,

I think we should choose the biomaRt model, that is, duplicates are
allowed but silently ignored.

Note that this is also the SQL model. When you do

   SELECT * FROM ... WHERE key IN ('key1', 'key2', ...)

duplicated keys don't generate duplicates in the output.
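
To make the SQL point concrete, here is a minimal sketch using Python's
built-in sqlite3 module. The table and column names (genes, gene_key,
symbol) are made up for illustration and have nothing to do with any
actual annotation schema.

```python
import sqlite3

# Hypothetical toy table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (gene_key TEXT, symbol TEXT)")
conn.executemany(
    "INSERT INTO genes VALUES (?, ?)",
    [("key1", "A"), ("key2", "B"), ("key3", "C")],
)

# 'key1' is listed twice in the IN clause and 'key9' is not in the table.
# ORDER BY is only there to make the result deterministic.
rows = conn.execute(
    "SELECT gene_key, symbol FROM genes "
    "WHERE gene_key IN ('key1', 'key2', 'key1', 'key9') "
    "ORDER BY gene_key"
).fetchall()
conn.close()

print(rows)  # [('key1', 'A'), ('key2', 'B')]
```

The repeated 'key1' yields a single row, and the unmatched 'key9' yields
no row at all. Without the ORDER BY the row order would be unspecified.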

Also note that, like SELECT, even if the keys supplied to
biomaRt::getBM() (via the 'values' arg) don't contain duplicates
and all the mappings are 1-to-1, biomaRt::getBM() is not
guaranteed to preserve order.

Generally speaking, having duplicates in the input produce duplicates
in the output is useful in vectorized operations, where the output
is expected to be parallel to the input. Vectorized operations also
need to propagate NAs and to preserve order. However, like SELECT
and biomaRt::getBM(), select() cannot produce an output that is
parallel to the input *in general*.
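
The difference between the two models can be sketched in a few lines of
Python. This is only an illustration under made-up assumptions: the
mapping, the keys, and the two helper functions (parallel_lookup,
select_like) are hypothetical and are not how select() or getBM() are
implemented.

```python
# Hypothetical mapping: "k2" maps to two values, "k3" maps to none.
mapping = {"k1": ["a"], "k2": ["b", "c"], "k3": []}


def parallel_lookup(keys):
    """Vectorized style: exactly one result per input key.

    Keeps input order and duplicates, and propagates misses as None.
    For 1-to-many keys it has to pick one value (here: the first).
    """
    out = []
    for k in keys:
        hits = mapping.get(k, [])
        out.append(hits[0] if hits else None)
    return out


def select_like(keys):
    """SELECT style: one row per (key, value) pair.

    Duplicate keys are ignored and keys without a mapping give no row.
    """
    rows = []
    for k in dict.fromkeys(keys):  # drop duplicate keys
        for v in mapping.get(k, []):
            rows.append((k, v))
    return rows


keys = ["k2", "k1", "k2", "k9"]
par = parallel_lookup(keys)  # same length as keys
sel = select_like(keys)      # length depends on the mappings, not on keys
print(par)
print(sel)
```

The first function can stay parallel to its input only because it throws
away the second value for "k2". As soon as all 1-to-many mappings have to
be returned, the output can no longer line up with the input.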

It seems that the current philosophy for select() is to emit a note
or a warning every time the output is not parallel to the input.
Personally I find this too noisy and not that useful.

Thanks,
H.
On 11/20/2015 02:30 PM, James W. MacDonald wrote: