[Bioc-devel] Help understanding an R performance issue

The reason it's faster when shuffled vs. with the misses all at the end
is that when a miss happens R compares the string to all strings before
it in the subscript. So the later a miss occurs, the more it costs.
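
To make the cost concrete, here is a rough model in Python (not R's actual
code) that only counts comparisons. It assumes a miss at position i is
compared against all i strings before it, as described above. The function
name and the 1000/1000 split of hits and misses are made up for
illustration, and a regular interleaving stands in for a shuffle so the
counts are deterministic.

```python
def count_scan_comparisons(hits_mask):
    """Count comparisons done by the backward scan on misses.

    hits_mask[i] is True when subscript element i matches a name,
    False when it is a miss. Assumption: a miss at position i is
    compared against every one of the i strings before it.
    """
    total = 0
    for i, hit in enumerate(hits_mask):
        if not hit:
            total += i
    return total


# Hypothetical subscript: 1000 hits and 1000 misses, in three layouts.
misses_first = [False] * 1000 + [True] * 1000
interleaved = [True, False] * 1000      # stand-in for a shuffle
misses_last = [True] * 1000 + [False] * 1000

print(count_scan_comparisons(misses_first))  # 499500
print(count_scan_comparisons(interleaved))   # 1000000
print(count_scan_comparisons(misses_last))   # 1499500
```

In this model, putting the misses at the end costs about three times as
many comparisons as putting them first, with the mixed layout in between.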

As Martin wrote, there are basically two possible improvements that
are somewhat complementary:
1) Tell stringSubscript() that it is not replacing so there is no need
to do that scan. This would require passing an argument down the call
stack.
2) Do a self match on the subscript like in Martin's patch, although
it should probably be done lazily on the first miss.
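
The two ideas can be sketched together as follows. This is a Python model
under my own assumptions, not the C code behind stringSubscript() and not
Martin's patch: match_subscript() and its replacing argument are
hypothetical names, indices are 0-based (R's are 1-based), and None
stands in for NA.

```python
def match_subscript(names, subscript, replacing=True):
    """Map each subscript string to a position in names.

    Misses get None when not replacing (idea 1: skip the scan).
    When replacing, misses get new slots past the end of names,
    and a repeated miss reuses the slot of its first occurrence.
    """
    # Hash lookup into names; the first occurrence of a name wins.
    pos = {}
    for i, name in enumerate(names):
        pos.setdefault(name, i)

    first_seen = None   # self match on the subscript, built lazily
    new_slots = {}      # first-occurrence index -> newly assigned slot
    result = []
    for s in subscript:
        if s in pos:
            result.append(pos[s])
        elif not replacing:
            # Idea 1: no replacement, so no duplicate scan is needed.
            result.append(None)
        else:
            # Idea 2: build the self match once, on the first miss.
            if first_seen is None:
                first_seen = {}
                for j, t in enumerate(subscript):
                    first_seen.setdefault(t, j)
            j = first_seen[s]
            if j not in new_slots:
                new_slots[j] = len(names) + len(new_slots)
            result.append(new_slots[j])
    return result


print(match_subscript(["a", "b"], ["b", "x", "a", "x", "y"]))
# [1, 2, 0, 2, 3]
print(match_subscript(["a", "b"], ["b", "x", "a", "x", "y"], replacing=False))
# [1, None, 0, None, None]
```

With the hash-based self match, each miss costs one lookup instead of a
scan over everything before it, and a subscript with no misses never pays
for building the table at all.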

Michael
