Inserting 17M entries into env took 18h; inserting 34M entries is taking 5+ days
On 11/01/2013 08:22 AM, Magnus Thor Torfason wrote:
Sure, I was attempting to be concise, boiling it down to what I saw as the root issue, but you are right, I could have taken it a step further. So here goes.

I have a set of around 20M string pairs. A given string (say, A) can either be equivalent to another string (B) or not. If A and B occur together in the same pair, they are equivalent. But equivalence is transitive, so if A and B occur together in one pair, and A and C occur together in another pair, then A and C are also equivalent.

I need a way to quickly determine whether any two strings from my data set are equivalent or not.
Do you mean that if A,B occur together and B,C occur together, then A,B and A,C
are equivalent?
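(As an aside, this is the classic union-find / connected-components problem. A minimal base-R sketch, with made-up names and not tuned for 20M pairs, just for comparison:)

```r
## Union-find over string labels: each label starts as its own root, and
## every pair merges the two trees its members belong to.
equiv_classes <- function(x, y)
{
    labs <- sort(unique(c(x, y)))
    parent <- seq_along(labs)            # parent[i] == i marks a root
    find <- function(i) {                # walk up to the root of i's tree
        while (parent[i] != i)
            i <- parent[i]
        i
    }
    for (k in seq_along(x)) {            # union the two roots of each pair
        rx <- find(match(x[k], labs))
        ry <- find(match(y[k], labs))
        if (rx != ry)
            parent[max(rx, ry)] <- min(rx, ry)
    }
    ## resolve every label to its final root
    setNames(vapply(seq_along(labs), find, integer(1)), labs)
}

cls <- equiv_classes(c("A", "B", "D"), c("B", "C", "E"))
cls["A"] == cls["C"]   # TRUE: A and C are equivalent
cls["A"] == cls["D"]   # FALSE
```

Two labels are equivalent exactly when they map to the same root.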
Here's a function that returns a unique identifier (not well tested!), allowing
for transitive relations but not circularity.

uid <- function(x, y)
{
    i <- seq_along(x)               # global index (unused here; see note
                                    # below about reducing copying)
    xy <- paste(x, y, sep = "\r")   # unique pair identifiers; the separator
                                    # avoids collisions such as
                                    # paste0("1", "23") == paste0("12", "3")
    idx <- match(xy, xy)
    repeat {
        ## transitive look-up
        y_idx <- match(y[idx], x)   # look up 'y' in 'x'
        keep <- !is.na(y_idx)
        if (!any(keep))             # no transitive relations, done!
            break
        x[idx[keep]] <- x[y_idx[keep]]
        y[idx[keep]] <- y[y_idx[keep]]
        ## create new index of values
        xy <- paste(x, y, sep = "\r")
        idx <- match(xy, xy)
    }
    idx
}
Values with the same index belong to the same equivalence class. Some tests:
> x <- c(1, 2, 3, 4)
> y <- c(2, 3, 5, 6)
> uid(x, y)
[1] 1 1 1 4
> i <- sample(x); uid(x[i], y[i])
[1] 1 1 3 1
> uid(as.character(x), as.character(y)) ## character() ok
[1] 1 1 1 4
> uid(1:10, 1 + 1:10)
[1] 1 1 1 1 1 1 1 1 1 1
> uid(integer(), integer())
integer(0)
> x <- c(1, 2, 3)
> y <- c(2, 3, 1)
> uid(x, y) ## circular!
C-c C-c
I think this will scale well enough, but the worst case can be reduced to
log(longest chain) iterations, and copying can be reduced by using an index i
and subsetting the original vector on each iteration. I think you could test
for circularity by checking whether the updated x values are just a permutation
of the kept ones:
    all(x[y_idx[keep]] %in% x[keep])
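A sketch of the same function with that circularity guard wired in (lightly tested; it stops with an error instead of hanging on circular input):

```r
## uid() with a circularity guard: same vectorized look-up loop, but it
## errors out when the chains loop back on themselves.
uid_safe <- function(x, y)
{
    xy <- paste(x, y, sep = "\r")        # separator avoids pasted-key collisions
    idx <- match(xy, xy)
    repeat {
        y_idx <- match(y[idx], x)        # look up 'y' in 'x'
        keep <- !is.na(y_idx)
        if (!any(keep))                  # no transitive relations, done!
            break
        ## guard: if the updated x's are just a permutation of the kept
        ## x's, following the chains can never terminate
        if (all(x[y_idx[keep]] %in% x[keep]))
            stop("circular relation detected")
        x[idx[keep]] <- x[y_idx[keep]]
        y[idx[keep]] <- y[y_idx[keep]]
        xy <- paste(x, y, sep = "\r")
        idx <- match(xy, xy)
    }
    idx
}

uid_safe(c(1, 2, 3, 4), c(2, 3, 5, 6))   # [1] 1 1 1 4
## uid_safe(c(1, 2, 3), c(2, 3, 1))      # Error: circular relation detected
```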
Martin
The way I do this currently is to designate the smallest (alphabetically) string
in each known equivalence set as the "main" entry. For each pair, I therefore
insert two entries into the hash table, both pointing at the main value. So
assuming the input data:
A,B
B,C
D,E
I would then have:
A->A
B->A
C->B
D->D
E->D
Except that I also follow each chain until I reach the end (key==value), and
insert pointers to the "main" value for every value I find along the way. After
doing that, I end up with:
A->A
B->A
C->A
D->D
E->D
And I can very quickly check equivalence, either by comparing the hash of two
strings, or simply by transforming each string into its hash, and then I can use
simple comparison from then on. The code for generating the final hash table is
as follows:
h : empty hash table, created with hash.new()
d : input data
hash.deep.get : function that iterates through the hash table until it finds a
    key whose value is equal to itself (until hash.get(h, X) == X), then
    returns all the values found along the way in a vector

h = hash.new()
for ( i in 1:nrow(d) )
{
    deep.a      = hash.deep.get(h, d$a[i])
    deep.b      = hash.deep.get(h, d$b[i])
    equivalents = sort(unique(c(deep.a, deep.b)))
    equiv.id    = min(equivalents)
    for ( equivalent in equivalents )
    {
        hash.put(h, equivalent, equiv.id)
    }
}
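For reference, a self-contained, runnable version of that loop using an environment as the hash table. The hash.new/hash.put/hash.deep.get names above are pseudocode, so the sketch below writes them out explicitly (follow_chain plays the role of hash.deep.get):

```r
## Build the equivalence table described above, with an environment as the
## hash table. follow_chain() walks key -> value until key == value,
## collecting every key it sees along the way.
build_equiv <- function(d)
{
    h <- new.env(hash = TRUE)
    follow_chain <- function(key) {
        seen <- character()
        while (!(key %in% seen)) {       # the %in% check also stops on cycles
            seen <- c(seen, key)
            nxt <- if (exists(key, envir = h, inherits = FALSE))
                       get(key, envir = h, inherits = FALSE)
                   else key
            if (identical(nxt, key))
                break
            key <- nxt
        }
        seen
    }
    for (i in seq_len(nrow(d))) {
        equivalents <- sort(unique(c(follow_chain(d$a[i]),
                                     follow_chain(d$b[i]))))
        equiv.id <- equivalents[1]       # smallest string becomes the "main" entry
        for (e in equivalents)
            assign(e, equiv.id, envir = h)
    }
    h
}

d <- data.frame(a = c("A", "B", "D"), b = c("B", "C", "E"),
                stringsAsFactors = FALSE)
h <- build_equiv(d)
get("C", envir = h)   # "A"
get("E", envir = h)   # "D"
```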
I would so much appreciate it if there were a simpler and faster way to do this.
Keeping my fingers crossed that one of the R-help geniuses who sees this is
sufficiently interested to crack the problem.
Best,
Magnus
On 11/1/2013 1:49 PM, jim holtman wrote:
It would be nice if you followed the posting guidelines and at least showed the
script that is creating your entries now, so that we understand the problem you
are trying to solve. A bit more explanation of why you want this would be
useful. This gets to the second part of my tag line: tell me what you want to
do, not how you want to do it. There may be other solutions to your problem.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Nov 1, 2013 at 9:32 AM, Magnus Thor Torfason <zulutime.net at gmail.com> wrote:
Pretty much what the subject says: I used an env as the basis for a hashtable
in R, based on information that this is in fact the way environments are
implemented under the hood.

I've been experimenting with doubling the number of entries, and so far it has
seemed to be scaling more or less linearly, as expected. But as I went from 17
million entries to 34 million entries, the completion time has gone from 18
hours to 5 days and counting.

The keys and values are in all cases strings of equal length. One might suspect
that the slow-down has to do with the memory being swapped to disk, but from
what I know about my computing environment, that should not be the case.

So my first question: Is anyone familiar with anything in the implementation of
environments that would limit their use, or slow them down faster than
O(n*log(n)), as the number of entries is increased?

And my second question: I realize that this is not strictly what R environments
were designed for, but this is what my algorithm requires: I must go through
these millions of entries, storing them in the hash table and sometimes
retrieving them along the way, in a more or less random manner, which is
contingent on the data I am encountering and on the contents of the hash table
at each moment. Does anyone have a good recommendation for alternatives for
implementing huge, fast, table-like structures in R?

Best,
Magnus
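One thing that may be worth testing, offered as a guess rather than a diagnosis: new.env() takes a 'size' hint for its hash table, and pre-sizing it to roughly the expected number of keys could avoid repeated growth/rehashing as millions of entries are inserted. The semantics are unchanged; size is only a hint:

```r
## Pre-size the environment's hash table (size values here are illustrative)
h <- new.env(hash = TRUE, size = 1000L)   # e.g. size = 40000000L for ~34M keys
assign("someKey", "someValue", envir = h)
get("someKey", envir = h)                       # "someValue"
exists("otherKey", envir = h, inherits = FALSE) # FALSE
```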
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793