millions of comparisons, speed wanted
Dear Andy,
On Thursday 15 December 2005 20:57, Liaw, Andy wrote:
> Just an untested idea: if the data are all 0/1, you could use dist(input, method="manhattan") and then check which entries equal 1. This should be much faster than creating all pairs of rows and checking position by position.
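For readers following the thread, the suggestion can be sketched roughly as follows. The matrix `input` here is made-up example data, not the poster's real input, and the use of `as.matrix()`, `upper.tri()` and `which(..., arr.ind = TRUE)` to pull out the row pairs is only one possible way to do the "check which entries equal 1" step.

```r
# Hypothetical 0/1 input: one row per configuration (illustrative data only)
input <- rbind(c(0, 1, 0, 1, 1),
               c(0, 0, 0, 1, 1),
               c(1, 1, 1, 0, 0))

# For 0/1 data the Manhattan distance equals the number of differing positions
d <- as.matrix(dist(input, method = "manhattan"))

# Keep each pair once (i < j) and select those differing in exactly one position
pairs <- which(d == 1 & upper.tri(d), arr.ind = TRUE)
print(pairs)
```

Each row of `pairs` holds the indices of two input rows that differ in a single position. With this example data only rows 1 and 2 qualify.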
Thanks for the idea; I played with it a little. At the beginning, yes, the data are all 0/1, but during the minimizing iterations there are also "x" values. For example, comparing

0 1 0 1 1
0 0 0 1 1

should return

0 "x" 0 1 1

whereas

0 "x" 0 1 1
0 0 0 1 1

shouldn't even be compared (they have a different number of figures).

Replacing "x" with NA in dist doesn't give the right result either, since with

NA 0 0 1 1
0 0 0 1 1

dist returns 0. I even wanted to see whether I could tweak the dist code, but it calls a C program and I gave up.

Nice idea anyhow; maybe I'll find a way to use it further.

Best,
Adrian
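One possible way to keep using dist when "x" values are present, offered only as an untested sketch and not as the method used in the thread: dist drops the columns containing NA and rescales the sum, which is why the NA attempt returns 0. Instead of NA, one could compute two distance matrices, one on the positions of "x" and one on the remaining 0/1 values, and accept a pair only when the "x" positions are identical and exactly one other position differs. The character matrix `input` and the names `is_x`, `vals`, `d_x` and `d_v` are hypothetical.

```r
# Hypothetical rows stored as characters; "x" marks a minimized position
input <- rbind(c("0", "x", "0", "1", "1"),
               c("0", "0", "0", "1", "1"),
               c("1", "x", "0", "1", "1"))

# Logical mask of the "x" positions
is_x <- input == "x"

# Numeric copy of the data where both "0" and "x" become 0 and "1" becomes 1
vals <- matrix(0, nrow(input), ncol(input))
vals[input == "1"] <- 1

# Distance 0 on the mask means the two rows have "x" in the same positions
d_x <- as.matrix(dist(is_x * 1, method = "manhattan"))
# With identical masks, this counts the differing 0/1 positions
d_v <- as.matrix(dist(vals, method = "manhattan"))

# Comparable pairs (i < j) that differ in exactly one position
pairs <- which(d_x == 0 & d_v == 1 & upper.tri(d_v), arr.ind = TRUE)
print(pairs)
```

With this example data only rows 1 and 3 are paired. Row 2 has no "x", so it is never compared with the other two, which matches the rule that rows with a different number of figures should be skipped.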
Adrian DUSA
Romanian Social Data Archive
1, Schitu Magureanu Bd
050025 Bucharest sector 5
Romania
Tel./Fax: +40 21 3126618 \
+40 21 3120210 / int.101