Skip to content

AW: AW: [R] constructing specially ordered factor

2 messages · Khamenia, Valery, Thomas Lumley

#
1. my last homework in university was done a lot of years ago.

2. I always try to follow posting guide.
Here stated when element is removed. There is no explicit statement,
that the order is preserved. If one writes its own implementation
with reodered output, it still matches the docs. Or?
BTW, do you mean that current hash-based implementation brings *clearly* 
better performance than any O(n*log(n)) sort based algorithm? 
If I have correctly understood src/main/unique.c then current 
hash function is niether minimal perfect hash function nor even 
minimal hash function. In addition, as might be expected, 
current hash function uses full pass through the string to get 
a hash key. 

So, in particular, can anyone clearly show that the current 
hash-based algorithm will be quicker then sort-based 
algorithm if the input has:

1. a lot of strings;
2. strings are very long; 
3. strings are quite unsimilar

?

Hm, I don't believe you are ready to prove smth. like that. 

P.S. Sorry for broken email-reference. I am bound to Outlook 
Express now.

--
Valery
#
On Tue, 5 Oct 2004, Khamenia, Valery wrote:
Clearly the algorithm is not optimal for all possible sets of 
inputs (eg if the inputs are already known to be unique then an even 
faster implementation is just to do nothing).

As R's string comparison function use C's strcmp, for which the C standard 
makes no performance guarantees whatsoever, it is not possible to prove 
*anything* about the relative performance of algorithms without making 
some additional assumptions.

However, if this was a serious question, it is known that
a) in some versions of R on some byte-orderings the hash was broken and 
the performance was far inferior, suggesting that it is ordinarily quite 
effective.
b) Switching the rowsum function from a sort-based implementation to 
hashing produced a substantial speed increase.

 	-thomas