Very Slow Gower Similarity Function
On 18 Apr 2005, at 19:10, Tyler Smith wrote:
Hello, I am a relatively new user of R. I have written a basic function to calculate the Gower similarity function. I was motivated to do so partly as an exercise in learning R, and partly because the existing option (vegdist in the vegan package) does not accept missing values.
Speed is the reason to use C instead of R. It should be easy, almost trivial, to modify vegdist.c so that it handles missing values. I guess this handling means ignoring a value pair if either of the values is missing -- which is not so gentle to the metric properties so dear to Gower. Package vegan is designed for ecological community data, which generally do not have missing values (except in environmental data), but contributions are welcome.
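As a rough illustration of the "ignore the pair if either value is missing" idea, here is a minimal sketch in plain C. It is not the code in vegdist.c. The function names gower_sim and column_ranges are hypothetical, missing values are assumed to be coded as NaN (C code called from R would test for NA with R's own macros instead), and columns with zero or undefined range are simply skipped, which is one possible convention among several.

```c
#include <math.h>
#include <stddef.h>

/* Range (max - min) of each column of a row-major n_row x n_col matrix,
 * computed over the non-missing values only. Missing values are assumed
 * to be coded as NaN (an assumption of this sketch). A column with no
 * observed values gets a NaN range. */
void column_ranges(const double *x, size_t n_row, size_t n_col, double *range)
{
    for (size_t k = 0; k < n_col; k++) {
        double lo = INFINITY, hi = -INFINITY;
        for (size_t r = 0; r < n_row; r++) {
            double v = x[r * n_col + k];
            if (isnan(v))
                continue;
            if (v < lo) lo = v;
            if (v > hi) hi = v;
        }
        range[k] = (hi >= lo) ? hi - lo : NAN;
    }
}

/* Gower similarity between rows i and j for quantitative variables:
 * the mean of 1 - |x_ik - x_jk| / range_k over the variables k where
 * both values are present. Returns NaN if no variable is usable. */
double gower_sim(const double *x, size_t n_col, size_t i, size_t j,
                 const double *range)
{
    double sum = 0.0;
    size_t used = 0;
    for (size_t k = 0; k < n_col; k++) {
        double a = x[i * n_col + k];
        double b = x[j * n_col + k];
        /* Skip the pair if either value is missing, or if the column
         * range is zero or undefined. */
        if (isnan(a) || isnan(b) || !(range[k] > 0.0))
            continue;
        sum += 1.0 - fabs(a - b) / range[k];
        used++;
    }
    return used > 0 ? sum / (double) used : NAN;
}
```

Because the divisor is the number of variables actually compared, two pairs of rows can be scored over different sets of variables, which is where the metric properties suffer.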
I think I have succeeded - my function gives me the correct values. However, now that I'm starting to use it with real data, I realise it's very slow. It takes more than 45 minutes on my Windows 98 machine (R 2.0.1 Patched (2005-03-29)) with a 185x32 matrix with ca 100 missing values. If anyone can suggest ways to speed up my function I would appreciate it. I suspect having a pair of nested for loops is the problem, but I couldn't figure out how to get rid of them.
cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland