Multi-GPU "Yinyang" K-means and K-nn for R

Hello!

I am writing to announce that [kmcuda](https://github.com/src-d/kmcuda) now
has native R bindings, and to ask for help with CRAN packaging.
kmcuda is my brainchild: an efficient GPGPU (CUDA) library for K-means
and K-nn on as much data as fits into memory. It supports running on
multiple GPUs simultaneously, the angular distance metric, Yinyang
refinement, float16 (though certainly not from R), and K-means++ and
AFK-MC2 initialization. I am considering adding mini-batch K-means in
the near future.
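For readers unfamiliar with K-means++ seeding, here is a minimal plain-R sketch of the idea: the first center is a uniformly random sample, and each further center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The function name `kmeanspp_seed` is hypothetical, and this is only an illustration on the CPU, not how kmcuda implements seeding in CUDA. It assumes at least `k` distinct rows, since `sample.int` fails when all remaining probabilities are zero.

```r
# Hypothetical plain-R sketch of K-means++ seeding (Arthur & Vassilvitskii).
# Illustration only: this is NOT kmcuda's CUDA implementation.
kmeanspp_seed <- function(samples, k) {
  n <- nrow(samples)
  stopifnot(k >= 1, k <= n)
  centers <- matrix(0, nrow = k, ncol = ncol(samples))
  # First center: a uniformly random row.
  centers[1, ] <- samples[sample.int(n, 1), ]
  # Squared Euclidean distance from every row to its nearest chosen center.
  d2 <- rowSums(sweep(samples, 2, centers[1, ])^2)
  if (k > 1) {
    for (j in 2:k) {
      # Next center: a row drawn with probability proportional to d2.
      idx <- sample.int(n, 1, prob = d2)
      centers[j, ] <- samples[idx, ]
      # Update nearest-center distances with the newly added center.
      d2 <- pmin(d2, rowSums(sweep(samples, 2, centers[j, ])^2))
    }
  }
  centers
}

# Usage on data shaped like the example below.
set.seed(777)
samples <- replicate(4, runif(16000))
centers <- kmeanspp_seed(samples, 50)
dim(centers)  # c(50, 4)
```

Each chosen center is an actual sample row, which is what distinguishes this seeding from purely random centroid placement.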

Usage example:

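    # Load the compiled kmcuda shared library.
    dyn.load("libKMCUDA.so")
    # 16000 samples with 4 features each (a 16000 x 4 matrix).
    samples <- replicate(4, runif(16000))
    # Run K-means with 50 clusters.
    result <- .External("kmeans_cuda", samples, 50, tolerance = 0.01,
                        seed = 777, verbosity = 1)
    print(result$centroids)
    print(result$assignments[1:10,])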
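To make the meaning of `centroids` and `assignments` concrete, here is a minimal CPU sketch of plain Lloyd iterations in base R. It is an illustration under stated assumptions, not kmcuda's algorithm: the function `lloyd_kmeans` is hypothetical, it uses Euclidean distance with random-row initialization and no Yinyang refinement, it returns assignments as a plain integer vector, and `tolerance` is assumed here to mean the fraction of samples reassigned in one iteration.

```r
# Hypothetical CPU sketch of Lloyd's K-means. NOT kmcuda's algorithm:
# Euclidean only, random-row initialization, no Yinyang refinement.
lloyd_kmeans <- function(samples, k, max_iter = 100, tolerance = 0.01) {
  n <- nrow(samples)
  centroids <- samples[sample.int(n, k), , drop = FALSE]
  assignments <- integer(n)
  for (iter in seq_len(max_iter)) {
    # n x k squared distances via ||x||^2 + ||c||^2 - 2 x.c
    d2 <- outer(rowSums(samples^2), rowSums(centroids^2), "+") -
      2 * samples %*% t(centroids)
    # Assign every sample to its nearest centroid.
    new_assignments <- max.col(-d2, ties.method = "first")
    # Assumed stopping rule: fraction of samples that changed cluster.
    changed <- mean(new_assignments != assignments)
    assignments <- new_assignments
    # Move each non-empty cluster's centroid to the mean of its members.
    for (j in seq_len(k)) {
      members <- assignments == j
      if (any(members)) {
        centroids[j, ] <- colMeans(samples[members, , drop = FALSE])
      }
    }
    if (changed < tolerance) break
  }
  list(centroids = centroids, assignments = assignments)
}

set.seed(777)
samples <- replicate(4, runif(16000))
fit <- lloyd_kmeans(samples, 50)
head(fit$assignments, 10)
```

Computing all distances as one matrix product keeps the sketch vectorized; the GPU library exists precisely because this n x k work becomes the bottleneck on large data.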

At the moment the library supports only Linux and macOS; a Windows
port would be welcome.

A week ago I knew next to nothing about R, so I would be glad to hear
your suggestions. Also, I have never published anything to CRAN, and
designing a full package that follows the guidelines and rules will
take me some time. It would be awesome if somebody were willing to
help! Packaging CUDA+OpenMP code for R seems to be a special kind of
fun, and the fun doubles on macOS, where you need a specific
combination of two different clang compilers to make it work.

Finally, a question that keeps me awake at night: how can R support
matrices with dimensions larger than INT32_MAX if the only integer type
in the C API is int (32-bit signed on Linux)? Even reading the
dimensions with INTEGER() inevitably overflows.
--
Best regards,

Vadim Markovtsev
Lead Machine Learning Engineer || source{d} / sourced.tech / Madrid
StackOverflow: 69708/markhor | GitHub: vmarkovtsev | data.world: vmarkovtsev