Multi-GPU "Yinyang" K-means and K-nn for R

2 messages · Vadim Markovtsev, Charles Determan

#
¡Hola!

This is to announce that [kmcuda](https://github.com/src-d/kmcuda) has
obtained native R bindings and to ask for help with CRAN packaging.
kmcuda is my child: an efficient GPGPU (CUDA) library to do K-means
and K-nn on as much data as fits into memory. It supports running on
multiple GPUs simultaneously, angular distance metric, Yinyang
refinement, float16 (well, not in R for sure), K-means++ and AFK-MC2
initialization. I am thinking about adding Minibatch in the near future.

Usage example:

    dyn.load("libKMCUDA.so")
    samples <- replicate(4, runif(16000))
    result <- .External("kmeans_cuda", samples, 50, tolerance=0.01,
                        seed=777, verbosity=1)
    print(result$centroids)
    # assignments is a vector (one cluster index per sample), so no comma
    print(result$assignments[1:10])

This library only supports Linux and macOS at the moment. A Windows
port is welcome.

I knew pretty much nothing about R a week ago, so I would be glad to
hear your suggestions. Besides, I've never published anything to CRAN
and it will take some time for me to design a full package following
the guidelines and rules. It would be awesome if somebody were willing
to help! Packaging CUDA+OpenMP code for R seems to be a special kind of
fun, and this fun doubles on macOS, where you need a specific
combination of two different clang compilers to make it work.

Besides, I have a question which prevents me from sleeping at night:
how is R able to support matrices with dimensions larger than
INT32_MAX if the only integer type in the C API is int (32-bit signed
on Linux)? Even getting the dimensions with INTEGER() automatically
leads to overflow.
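
To make the concern concrete, here is a minimal C sketch of the arithmetic involved. It deliberately does not use R's headers; `total_elements` is a hypothetical helper, not part of R's API. As far as I understand, each individual dimension in R is still stored as an int, while the total length of a long vector (R >= 3.0.0) is reported through the wider `R_xlen_t` type via `XLENGTH()`, so only the product of the dimensions needs more than 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of R's C API): total number of elements
 * in an nrow x ncol matrix. Both factors are widened to 64 bits before
 * multiplying, so the product cannot wrap around the way int * int can. */
static int64_t total_elements(int32_t nrow, int32_t ncol) {
    assert(nrow >= 0 && ncol >= 0);  /* dimensions are never negative */
    return (int64_t)nrow * (int64_t)ncol;
}
```

For example, a 50000 x 50000 matrix has 2.5 billion elements: each dimension fits comfortably in an int, but the element count does not, which is why the count has to be carried in a 64-bit type.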
--
Best regards,

Vadim Markovtsev
Lead Machine Learning Engineer || source{d} / sourced.tech / Madrid
StackOverflow: 69708/markhor | GitHub: vmarkovtsev | data.world: vmarkovtsev
#
Hi Vadim,

I would be happy to explore helping you out with this.  I am quite active
in development for GPU use in R.  You can see my work on my GitHub (
https://github.com/cdeterman) and the group I created for additional
packages in development (https://github.com/gpuRcore).  I believe it would
be best, though, to take this conversation off list.  If you would
like to discuss this further, please email me separately.

Kind regards,
Charles


On Thu, Feb 23, 2017 at 4:37 AM, Vadim Markovtsev <vadim at sourced.tech>
wrote: