I was wondering whether there is an R equivalent to the two-phase approach that MATLAB uses when performing the k-means algorithm. If not, is there a way to determine whether kmeans in R and kmeans in MATLAB are giving me essentially the same clustering information, within a small amount of error?
K-Means clustering Algorithm
2 messages · olemissrebs1123, David L Carlson
It depends very much on what you consider "a small amount of error." Unless you specify the starting centroids (seeds), k-means does not necessarily produce a unique partition for a particular data set. In other words, you can get different results running MATLAB's kmeans twice on the same data set (and the same is true of R's kmeans). One way of reducing that possibility is to use multiple sets of randomly chosen starting seeds (nstart=10 in R's kmeans, or the 'replicates' option in MATLAB). In this case, kmeans runs 10 times and picks the best solution.

R's kmeans offers three different algorithms (Hartigan-Wong, Lloyd-Forgy, and MacQueen). By looking at the references in MATLAB's description of kmeans and in R's, you should be able to figure out how to match the two, if that is really necessary. MATLAB has multiple options for measuring distance, whereas R's kmeans does not. MATLAB also has several methods for choosing starting seeds. In R you would have to use or create a function to compute those starting seeds and then pass them to kmeans.

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
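For illustration, here is a minimal R sketch of the points in the reply above: multiple random starts via nstart, explicit starting centroids, and a label-independent way to check whether two partitions agree. It is not code from the thread. The simulated data `x`, the starting matrix `init`, and the helper `rand_index` are illustrative names, and the Rand index is just one possible agreement measure.

```r
# Sketch only: x, init, and rand_index are illustrative, not from the thread.
set.seed(42)  # make the simulated data and the random starts reproducible

# Two well-separated groups of 50 points each in two dimensions.
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))

# Multiple random starts: kmeans keeps the run with the lowest
# total within-cluster sum of squares.
fit_multi <- kmeans(x, centers = 2, nstart = 10)

# Explicit starting centroids (rows 1 and 51 of x), so the result does not
# depend on random seeds. The same centroids could be passed to MATLAB.
init <- x[c(1, 51), ]
fit_fixed <- kmeans(x, centers = init, algorithm = "Lloyd", iter.max = 100)

# Cluster numbers are arbitrary, so cross-tabulate the two label vectors
# instead of testing them for element-wise equality.
print(table(multi = fit_multi$cluster, fixed = fit_fixed$cluster))

# Rand index: the fraction of point pairs on which two partitions agree
# (both put the pair together, or both put it apart).
# A value of 1 means the partitions are identical up to relabeling.
rand_index <- function(a, b) {
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  agree <- same_a == same_b
  mean(agree[upper.tri(agree)])
}

print(rand_index(fit_multi$cluster, fit_fixed$cluster))
```

To compare R against MATLAB, one could export MATLAB's cluster index vector to a file, read it into R, and pass it as the second argument to rand_index; what counts as "a small amount of error" is then a threshold you choose on that value.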
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of olemissrebs1123
Sent: Tuesday, August 28, 2012 3:16 PM
To: r-help at r-project.org
Subject: [R] K-Means clustering Algorithm