keep the centre fixed in K-means clustering
So you just want to compare the distance from each point of your new data to each of the centres and assign the number of the nearest centre, as in:

  tCentre <- t(Centre)
  clust <- apply(NewData, 1, function(x) which.min(colSums((x - tCentre)^2)))

But since the apply() loop runs over every row of the new data, it gets slow for lots of new data. To optimize the runtime for huge data, one may want to loop over the (few) centres instead:

  tNewData <- t(NewData)
  clust <- max.col(-apply(Centre, 1, function(x) colSums((x - tNewData)^2)))

Best,
Uwe Ligges
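As a rough check, both variants can be run side by side on the two-centre toy example from the quoted question. This is only a sketch: the seed, the intermediate name d2, the ties.method = "first" argument and the final cbind() layout are assumptions added for illustration, not part of the original suggestion.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Fixed centres and new data, as in the toy example of the question
Centre  <- matrix(c(0, 1, 0, 1), nrow = 2)
NewData <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                 matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

# Variant 1: loop over observations, pick the nearest centre for each
tCentre <- t(Centre)
clust1 <- apply(NewData, 1, function(x) which.min(colSums((x - tCentre)^2)))

# Variant 2: loop over centres; d2 is an (observations x centres) matrix
# of squared Euclidean distances
tNewData <- t(NewData)
d2 <- apply(Centre, 1, function(x) colSums((x - tNewData)^2))
# ties.method = "first" avoids max.col()'s default random tie-breaking
clust2 <- max.col(-d2, ties.method = "first")

# Do the two variants agree on every observation?
all(clust1 == clust2)

# Attach the labels in the layout asked for in the question
result <- cbind(ID = seq_len(nrow(NewData)),
                x = NewData[, 1], y = NewData[, 2],
                Cluster = clust2)
head(result)
```

Note that max.col() treats nearly equal values within a row as ties, so a point that is almost exactly equidistant from two centres could in principle be labelled differently by the two variants.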
On 21.05.2013 13:19, HJ YAN wrote:
Dear R users
I have the matrix of the centres of some clusters, e.g. 20 clusters each
with 100 dimensions, so this matrix contains 20 rows * 100 columns of numeric
values.
I have collected new data (each with 100 numeric values) and would like to
keep the above 20 centres fixed/'unmoved' whilst just seeing how my new data
fit in this grouping system, e.g. if an observation is closest to cluster 1 then
label it 'cluster 1'.
If the above matrix of centres is called 'Centre' (a 20*100 matrix) and my
new data 'NewData' has 500 observations, then using kmeans() will update the
centres:
kmeans(NewData, Centre)
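A small sketch of this behaviour, using the two-centre toy data that appears further down in this message (the seed and the name fit are assumptions added for illustration): kmeans() treats the supplied matrix only as starting values and returns the cluster means of the new data.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

Centre  <- matrix(c(0, 1, 0, 1), nrow = 2)
NewData <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                 matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

# Centre is used as the initial centres, then updated by the algorithm
fit <- kmeans(NewData, centers = Centre)

fit$centers  # cluster means of NewData, no longer exactly (0,0) and (1,1)
```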
I wondered if there are other R packages out there that can keep the centres
fixed and label each observation of my new data? Or do I have to write my own
function?
To illustrate my task using a simpler example:
I have
Centre<- matrix(c(0,1,0,1), nrow=2)
# the two created centres in a two-dimensional case are
Centre
     [,1] [,2]
[1,]    0    0
[2,]    1    1
NewData<-rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
NewData1<-cbind(c(1:100), NewData)
colnames(NewData1)<-c("ID","x","y")
# my data
head(NewData1)
ID x y
[1,] 1 -0.3974660 0.1541685
[2,] 2 0.5321347 0.2497867
[3,] 3 0.2550276 0.1691720
[4,] 4 -0.1162162 0.6754874
[5,] 5 0.1570996 0.1175119
[6,] 6 0.4816195 -0.6836226
## I'd like to have the outcome as below (whilst keeping the two centres fixed):
ID x y Cluster
[1,] 1 -0.3974660 0.1541685 1
[2,] 2 0.5321347 0.2497867 1
[3,] 3 0.2550276 0.1691720 1
[4,] 4 -0.1162162 0.6754874 1
...
[55,] 55 1.1570996 1.1175119 2
[56,] 56 1.4816195 1.6836226 2
p.s. I use Euclidean distance to calculate the distance matrix.
Many thanks in advance
HJ
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.