Reg : null values in kmeans

5 messages · Raji, Jannis, raji sankaran +1 more

#
Hi,

  I am using the k-means algorithm for clustering. My data contains a few
null/NA values, and kmeans does not cluster with those values. Is there an
option like na.omit that skips these null values and clusters the remaining
values?

Thanks,
Raji
1 day later
#
I do not really understand your question. You can use kmeans, but only 
without the observations that include the NA values (e.g. by deleting 
whole rows in your observation matrix). If you want to keep the 
information in the valid observations of those rows, I fear you need to 
look for a clustering algorithm that can handle missing values. I doubt 
that there is a kmeans version that can. You could think about inserting 
the means of all other observations into the gaps, though this introduces 
bias as well.
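
For the row-deletion route, a minimal sketch, assuming the data sits in a data frame. The data frame `d`, its values, and the choice of 2 centers are made up for illustration:

```r
# Hypothetical data: six observations, two of them with a missing value.
d <- data.frame(x1 = c(NA, 2.1, 1.9, 8.0, 8.2, 7.9),
                x2 = c(1.0, 1.1, NA, 5.0, 5.2, 4.9))

# na.omit() drops every row that contains at least one NA
# and records the dropped row indices in the "na.action" attribute.
complete <- na.omit(d)
dropped <- as.integer(attr(complete, "na.action"))

set.seed(1)  # kmeans picks random starting centers
fit <- kmeans(complete, centers = 2)

dropped      # rows that were left out of the clustering
fit$cluster  # cluster labels for the remaining rows
```

The valid values in the dropped rows (here `x2` in row 1 and `x1` in row 3) are lost with this approach, which is the trade-off described above.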


Jannis

Raji wrote:
#
Have you tried something like the following?
         x1       x2        x3 cluster
1        NA 1.000000 1.0000000      NA
2 0.6931472 1.414214 0.5000000       3
3 1.0986123 1.732051        NA      NA
4 1.3862944 2.000000 0.2500000       3
5 1.6094379 2.236068 0.2000000       3
6 1.7917595 2.449490 0.1666667       3

         x1       x2         x3 cluster
45 3.806662 6.708204 0.02222222       1
46 3.828641 6.782330 0.02173913       1
47 3.850148 6.855655 0.02127660       1
48 3.871201 6.928203 0.02083333       1
49 3.891820 7.000000 0.02040816       1
50 3.912023 7.071068 0.02000000       1
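
The output above shows rows with an NA getting cluster NA while the complete rows get a label. One way to arrive at a result of that shape is sketched below. It is an assumption, not necessarily the code that was originally posted: the data frame `d` is rebuilt from what the printed values suggest (log, square root and reciprocal of 1:50), and the 3 centers and the seed are chosen arbitrarily.

```r
# Assumed data, guessed from the printed values: 50 rows, two NAs inserted.
d <- data.frame(x1 = log(1:50), x2 = sqrt(1:50), x3 = 1 / (1:50))
d$x1[1] <- NA
d$x3[3] <- NA

# TRUE for rows without any missing value.
ok <- complete.cases(d)

# Cluster only the complete rows.
set.seed(1)  # kmeans picks random starting centers
fit <- kmeans(d[ok, ], centers = 3)

# Write the labels back; rows with an NA keep cluster NA.
d$cluster <- NA_integer_
d$cluster[ok] <- fit$cluster

head(d)
tail(d)
```

The cluster numbers themselves depend on the random start, so they will not necessarily match the ones printed above.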

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
Hi Raji,

I am quite sure that kmeans in general is not able to handle missing 
values, so most probably there won't be an option for this in R. Either 
you omit the observations with NAs, as William proposed, or you search for 
an algorithm that can handle missing values (I am not sure whether there 
is any). Another alternative would be to put mean values in the NA places. 
This, however, biases the results.
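
A rough sketch of the mean-value alternative, assuming numeric columns in a data frame. The data frame `d`, the 3 centers and the seed are invented for illustration:

```r
# Hypothetical data: 50 rows, three numeric columns, two missing values.
d <- data.frame(x1 = log(1:50), x2 = sqrt(1:50), x3 = 1 / (1:50))
d$x1[1] <- NA
d$x3[3] <- NA

# Replace each NA with the mean of the observed values in the same column.
imputed <- d
for (col in names(imputed)) {
  missing <- is.na(imputed[[col]])
  imputed[[col]][missing] <- mean(imputed[[col]], na.rm = TRUE)
}

set.seed(1)  # kmeans picks random starting centers
fit <- kmeans(imputed, centers = 3)

table(fit$cluster)  # all 50 rows now receive a cluster label
```

The filled-in values pull those rows toward the column means, which is where the bias comes from.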


HTH
Jannis

raji sankaran wrote: