Clustering large data matrix
Hi there,

Whether clara() is a proper way of clustering depends strongly on what your data are and, in particular, on what interpretation or use you want for your clustering. You may do better with a hierarchical method after having defined a proper distance (however, this is more a matter for statistical consultation than for R help).

Assuming that you use some reasonable dimension reduction and clustering method, you may get a good visualization of your clustering using the methods available via the functions plotcluster() and discrproj() in package fpc.

Best,
Christian
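For illustration only, here is a minimal R sketch of the hierarchical route, run on simulated data shaped like the matrix in the question (68 patients x 13112 spectral points). The simulated spectra, the distance (1 minus the Pearson correlation between two patients' spectra), average linkage, k = 3 clusters and the use of 10 principal components before fpc's plotcluster() are all assumptions made for the example. They are not advice for real NMR data.

```r
# Minimal sketch under stated assumptions: the data are SIMULATED (hypothetical),
# and the distance, linkage and k = 3 are illustrative choices only.
set.seed(1)
p <- 13112                                # spectral points, as in the question
grid <- seq_len(p)
n_per_group <- c(20, 24, 24)              # hypothetical group sizes (68 patients)
truth <- rep(1:3, times = n_per_group)    # simulated group labels
centres <- c(3000, 6500, 10000)           # hypothetical peak positions
# One row per simulated patient: a group-specific Gaussian peak plus noise
spectra <- t(sapply(truth, function(g)
  exp(-((grid - centres[g]) / 200)^2) + rnorm(p, sd = 0.05)))

# Assumed distance between patients: 1 - Pearson correlation of their spectra
d <- as.dist(1 - cor(t(spectra)))
hc <- hclust(d, method = "average")       # average linkage (an assumption)
cl <- cutree(hc, k = 3)                   # k = 3 chosen for the simulation
print(table(cl, truth))

# Optional visualisation with fpc, only if it is installed. Discriminant
# coordinates need more observations than variables, so the data are first
# projected onto 10 principal components (an arbitrary, assumed number).
if (requireNamespace("fpc", quietly = TRUE)) {
  scores <- prcomp(spectra)$x[, 1:10]
  fpc::plotcluster(scores, cl)
}
```

The distance is the real modelling decision here. A correlation-based distance ignores overall intensity differences between spectra, which may or may not be what the analysis needs.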
On Thu, 6 Mar 2008, Dani Valverde wrote:
Hello,

I have a large data matrix (68 x 13112), with each row corresponding to one observation (patients) and each column to one variable (points within an NMR spectrum). I would like to carry out some kind of clustering on these data to see how many clusters there are. I have tried the function clara() from the package cluster. If I use the matrix as is, I can perform the clara analysis, but when I call clusplot() I get this error:

Error in princomp.default(x, scores = TRUE, cor = ncol(x) != 2) : 'princomp' can only be used with more units than variables

Then I reduce the dimensionality using the function prcomp(), take the first 13 principal components (together more than 80% of the variability) and carry out the clara() analysis again. I call the clusplot() function again and voilà, it works. The problem is that clusplot() only represents the first two components of my prcomp() analysis, which account for only 15% of the variability. So, my questions are:

1) Is clara() a proper way to analyze such a large data set?
2) Is there an appropriate method for graphically plotting my data that takes into account the whole variability of my data, not just two principal components?

Many thanks.

Best,
Dani

--
Daniel Valverde Saubí
Grup de Biologia Molecular de Llevats
Facultat de Veterinària de la Universitat Autònoma de Barcelona
Edifici V, Campus UAB
08193 Cerdanyola del Vallès - SPAIN
Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN)
Grup d'Aplicacions Biomèdiques de la RMN
Facultat de Biociències
Universitat Autònoma de Barcelona
Edifici Cs, Campus UAB
08193 Cerdanyola del Vallès - SPAIN
+34 93 5814126
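Here is a minimal R sketch of the workflow described in the question, run on simulated data. The matrix, the two patient groups and the peak positions are hypothetical. It shows three things: princomp() refuses a matrix with fewer rows than columns, prcomp() does not, and clara() can then be run on the smallest number of principal components reaching 80% of the variance. The 80% cut-off mirrors the question. The choice of k = 2 clusters is an assumption that suits the simulation.

```r
library(cluster)  # provides clara()

# SIMULATED stand-in for the NMR matrix: 68 rows (patients) x 13112 columns
set.seed(42)
n <- 68
p <- 13112
grid <- seq_len(p)
group <- rep(1:2, times = c(30, 38))     # hypothetical patient groups
peak_a <- exp(-((grid - 4000) / 300)^2)  # hypothetical group-dependent peak
peak_b <- exp(-((grid - 9000) / 300)^2)  # hypothetical peak with random height
intensity <- ifelse(group == 1, 1, 0.2)
spectra <- outer(intensity, peak_a) + outer(runif(n, 0, 0.3), peak_b) +
  matrix(rnorm(n * p, sd = 0.05), nrow = n)

# princomp() (called by clusplot() in the error quoted in the question)
# stops when there are fewer rows than columns
princomp_msg <- tryCatch(princomp(spectra), error = function(e) conditionMessage(e))
print(princomp_msg)

# prcomp() uses a singular value decomposition, so p > n is not a problem
pca <- prcomp(spectra)
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
k <- which(var_explained >= 0.8)[1]      # smallest number of PCs reaching 80%
scores <- pca$x[, seq_len(k), drop = FALSE]

# k-medoids clustering on the retained scores (k = 2 is assumed)
cl <- clara(scores, k = 2)
print(table(cl$clustering, group))

# Share of the total variance carried by the first two principal components
print(var_explained[2])
```

Clustering on the retained scores rather than on all 13112 columns keeps clara()'s Euclidean distances in the reduced space. Whether that space keeps the structure that matters is a separate, statistical question.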
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
*** --- *** Christian Hennig University College London, Department of Statistical Science Gower St., London WC1E 6BT, phone +44 207 679 1698 chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche