Message-ID: <47CFCCFE.6070501@uab.cat>
Date: 2008-03-06T10:52:46Z
From: Dani Valverde
Subject: Clustering large data matrix

Hello,
I have a large data matrix (68x13112), each row corresponding to one 
observation (patients) and each column corresponding to the variables 
(points within an NMR spectrum). I would like to carry out some kind of 
clustering on these data to see how many clusters there are. I have 
tried the function clara() from the package cluster. If I use the matrix 
as is, I can perform the clara analysis but when I call clusplot() I get 
this error:

Error in princomp.default(x, scores = TRUE, cor = ncol(x) != 2) :   
'princomp' can only be used with more units than variables
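
A minimal sketch of why this error appears, assuming a small toy matrix in place of the real 68x13112 data (the object names and sizes are hypothetical): princomp() refuses a matrix with fewer rows (units) than columns (variables), whereas prcomp() accepts it.

```r
set.seed(1)

# Hypothetical toy matrix with fewer rows (units) than columns (variables),
# standing in for the 68 x 13112 matrix described above.
x <- matrix(rnorm(5 * 10), nrow = 5)

# princomp() stops with an error for this shape; try() captures it
res <- try(princomp(x), silent = TRUE)
inherits(res, "try-error")

# prcomp() is based on the singular value decomposition and accepts it
pca <- prcomp(x)
dim(pca$x)
```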

Then, I reduce the dimensionality by using the function prcomp(). I 
take the first 13 principal components (>80% of the variability) and carry 
out the clara() analysis again. Then, I call the clusplot() function 
again and voilà!, it works. The problem is that clusplot() only 
represents the first two components of my prcomp() analysis, which 
represent only 15% of the variability.
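
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated (68 rows, but far fewer columns than the real spectra), the number of clusters k = 3 is arbitrary, and all object names are hypothetical. On simulated noise the number of components needed to pass 80% will differ from the 13 reported above.

```r
library(cluster)  # provides clara() and clusplot()

set.seed(1)

# Simulated stand-in for the real data: 68 patients x 500 spectral points.
# The real matrix is 68 x 13112; sizes and names here are assumptions.
x <- matrix(rnorm(68 * 500), nrow = 68)

# PCA via prcomp(), which works with fewer observations than variables
pca <- prcomp(x)

# Cumulative proportion of variance explained by the components
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components explaining more than 80% of the variance
n_pc <- which(cum_var > 0.80)[1]
scores <- pca$x[, 1:n_pc, drop = FALSE]

# Cluster the reduced data; k = 3 is an arbitrary choice for illustration
cl <- clara(scores, k = 3)

# clusplot() runs its own PCA on 'scores' and draws only the first
# two components, which is the limitation described above
clusplot(cl)
```

Printing cum_var[2] shows how much of the variability the first two components of the original data account for.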
So, my questions are: 1) Is clara() a proper way to analyze such a large 
data set? and 2) Is there an appropriate method for plotting my data 
that takes into account the whole variability of my data, not 
just two principal components?
Many thanks.
Best,

Dani

-- 
Daniel Valverde Saubí

Grup de Biologia Molecular de Llevats
Facultat de Veterinària de la Universitat Autònoma de Barcelona
Edifici V, Campus UAB
08193 Cerdanyola del Vallès- SPAIN

Centro de Investigación Biomédica en Red
en Bioingeniería, Biomateriales y
Nanomedicina (CIBER-BBN)

Grup d'Aplicacions Biomèdiques de la RMN
Facultat de Biociències
Universitat Autònoma de Barcelona
Edifici Cs, Campus UAB
08193 Cerdanyola del Vallès- SPAIN
+34 93 5814126