Multidimensional scaling and distance matrices
A few comments:

MDS is normally done on a dissimilarity matrix, not necessarily a distance matrix (the triangle inequality need not hold). Some MDS software will automatically map similarity matrices to corresponding dissimilarity matrices if told to do so, though not all by the same mapping: usually D = 1 - S or D = sqrt(1 - S). It looks like a `kinship' matrix is a cousin of a similarity matrix, which usually has entries between 0 and 1, with 1 on the diagonal.

The description of MDS in Statistica at http://www.statsoftinc.com/textbook/stmulsca.html is entirely in terms of `observed distances' and Kruskal-type MDS. Note that a non-metric MDS solution is almost impossible to reproduce exactly because of local minima, although hopefully one could get a similar solution from a different implementation of the same method.

Faced with your example, I would treat it as a covariance matrix, turn it into a correlation matrix, take the distances as 1 - correlations, and cross my fingers.
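The last suggestion can be sketched in R roughly as follows. This is only an illustration under stated assumptions, not a reconstruction of what Statistica did: the object names (`kin`, `S`, `D1`, `D2`, `fit`) are made up for the example, the matrix is the toy kinship matrix from the quoted message, and both mappings mentioned above are shown because it is not known which one, if any, Statistica applied.

```r
# Toy kinship matrix from the quoted message (symmetric, maxima on the diagonal).
kin <- matrix(c(
  0.198716340, 0.003612042, 0.011926851, 0.019737349, 0.015021053,
  0.003612042, 0.066742885, 0.013809924, 0.005121996, 0.011175845,
  0.011926851, 0.013809924, 0.197337389, 0.013893087, 0.006405424,
  0.019737349, 0.005121996, 0.013893087, 0.216047450, 0.006218477,
  0.015021053, 0.011175845, 0.006405424, 0.006218477, 0.118812936),
  nrow = 5, byrow = TRUE,
  dimnames = list(as.character(1:5), paste0("V", 1:5)))

# Treat the kinship matrix as a covariance matrix and rescale it so that
# the diagonal is 1 (assumption: this is a sensible notion of similarity here).
S <- cov2cor(kin)

# Two common similarity-to-dissimilarity mappings.
D1 <- as.dist(1 - S)        # D = 1 - S
D2 <- as.dist(sqrt(1 - S))  # D = sqrt(1 - S)

# Classical (metric) scaling on the first mapping, two dimensions.
fit <- cmdscale(D1, k = 2, eig = TRUE)
fit$points
```

The same `D1` or `D2` object could be passed to `isoMDS()` or `sammon()` from MASS for Kruskal-type or Sammon scaling; which of these, and which mapping, comes closest to the Statistica result would have to be checked against the original output.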
On 26 Feb 2004, Federico Calboli wrote:
Dear All,

I am in the somewhat unfortunate position of having to reproduce results previously obtained from (non-metric?) MDS on a "kinship" matrix using Statistica. A kinship matrix measures affinity between groups and has its maximum values on the diagonal.

Apparently, starting with an n x n kinship matrix, all that was needed was to feed it to Statistica, flagging that the matrix was NOT a distance matrix but a kinship one. Whether Statistica transformed the kinship matrix into a distance matrix (and how) is anybody's guess.

A quick search immediately showed that multidimensional scaling is done on a distance matrix. See for instance:

MASS4, pg 304
"Elements of Computational Statistics", Gentle, pg 122
Edwards and Oman's article, pages 2-7, R News 3/3

The fact that Statistica happily performs MDS on a "kinship" matrix is puzzling. Indeed, I would expect errors if the kinship matrix is not first transformed to distances, as in the following toy example:
> test
           V1          V2          V3          V4          V5
1 0.198716340 0.003612042 0.011926851 0.019737349 0.015021053
2 0.003612042 0.066742885 0.013809924 0.005121996 0.011175845
3 0.011926851 0.013809924 0.197337389 0.013893087 0.006405424
4 0.019737349 0.005121996 0.013893087 0.216047450 0.006218477
5 0.015021053 0.011175845 0.006405424 0.006218477 0.118812936

> cmdscale(test)
   [,1] [,2]
V1  NaN  NaN
V2  NaN  NaN
V3  NaN  NaN
V4  NaN  NaN
V5  NaN  NaN
Warning messages:
1: some of the first 2 eigenvalues are < 0 in: cmdscale(test)
2: NaNs produced in: sqrt(ev)

> isoMDS(test)
Error in isoMDS(test) : NAs/Infs not allowed in d

> sammon(test)
Error in sammon(test) : initial configuration must be complete
In addition: Warning messages:
1: some of the first 2 eigenvalues are < 0 in: cmdscale(d, k)
2: NaNs produced in: sqrt(ev)

The colleagues who used the above routine are unable to tell me with certainty whether Statistica used metric or non-metric scaling, and, if non-metric, whether it was Kruskal or Sammon scaling.

In any case, I would simply like to ask the members of the list whether I am correct in thinking that MDS can ONLY be performed on a distance matrix, and that I can therefore reasonably expect that Statistica performed some form of transformation to a distance matrix prior to the MDS. It would at least be a first step towards understanding what exactly Statistica did with the data.

Regards,

Federico Calboli
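For what it is worth, the NaNs in the toy example can be traced by hand. The sketch below assumes the textbook form of classical scaling (square the entries, double-centre, take eigenvalues), which is my understanding of what `cmdscale()` does with a square matrix; the names `kin`, `J`, `B` and `ev` are made up for the illustration.

```r
# Toy kinship matrix from the message above.
kin <- matrix(c(
  0.198716340, 0.003612042, 0.011926851, 0.019737349, 0.015021053,
  0.003612042, 0.066742885, 0.013809924, 0.005121996, 0.011175845,
  0.011926851, 0.013809924, 0.197337389, 0.013893087, 0.006405424,
  0.019737349, 0.005121996, 0.013893087, 0.216047450, 0.006218477,
  0.015021053, 0.011175845, 0.006405424, 0.006218477, 0.118812936),
  nrow = 5, byrow = TRUE)

# Classical scaling: B = -1/2 * J %*% D^2 %*% J, where J is the centring matrix.
n <- nrow(kin)
J <- diag(n) - matrix(1 / n, n, n)
B <- -0.5 * J %*% (kin^2) %*% J

# Eigenvalues of B; coordinates are eigenvectors scaled by sqrt(eigenvalue).
ev <- eigen(B, symmetric = TRUE, only.values = TRUE)$values
ev
```

Because the large entries sit on the diagonal (where a distance matrix has zeros), the squared matrix is dominated by its diagonal and B has no positive eigenvalues: one is numerically zero and the rest are negative. Taking square roots of those values is what produces the NaNs, and `isoMDS()` and `sammon()` then fail when they receive the NaN starting configuration.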
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595