Help with Clustering Techniques in R - R-help

Bill Vedder

Mon, Aug 27, 2001 8:24 PM

Greetings RListers,

I have a data set containing two types of outcome: success and failure.
Associated with each outcome are 12 different measurements.  I'm trying
to find out, for example, whether some of the 12 measures are associated
more with success or with failure, or whether there is any relationship
at all between the measures and the outcomes.

I don't have (as yet) any experience using clustering techniques (in R or
elsewhere) but thought that they might be applied in this situation.

In general, are clustering techniques useful in this situation? Is any
technique better than another?

And more specifically, can I use R to determine clustering characteristics
of the 12 measures and then, for each cluster, have R output whether the
cluster is more associated with success or failure?

Any help appreciated,

Bill Vedder



-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Christian Hennig

Tue, Aug 28, 2001 3:09 AM

Dear Bill,

I think that this is essentially a problem of discriminant analysis or
logistic regression and not of clustering. That is, you have two groups
defined by success and failure, and you want to find out how well your
measures are able to distinguish between them.
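
As a rough illustration of the logistic regression route, here is a minimal sketch in R. The data are simulated, and the names m1..m12 and outcome are hypothetical stand-ins for the 12 measures and the success/failure label; the assumption that only m1 and m2 matter is made up for the example.

```r
# Simulated stand-in for the data described above: 12 measures plus a
# binary outcome. All names (m1..m12, outcome) are hypothetical.
set.seed(1)
n <- 200
measures <- as.data.frame(matrix(rnorm(n * 12), nrow = n))
names(measures) <- paste0("m", 1:12)

# Assumption for illustration only: m1 and m2 drive the outcome.
p <- plogis(1.5 * measures$m1 - 1.0 * measures$m2)
measures$outcome <- factor(ifelse(runif(n) < p, "success", "failure"),
                           levels = c("failure", "success"))

# Logistic regression: models the log-odds of the second factor level
# ("success") as a linear function of all 12 measures.
fit <- glm(outcome ~ ., data = measures, family = binomial)

# One row per term: estimate, standard error, z value and p-value.
coefs <- summary(fit)$coefficients
print(round(coefs, 3))

# In-sample confusion table at a 0.5 probability cut-off.
pred <- ifelse(fitted(fit) > 0.5, "success", "failure")
confusion <- table(observed = measures$outcome, predicted = pred)
print(confusion)
```

Measures with large estimates relative to their standard errors are the candidates for being associated with success or failure. The confusion table here is computed on the same data used for fitting, so it will tend to look optimistic.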

R should be able to carry out such analyses. I don't have the time now
to figure out exactly how. Background and further references are given e.g. in
G.J. McLachlan, Discriminant analysis and statistical pattern recognition,
Wiley, 1992.
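
For the discriminant analysis route, one possible sketch uses lda() from the MASS package, which ships with standard R installations. The data below are simulated and the names m1..m12 and outcome are hypothetical; the mean shift in m1 is an assumption made only so that the example has something to find.

```r
library(MASS)  # provides lda()

# Simulated stand-in data: two groups of equal size, 12 measures each.
set.seed(3)
n <- 200
grp <- factor(rep(c("failure", "success"), each = n / 2))
x <- matrix(rnorm(n * 12), nrow = n)

# Assumption for illustration only: successes have a higher mean on m1.
x[grp == "success", 1] <- x[grp == "success", 1] + 1.5

dat <- data.frame(x)
names(dat) <- paste0("m", 1:12)   # hypothetical measure names
dat$outcome <- grp

# Linear discriminant analysis of outcome on all 12 measures.
fit <- lda(outcome ~ ., data = dat)

# Coefficients of the single linear discriminant (two groups give one).
print(round(fit$scaling, 3))

# Leave-one-out cross-validated class predictions.
cv <- lda(outcome ~ ., data = dat, CV = TRUE)
confusion <- table(observed = dat$outcome, predicted = cv$class)
print(confusion)
```

With CV = TRUE, lda() returns leave-one-out predictions rather than a fitted model object, which gives a less optimistic error estimate than predicting the training data.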

Best,
Christian

***********************************************************************
Christian Hennig
University of Hamburg, Faculty of Mathematics - SPST/ZMS
 (Schwerpunkt Mathematische Statistik und Stochastische Prozesse,
 Zentrum fuer Modellierung und Simulation)
Bundesstrasse 55, D-20146 Hamburg, Germany
Tel: x40/42838 4907, privat x40/631 62 79
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag.de


Brian Ripley

Tue, Aug 28, 2001 3:39 AM

On Tue, 28 Aug 2001, Christian Hennig wrote:

In pattern-recognition terminology, this is a supervised and not
an unsupervised problem.

That reference is rather old-fashioned (and was in 1992).  Probably the
best exploratory methods are logistic discrimination and classification
trees, methods statisticians used to consistently overlook.
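
A classification tree of the kind mentioned here can be sketched with the rpart package, which ships with standard R installations. The data are simulated, the names m1..m12 and outcome are hypothetical, and the rule linking m3 and m7 to success is an assumption invented for the example.

```r
library(rpart)  # recursive partitioning (classification trees)

# Simulated stand-in data: 12 measures on [0, 1] plus a binary outcome.
set.seed(2)
n <- 300
dat <- as.data.frame(matrix(runif(n * 12), nrow = n))
names(dat) <- paste0("m", 1:12)   # hypothetical measure names

# Assumption for illustration only: success is likely when m3 is high
# and m7 is low, and unlikely otherwise.
p <- ifelse(dat$m3 > 0.5 & dat$m7 < 0.5, 0.9, 0.1)
dat$outcome <- factor(ifelse(runif(n) < p, "success", "failure"),
                      levels = c("failure", "success"))

# Fit a classification tree using all 12 measures as candidates.
tree <- rpart(outcome ~ ., data = dat, method = "class")
print(tree)   # text listing of the splits and class counts per node

# In-sample predicted classes and confusion table.
pred <- predict(tree, newdata = dat, type = "class")
confusion <- table(observed = dat$outcome, predicted = pred)
print(confusion)
```

The printed splits show which measures, and which thresholds on them, separate successes from failures. A tree can pick up interactions such as "high m3 and low m7" that a purely additive logistic model would miss.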

R is very well set up for this sort of thing.  There are several
worked examples in Venables & Ripley (1999) that work in R with minor
changes (see the online R complements).

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
