Normality Test on several groups

7 messages · Andy Jacobson, Peter Dalgaard, Giampiero Salvi +4 more

#
Hi Knut,

 Knut> Unfortunately, a non-significant test is merely
 Knut> non-conclusive (Popper KR, 1979), so one would have to test for
 Knut> equivalence, e.g., as TOST (two one-sided tests).

 Knut> As to whether you can do a Lilliefors test for several groups,
 Knut> that depends entirely on your ability to understand what the
 Knut> underlying question would be (see Adams D 1979)

 Could you please provide more information on the Popper and Adams
 references you cite above?  While I'm fairly certain that Popper
 1979 is:

 Popper, Karl. Objective knowledge: an evolutionary approach. Oxford:
 Oxford University Press; 1979,

 I've had a bit of trouble searching for the Adams reference.  It
 appears that Douglas Adams published "The Hitchhiker's Guide to the
 Galaxy" in 1979, and as a result of that work's popularity there's
 almost no hope of using a general search engine to find a different
 "Adams D 1979".

 On the other hand, the context of the citation:

 Knut> that depends entirely on your ability to understand what the
 Knut> underlying question would be (see Adams D 1979)

 leads me to suspect that you intended to cite The Hitchhiker's
 Guide.  

 Cheers,

        Andy
#
Andy Jacobson <andyj at splash.princeton.edu> writes:
...in which case the answer is forty-two, surely.
#
Hi all,
I'm doing a study on predicting the "true" number of clusters in
a hierarchical clustering scheme. My main reference is at the moment

Milligan GW and Cooper MC (1985) "An examination of procedures for
determining the number of clusters in a data set"
Psychometrika vol 50 no 2 pp 159-179

and all the references included in that paper.

I'm planning to perform a similar comparison on a number of indexes,
but on a much larger data set (in the order of 3000 points), and with
a much higher "true" number of clusters (in the order of some hundreds),
to see if the properties of the indexes scale accordingly.

I was wondering if the set of indexes described in the reference are
still "state of the art" (most of them were introduced in the '60s
and '70s), or if there are new indexes and methods I could include in
my study. I would really appreciate if you could point me to some newer
references addressing this problem.

I also read Milligan's chapter in the book "Clustering and
Classification" from 1995, but didn't find information on this subject
that wasn't included in the previous paper.

Thank you very much,
Giampiero

_________________________________________________________
Giampiero Salvi, M.Sc.          www.speech.kth.se/~giampi
Speech, Music and Hearing       Tel:      +46-8-790 75 62
Royal Institute of Technology   Fax:      +46-8-790 78 54
Drottning Kristinasv. 31,  SE-100 44,  Stockholm,  Sweden
#
Andy and Peter: Of course, both of you are right.

Re h2g2 (Adams DN 1979):
Re Sir Karl, I have to admit a typo (1+6 instead of 6-1, see 
en.wikipedia.org/wiki/The_Answer_to_Life,_the_Universe,_and_Everything for 
related algebraic problems) and I should have quoted the original publication:
Cheers, Knut
At 12:38 2004-02-07 -0500, Andy wrote:

        
At 20:54 2004-02-07 +0100, Peter wrote:

            
Knut M. Wittkowski, PhD,DSc
------------------------------------------
The Rockefeller University, GCRC
Experimental Design and Biostatistics
1230 York Ave #121B, Box 322, NY,NY 10021
+1(212)327-7175, +1(212)327-8450 (Fax)
kmw at rockefeller.edu
http://www.rucares.org/clinicalresearch/dept/biometry/
12 days later
#
Back from my vacation, I haven't seen an R-help answer on this
  (Christian, where have you been ? ;-)
GiampS> Hi all, I'm doing a study on predicting the "true"
    GiampS> number of clusters in a hierarchical clustering
    GiampS> scheme. My main reference is at the moment

    GiampS> Milligan GW and Cooper MC (1985) "An examination of
    GiampS> procedures for determining the number of clusters in
    GiampS> a data set" Psychometrika vol 50 no 2 pp 159-179

    GiampS> and all the references included in that paper.

(not available to me)

    GiampS> I'm planning to perform a similar comparison on a
    GiampS> number of indexes, but on a much larger data set (in
    GiampS> the order of 3000 points), and with a much higher
    GiampS> "true" number of clusters (in the order of some
    GiampS> hundreds), to see if the properties of the indexes
    GiampS> scale accordingly.

    GiampS> I was wondering if the set of indexes described in
    GiampS> the reference are still "state of the art" (most of
    GiampS> them were introduced in the '60s and '70s), or if
    GiampS> there are new indexes and methods I could include in
    GiampS> my study. I would really appreciate if you could
    GiampS> point me to some newer references addressing this problem.

Gordon's 2nd edition,

  author =	 {A. D. Gordon},
  title = 	 {Classification, 2nd Edition},
  publisher = 	 {Chapman \& Hall/CRC},
  year = 	 1999,
  series =	 {Monographs on Statistics and Applied Probability 82},
  edition =	 {2nd edition}

has a whole chapter (one of the last ones in the book) on this.

R's cluster package has a generic silhouette() function (with 2 methods),
and plot.silhouette() method --- all are improvements from
Kaufman & Rousseeuw's original code.

A recent research paper using "CLEST" (Fridlyand & Dudoit) and
mentioning "GAP" (Tibshirani), etc., still finds silhouette
among the best "indices" for determining the number of clusters.

A student's (master) thesis here seems to point in the same
direction.
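
For readers who have not met the silhouette index, here is a minimal
sketch of the average silhouette width in plain Python. It is not the
code behind silhouette() in R's cluster package; the function names
silhouette_widths() and average_silhouette() are made up for this
illustration, Euclidean distance is assumed, and singleton clusters are
given width 0 by convention. For each point, a is the mean distance to
the other members of its own cluster, b is the smallest mean distance
to any other cluster, and the width is (b - a) / max(a, b).

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_widths(points, labels):
    """Return the silhouette width s(i) = (b - a) / max(a, b) of every point."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # singleton cluster: width 0 by convention
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def average_silhouette(points, labels):
    """Mean silhouette width; larger values suggest a better partition."""
    w = silhouette_widths(points, labels)
    return sum(w) / len(w)


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]   # matches the visible grouping
bad = [0, 1, 0, 1, 0, 1]    # mixes the two groups
print(average_silhouette(points, good), average_silhouette(points, bad))
```

To choose the number of clusters, one would compute the average width
for the partition obtained at each candidate k (e.g. by cutting a
dendrogram) and prefer the k with the largest value.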

    GiampS> I also read Milligan's chapter in the book
    GiampS> "Clustering and Classification" from 1995, 
(which book? author?)

    GiampS> but didn't find information on this subject that wasn't
    GiampS> included in the previous paper.

Regards,
Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><
1 day later
#
I don't really believe that there is any satisfactory definition of the 
"true number of clusters" let alone a procedure that would reliably find it.

Murray Jorgensen
Martin Maechler wrote:

            

  
    
#
Hi,
(Uh, I missed this one. Too much spam?)

I would add information-based criteria (AIC, BIC and so on)
together with a normal mixture model (implemented in package mclust). Four
of these criteria are compared in Celeux and Soromenho, An Entropy
Criterion for Assessing the Number of Clusters in a Mixture Model, Journal
of Classification 13, 195-212 (1996) along with more references.

Note that there are also a number of clustering approaches in the recent
literature that decide about the number of clusters implicitly (not via
optimizing over all cluster numbers), e.g., DBSCAN. 
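
To illustrate what "implicitly" means here, a minimal sketch of the
DBSCAN idea in plain Python follows. It is a simplified, quadratic-time
illustration rather than a reference implementation; the function name
dbscan() and the parameter names eps and min_pts are chosen for this
example, and Euclidean distance is assumed. The number of clusters is
never passed in: it falls out of the density parameters.

```python
from math import dist  # Euclidean distance, Python 3.8+


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points))
                if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1           # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
                continue
            if labels[j] is not None:
                continue             # already assigned to a cluster
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:   # j is itself a core point: keep growing
                queue.extend(nb)
    return labels


# Toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
n_clusters = len(set(labels) - {-1})
print(labels, n_clusters)
```

The catch is that the choice moves from the number of clusters to eps
and min_pts, so the result still depends on tuning parameters.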

Christian
***********************************************************************
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag-online.de