clustering or homogeneity approaches? - R-help

Thu, Aug 11, 2005 2:36 PM

Hi, there:
I have a question on the following dataset

          [,1]      [,2]       [,3]       [,4]       [,5]
[1,] 34.216166 96.928587 330.125990 330.183222 330.201215
[2,]  2.819183  8.134491   8.275841   8.525256   8.828448
[3,]  2.819183  7.541680   7.550333   8.374636   8.690998
[4,]  4.672551  5.036353   5.072710   5.152218   5.223204
[5,]  5.470131  5.500513   5.674139   5.689151   5.770423
[6,]  4.480287  4.628300   4.797686   4.814106   4.823345

I want to separate the first 3 rows from the rest. The criterion is a
"gap": each of those rows has a large jump between two neighbouring
values, while the remaining rows do not.
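
One way the "gap" could be made concrete, as a rough sketch rather than an established method, is the largest ratio between neighbouring values in each row. The cutoff of 2 below is a hypothetical value picked by eye for these six rows, and the code assumes every row is positive and already sorted in increasing order, as in the sample.

```r
# The six sample rows, copied from the printout above.
x <- matrix(c(
  34.216166, 96.928587, 330.125990, 330.183222, 330.201215,
   2.819183,  8.134491,   8.275841,   8.525256,   8.828448,
   2.819183,  7.541680,   7.550333,   8.374636,   8.690998,
   4.672551,  5.036353,   5.072710,   5.152218,   5.223204,
   5.470131,  5.500513,   5.674139,   5.689151,   5.770423,
   4.480287,  4.628300,   4.797686,   4.814106,   4.823345
), ncol = 5, byrow = TRUE)

n <- ncol(x)

# Ratio of each value to its left neighbour, computed for the whole
# matrix at once (assumes rows are sorted ascending and positive).
step <- x[, -1, drop = FALSE] / x[, -n, drop = FALSE]

# Largest relative jump in each row.
max_step <- apply(step, 1, max)

# Hypothetical cutoff: flag rows where some value is more than twice
# its left neighbour.
gap_rows <- which(max_step > 2)
gap_rows
```

The matrix arithmetic is vectorized, so this should stay cheap on 30000 rows. If the rows are not sorted, each row would need to be sorted first.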

My current approach is to compute sd(row)/median(row) for each row and
apply a threshold. That is very naive, but fast and good enough. I would
like something better and more "academic". Please advise. I think
clustering might help, but it needs to be quick since t2 has 30000 rows.
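
For reference, the sd/median ratio described above could be written roughly as follows. This is a minimal sketch on the six sample rows only. The name x and the threshold of 0.2 are assumptions made for illustration, not values taken from the real data.

```r
# The six sample rows, copied from the printout above.
x <- matrix(c(
  34.216166, 96.928587, 330.125990, 330.183222, 330.201215,
   2.819183,  8.134491,   8.275841,   8.525256,   8.828448,
   2.819183,  7.541680,   7.550333,   8.374636,   8.690998,
   4.672551,  5.036353,   5.072710,   5.152218,   5.223204,
   5.470131,  5.500513,   5.674139,   5.689151,   5.770423,
   4.480287,  4.628300,   4.797686,   4.814106,   4.823345
), ncol = 5, byrow = TRUE)

# Per-row standard deviation divided by per-row median.
ratio <- apply(x, 1, sd) / apply(x, 1, median)

# Hypothetical threshold, chosen by eye for these six rows.
threshold <- 0.2
flagged <- which(ratio > threshold)
flagged
```

On these six rows the ratio is well above 0.2 for the first three rows and well below it for the others, so any cutoff in between separates them. Whether one fixed cutoff works across all 30000 rows is exactly the open question.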

Thanks,

Weiwei

Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III