Simplify formula for heterogeneity

3 messages · (Ted Harding), Stefaan Lhermitte

#
Dear R-ians,

I'm looking for a computationally simplified formula to calculate a
measure of heterogeneity (let's say H):

H = sqrt [ ( Si ( Sj ( (Xi - Xj)^2 ) ) ) / n ]

where:
sqrt = square root
Si = summation over i  (= 0 to n)
Sj = summation over j (= 0 to n)
Xi = element of X with index i
Xj = element of X with index j

I can simplify the formula to:

H = sqrt [ ( 2 * n * Si (Xi^2) - 2 * Si (Sj (Xi * Xj)) ) / n ]

Unfortunately this formula stays difficult in iterative programming,
because I have to keep every element of X to calculate H.

I know a computationally simplified formula exists for the standard
deviation (sd) that is much easier in iterative programming.
Therefore I wondered if anybody knew of an analogous simplification
for H:

sd = sqrt [ ( Si ( (Xi - mean(X))^2 ) ) / n ]  -> simplified computation ->
sqrt [ ( n * Si( X^2 ) - ( Si( X ) )^2 ) / n^2 ]

This simplified formula is much easier in iterative programming, since I
don't have to keep every element of X.
E.g.: I have a vector X[1:10] and I have already calculated
Si( X[1:10]^2 ) (I will call this A) and Si( X[1:10] ) (I will call
this B).
When X gets extended by 1 element (e.g. X[11]) it is fairly simple to
calculate sd(X[1:11]) without having to reuse the elements of X[1:10].
I just have to calculate:

sd = sqrt [ ( n * (A + X[11]^2) - (B + X[11])^2 ) / n^2 ]

This is fairly easy in an iterative process, since before we continue
with the next step we set:
A = (A + X[11]^2)
B = (B + X[11])
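
A minimal R sketch of this running update, assuming the population form
of sd (dividing by n) used in the formula above. The helper names
update_sums and running_sd and the sample vector are made up for
illustration only:

```r
# Running state: n = count, A = sum of squares, B = sum of values.
update_sums <- function(state, x_new) {
  list(n = state$n + 1,
       A = state$A + x_new^2,
       B = state$B + x_new)
}

# Population sd (divides by n) computed from the running sums only.
running_sd <- function(state) {
  sqrt((state$n * state$A - state$B^2) / state$n^2)
}

X <- c(2, 4, 4, 4, 5, 5, 7, 9, 1, 3, 8)   # arbitrary example data
state <- list(n = 0, A = 0, B = 0)
for (x in X) state <- update_sums(state, x)

# Compare with the direct definition on the full vector.
direct <- sqrt(mean((X - mean(X))^2))
all.equal(running_sd(state), direct)
```

Note that R's built-in sd() divides by n - 1, so it differs from this
value by a factor sqrt(n / (n - 1)). The sum-of-squares form can also
lose precision when the mean is large relative to the spread.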

Can anybody help me to do something comparable for H? Any other help to
calculate H easily in an iterative process is also welcome!

Thanx in advance!

Kind regards,
Stef
#
On 26-May-05 Stefaan Lhermitte wrote:
If I have understood your formula correctly (and you are
applying it to a vector X of length n) then it seems that
your H reduces to

  sqrt[(Si(n*(Xi - Xbar)^2) + Sj(n*(Xj - Xbar)^2))/n]

  = sqrt[2*(n-1)var(X)] = sd(X)*sqrt(2*(n-1))

(where Xbar is the mean of the values in X).
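
One way to see this reduction, sketched here in LaTeX (this is a
reconstruction, not necessarily the route taken above): write
Xi - Xj = (Xi - Xbar) - (Xj - Xbar) and expand the square.

```latex
\begin{align*}
\sum_{i}\sum_{j}(X_i - X_j)^2
  &= \sum_{i}\sum_{j}\bigl[(X_i-\bar X)-(X_j-\bar X)\bigr]^2 \\
  &= n\sum_i (X_i-\bar X)^2 + n\sum_j (X_j-\bar X)^2
     - 2\Bigl[\sum_i (X_i-\bar X)\Bigr]\Bigl[\sum_j (X_j-\bar X)\Bigr] \\
  &= 2n\sum_i (X_i-\bar X)^2
   = 2n(n-1)\,\mathrm{var}(X)
\end{align*}
```

The cross term vanishes because the deviations from the mean sum to
zero. Dividing by n and taking the square root gives
H = sqrt(2*(n-1)*var(X)), with var(X) the usual n-1 variance.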

So I don't see what the special point of H is anyway.
But at least this simplifies it!
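
The identity can be checked numerically in R. The sketch below is
illustrative only: H_direct and H_from_sums are hypothetical helper
names, and the random test vector is arbitrary. It also shows that H
needs only the same running sums A = Si(X^2) and B = Si(X) as sd does:

```r
# Brute-force H straight from the definition (double sum over all pairs).
H_direct <- function(X) {
  n <- length(X)
  sqrt(sum(outer(X, X, "-")^2) / n)
}

# H from running sums only, using
# Si Sj (Xi - Xj)^2 = 2*n*A - 2*B^2, with A = sum(X^2) and B = sum(X).
H_from_sums <- function(n, A, B) {
  sqrt((2 * n * A - 2 * B^2) / n)
}

set.seed(1)        # arbitrary seed, only for reproducibility
X <- rnorm(25)
n <- length(X)

h1 <- H_direct(X)
h2 <- sd(X) * sqrt(2 * (n - 1))          # sd() uses the n - 1 divisor
h3 <- H_from_sums(n, sum(X^2), sum(X))

all.equal(h1, h2)
all.equal(h1, h3)
```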

Best wishes,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 26-May-05                                       Time: 17:05:13
------------------------------ XFMail ------------------------------
#
Thank you very much Ted! I have been looking at your simplification for
more than an hour, but I don't see how you did it.
Could you perhaps, if it is not too much work, explain to me how you
reduced H? It would help me to understand what I am really doing.

Looking at the result, it seems indeed that H does not add more
information than sd already did. Intuitively I thought the sum of the
squares of all possible differences would not be related to the standard
deviation. Looking at your result it seems it is related by a factor
sqrt(2*(n-1)), so there is no special point in calculating H and I know
I cannot trust my intuition anymore.

Thanks again!

Kind regards,
Stef