how to calculate this natural logarithm - R-help

Fri, Jan 7, 2011 8:20 AM #

Hello

	I want to calculate natural logarithm of sum of combinations as follow: (R code)

	{
		com_sum=choose(2000000,482)*choose(1000000,118)+choose(2000000,483)*choose(1000000,117)+...+choose(2000000,i)*choose(1000000,600-i)+...+choose(2000000,600)*choose(1000000,0)                  #calculate the sum
	   result=log(com_sum)                          	   #calculate the log of the sum
	}

	But every element of the com_sum is Inf, so I can't get the result
	Thank you in advance

	Yours sincerely
 
ZhaoXing
Department of Health Statistics
West China School of Public Health
Sichuan University
No.17 Section 3, South Renmin Road
Chengdu, Sichuan 610041
P.R.China
	
	

__________________________________________________
?????????????????????????????

(Ted Harding)

Fri, Jan 7, 2011 9:42 AM #

On 07-Jan-11 16:20:59, zhaoxing731 wrote:

{com_sum=choose(2000000,482)*choose(1000000,118)+
 choose(2000000,483)*choose(1000000,117)+...+
 choose(2000000,i)*choose(1000000,600-i)+...+
 choose(2000000,600)*choose(1000000,0) #calculate the sum
 result=log(com_sum) #calculate the log of the sum
}

You may find you can usefully adapt the technique I have
described in:

http://www.zen89632.zen.co.uk/R/FisherTest/extreme_fisher.pdf

A somewhat different but closely related problem.
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Jan-11                                       Time: 17:42:48
------------------------------ XFMail ------------------------------

Petr Savicky

Fri, Jan 7, 2011 9:54 AM #

On Sat, Jan 08, 2011 at 12:20:59AM +0800, zhaoxing731 wrote:

The natural logarithm of choose() is lchoose(), so the
natural logarithms of the products above are

  i <- 482:600
  logx <- lchoose(2000000,i) + lchoose(1000000,600-i)
  maxlog <- max(logx)
  # [1] 5675.315

The sum of numbers, which have very different magnitudes, may
be approximated by their maximum, so max(logx) is an
approximation of the required logarithm. A more accurate
calculation can be done, for example, as follows

  maxlog + log(sum(exp(logx - maxlog)))
  # [1] 5675.977

Petr Savicky.

Petr Savicky

Sat, Jan 8, 2011 4:08 AM #

On Sat, Jan 08, 2011 at 12:20:59AM +0800, zhaoxing731 wrote:

Let me present a more detailed calculation. The natural logarithms
of the products above are

  i <- 482:600
  logx <- lchoose(2000000,i) + lchoose(1000000,600-i)

The logarithm of the largest of the products is

  maxlog <- max(logx)
  maxlog # [1] 5675.315

The ratios of the products divided by the largest of them may be computed as

  exp(logx - maxlog)
  # [1] 1.000000e+00 4.885522e-01 2.361712e-01 1.129583e-01 5.345075e-02
  # ...

Only a few of the ratios are not negligible, so the sum of the
products differs from the largest product by a small factor. This
factor may be estimated as

  sum(exp(logx - maxlog))
  # [1] 1.937370

The required logarithm differs from maxlog by the logarithm of this
factor. So, the result is 

  options(digits=15)
  maxlog + log(sum(exp(logx - maxlog)))
  # [1] 5675.97681976982

Computing the sum of products exactly using long integer arithmetic
and computing its logarithm with large precision produced

  5675.9768197698224041850302683544102271115821956713424401649...

which matches well the result obtained using double precision in R.

The above approach may be derived from the numerical part of the algorithm
suggested by Ted Harding for computing extremely small p-values in the Fisher Exact
test. There, we are computing a sum of small numbers, not large ones. However,
the problem is similar, since we cannot represent the summands themselves, but
we can represent their logarithms. In my opinion, P(a, b, c, d) is the largest
of the summands or close to the largest, so we calculate the ratios of the summands
to the largest of them or to a close approximation of the largest.

Petr Savicky.