
geometric mean regression

5 messages · Poizot Emmanuel, Kjetil Halvorsen, (Ted Harding) +1 more

#
Hi,

is it possible to perform a geometric mean regression with R ?
Thanks.

------------------------------------------------
Emmanuel Poizot
Cnam/Intechmer
B.P. 324
50103 Cherbourg Cedex

Phone (Direct) : (00 33)(0)233887342
Fax : (00 33)(0)233887339
------------------------------------------------
#
Poizot Emmanuel wrote:

            
As has been said on this list before, "This is R, there is no if, only 
how",

but if you actually wanted to ask how it is possible, it would help if
you explained what is "geometric mean regression".

Kjetil
-- 

Kjetil Halvorsen.

Peace is the most effective weapon of mass construction.
               --  Mahdi Elmandjra
#
I presume the reference is to the 'geometric mean
functional regression' or the 'line of organic
correlation' or 'reduced major axis regression'.  If
so, this is relatively easy, almost trivial, to
implement in R. Maybe it's in a package, but I never
looked. I worked from Helsel's description in his
classic water resources statistics book. See Chapter
10 here: 

http://water.usgs.gov/pubs/twri/twri4a3/

Now, if you are after confidence intervals or
prediction intervals, I haven't found anything on that
yet. Seems that I did something a couple of years ago
by hacking some approximate residuals using the LOC
line and the data, and then feeding that into the CL
and PL equations for OLS. (Be advised that I'm not a
statistician and did that in the spirit of 
approximation--who knows? :O) )

By coincidence I've been looking at this again
recently. Maybe bootstrapping....

Regards,
Michael Grant

--- Kjetil Brinchmann Halvorsen <kjetil at acelerate.com>
wrote:
2 days later
#
On 03-Jun-05 Michael Grant wrote:
This somewhat contentious method is indeed trivial to
implement in R. The idea is that if you plot the two
regression lines (y on x, x on y) on the same axes
(y vertical, x horizontal), the slope of the GMR is
the geometric mean of the slopes of these two lines.

Since the slope of the y-on-x line is Sxy/Sxx, and
the slope of the x-on-y line is Syy/Sxy, the GMR slope
is therefore sqrt(Syy/Sxx) = sd(y)/sd(x).

All three lines go through the same point, (mean(x),mean(y)).
It hardly needs a package!
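
A minimal sketch of the algebra above, using simulated data purely for illustration (the variable names are hypothetical). The geometric mean of two slopes is unsigned, so the sketch takes the sign from the covariance; that detail is an assumption not spelled out in the description above.

```r
set.seed(1)                      # simulated data, for illustration only
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)

b_yx <- cov(x, y) / var(x)       # y-on-x slope, Sxy/Sxx
b_xy <- var(y) / cov(x, y)       # x-on-y line drawn on the same axes, Syy/Sxy

# Geometric mean of the two slopes, signed by the covariance (assumption)
b_gmr <- sign(cov(x, y)) * sqrt(b_yx * b_xy)

# The magnitude equals the ratio of standard deviations
all.equal(abs(b_gmr), sd(y) / sd(x))

# Intercept chosen so that the line passes through (mean(x), mean(y))
a_gmr <- mean(y) - b_gmr * mean(x)
```
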
The method goes back a lot further than suggested here.
It seems it was proposed in oceanography by H. Sverdrup
in 1916, and very influentially promoted by W.E. Ricker
(e.g. Jnl Fisheries Research Board of Canada, 1973,
vol. 30, 409-434).

The uncertainty properties, and indeed the interpretation,
of this method are elusive. You can, of course, resort to
whatever stochastic modelling you choose (including simulation
and bootstrap) to estimate the variability of the slope
sd(y)/sd(x) and of any predictions you may want to make.
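
One way the bootstrap idea might look, as a rough sketch rather than a recommended procedure: resample the (x, y) pairs with replacement and recompute the slope each time. The data are simulated, the helper name gmr_slope and the number of resamples B are hypothetical choices, and the percentile interval is only one of several possible summaries.

```r
set.seed(42)                     # simulated data, for illustration only
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

# Signed GMR slope computed from the pairs selected by idx (hypothetical helper)
gmr_slope <- function(idx) {
  sign(cov(x[idx], y[idx])) * sd(y[idx]) / sd(x[idx])
}

B <- 2000                        # number of bootstrap resamples (arbitrary)
boot_slopes <- replicate(B, gmr_slope(sample.int(n, replace = TRUE)))

est <- gmr_slope(seq_len(n))     # slope from the original sample
se  <- sd(boot_slopes)           # bootstrap standard error
ci  <- quantile(boot_slopes, c(0.025, 0.975))   # percentile interval
```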

However, the method shows its indeterminate side to the
extent that the relationship between y and x is loose rather
than tight.

At one extreme, where the correlation between x and y = 1,
the two regression lines (y on x and x on y) and the GMR
all coincide. No problem here.

At the other extreme, where there is no correlation, the
GMR method still gives you a definite answer (sd(y)/sd(x))
even though by normal standards there is no relationship
between y and x. In the latter case, the slope of the
GMR depends solely on the two SDs, and we may well ask
what is being estimated here (apart from the ratio of
the SDs).
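
A small simulation can illustrate the point, under the assumption that x and y are generated independently (all names and the chosen standard deviations are hypothetical):

```r
set.seed(7)                      # simulated data, for illustration only
x <- rnorm(200, sd = 1)
y <- rnorm(200, sd = 5)          # generated independently of x

r     <- cor(x, y)               # close to zero by construction
b_ols <- cov(x, y) / var(x)      # y-on-x slope, also close to zero
b_gmr <- sd(y) / sd(x)           # GMR slope magnitude: simply the ratio of the SDs

c(r = r, b_ols = b_ols, b_gmr = b_gmr)
```

The GMR figure here reflects only the two standard deviations chosen for the simulation, not any relationship between y and x.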

(Of course, if you go back to the "primitive" definition, you
find yourself evaluating sqrt(0 * inf), which is indeterminate;
and this is a better outcome than sd(y)/sd(x), but still falls
short of telling you directly that y is independent of x).

As you approach the r=0 situation, you therefore have to be
mindful that the GMR method will appear to provide a definite
answer to a question which in reality has at best a vague
answer, i.e. there is a major problem of interpretation.

Therefore I would be suspicious of results obtained by "blind"
application of the GMR method which were not accompanied by
a good discussion of grounds why the results can be expected
to be meaningful in the particular case where it has been applied.

The GMR method seems to be well entrenched in the fisheries,
natural resources, and ecology worlds. I suspect that the reasons
for this may be partly "psychological": people are aware that
they are looking for a functional relationship, are put off
(rightly) by the existence of two regression lines, and are
not enthusiastic to tangle with the difficulties (including
the potential indeterminacy) of estimating a linear functional
relationship. The GMR provides a very simple escape route
which, no doubt in many cases, may give you as good a working
answer as you can expect.

Nevertheless, I'm inclined to the view that the linear functional
relationship is usually the best way to go. When the observed
(x,y) points depart from the "true" points on the straight line
by normally distributed amounts, the MLE of the relationship
is well defined provided the ratio of the "departure" variances
is fixed. Therefore it is possible to examine the robustness
of the estimated relationship with respect to variation in the
assumed value of this ratio. To the extent that this is 
acceptably robust within plausible variation of the ratio,
you have an adequate and reliable perspective. Otherwise,
you have to acknowledge that your information is inadequate.
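
A hedged sketch of such a robustness check, using the standard closed-form slope for a linear functional relationship with a known error-variance ratio (often called Deming regression). The helper name lfr_slope, the simulated data, and the grid of ratios are all hypothetical; lambda is taken here as var(error in y) / var(error in x), and the formula assumes a non-zero sample covariance.

```r
# Hypothetical helper: ML slope when lambda = var(err_y) / var(err_x) is assumed known
lfr_slope <- function(x, y, lambda) {
  sxx <- var(x); syy <- var(y); sxy <- cov(x, y)   # requires sxy != 0
  d <- syy - lambda * sxx
  (d + sqrt(d^2 + 4 * lambda * sxy^2)) / (2 * sxy)
}

set.seed(3)                      # simulated data, for illustration only
truth <- runif(80, 0, 10)        # unobserved "true" x values
x <- truth + rnorm(80, sd = 1)
y <- 1 + 2 * truth + rnorm(80, sd = 1)

# How much does the estimated slope move as the assumed ratio varies?
lambdas <- c(0.25, 0.5, 1, 2, 4)
slopes  <- sapply(lambdas, function(l) lfr_slope(x, y, l))
cbind(lambda = lambdas, slope = slopes)
```

If the slopes barely change across the plausible range of lambda, the estimate is robust in the sense described above. With this formula, setting lambda to var(y) / var(x) gives a slope of magnitude sd(y) / sd(x), so the GMR corresponds to one particular assumed ratio.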

The danger of adopting a formulaic solution like GMR is that
it tends to conceal inadequacy of information!

Best wishes,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Jun-05                                       Time: 10:20:01
------------------------------ XFMail ------------------------------
#
Hi Ted,

Thank you for your informative comments regarding GMR.
   

 
TH:
Contentious...well that says a lot (seriously)!

TH:
I implemented it in a simple brute force
manner--elegance is time--following Helsel and Hirsch.
Get the two slopes to calculate the GMR slope and then
use mean(x) and mean(y) with the new slope to get the
intercept...
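
A sketch of what such a brute-force version might look like, assuming the two slopes are taken from lm() fits (the function name gmr_lm is hypothetical, and this is a reconstruction of the steps described, not the original code):

```r
# Hypothetical helper following the steps described above
gmr_lm <- function(x, y) {
  b_yx <- coef(lm(y ~ x))[["x"]]        # slope of y on x
  b_xy <- 1 / coef(lm(x ~ y))[["y"]]    # x-on-y line re-expressed as dy/dx
  slope <- sign(b_yx) * sqrt(b_yx * b_xy)   # geometric mean, signed
  intercept <- mean(y) - slope * mean(x)    # line through (mean(x), mean(y))
  c(intercept = intercept, slope = slope)
}

set.seed(11)                     # simulated data, for illustration only
x <- rnorm(40)
y <- 5 - 1.5 * x + rnorm(40)
fit <- gmr_lm(x, y)
fit
```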
 
TH:
By itself, no. Your comment is timely given another
help thread currently on the large number of packages
:O). But something like Stats-R-Us and/or the
R-graphics gallery aimed at useful snippets not worthy
of packages....

MWG:
TH:
So it seems. After all it has to have been around to
acquire all the different names it goes by. The USGS
book is just good as an online reference. BTW read
'classic' as useful but out of print. I listed the
material because I have found it quite lucid and I
like the emphasis on non-parametric methods. Making
the material available is indeed quite generous of the
authors. I find the book quite thought provoking for
the non-statistics individual. I'm always looking for
insights.

MWG:
TH:
... 

Now you are getting to the heart of what has been
puzzling me lately. To me the question seemed to be:
does it make sense to even talk about confidence bands
and prediction bands for GMR. It seemed that one can
take a stochastic approach to prediction, i.e., one
can set up simulations and roll the dice over and
over. On one hand it is beyond my knowledge at this
time to ascertain whether or not the results of
such effort can be couched in the traditional language
of confidence bands and prediction bands about such a
line--neither variable is (in)dependent. Yet if I view
it from the perspective of the minimization of the sum
of the areas of the right triangles (Helsel Fig. 10.8)
determined by each observation and the GMR (LOC), I am
back to a single variable(?)... Oh, well I have not
lost sleep over it, and indeed find your use of the
term 'elusive' reassuring.
Being conservative in such matters, little or no
correlation is where I declare defeat and move on to
some other tactic ;O).
Hmmm, more fodder for self study. Thank you very much
for the insights!
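
The right-triangle view mentioned above can be checked numerically. This is a sketch under the assumption that each triangle has a vertical leg |e| and a horizontal leg |e| / |b|, where e is the vertical residual from the line y = a + b*x, so that its area is e^2 / (2 * |b|). The data are simulated and the function name tri_area is hypothetical.

```r
set.seed(5)                      # simulated data, for illustration only
x <- rnorm(60)
y <- 1 + 2 * x + rnorm(60)

# Sum of triangle areas between the points and the line y = par[1] + par[2] * x
tri_area <- function(par) {
  e <- y - par[1] - par[2] * x
  sum(e^2) / (2 * abs(par[2]))
}

start <- coef(lm(y ~ x))         # OLS fit as a starting value
opt <- optim(start, tri_area)    # Nelder-Mead by default

opt$par[[2]]                     # numerically minimized slope
sd(y) / sd(x)                    # closed-form GMR slope, for comparison
```

The two slopes should agree to within the optimizer's tolerance.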


Best regards,
Michael Grant