
Animal Morphology: Deriving Classification Equation with

On 24-May-09 20:32:06, cdm wrote:
Many thanks for the above additional explanation, Chase. It leads to
an interpretation of the log(ID) vs log(LD) plot which could be fruitful.
Namely, the ID is a linear dimension, and the WT could be considered
as closely reflecting a (linear dimension)^3. If you look at the plot
of log(WT) vs log(ID):

  ## Plot log(WT) vs log(ID) (M & F)
  plot(lID,lWT)
  points(lID[ix.M],lWT[ix.M],pch="+",col="blue")
  points(lID[ix.F],lWT[ix.F],pch="+",col="red")

it is apparent that a linear increase in log(ID) as log(WT) increases
is a very good description of what is happening. Also, that the
scatter about the linear relationship is very uniform. Therefore,
a linear regression of log(ID) on log(WT) should be closely related
to the linear discrimination. First, the linear regression:

    lLM <- lm(lID ~ lWT)
    summary(lLM)$coef
  #               Estimate Std. Error   t value     Pr(>|t|)
  # (Intercept) -10.657775  0.6562166 -16.24125 5.971407e-35
  # lWT           4.901037  0.2671783  18.34369 2.899008e-40

so the slope is 4.901037, and the slope of a linear discriminant
is likely to be close to -1/4.901037 = -0.2040385. So:

    library(MASS)
    lda(SEX ~ lWG + lWT + lID)
  # [...]
  # Coefficients of linear discriminants:
  #            LD1
  # lWG   5.304967
  # lWT -11.604919
  # lID  -2.707374

so the slope of a linear discriminant (based on all 3 variables)
with respect to variation in log(WT) and log(ID) alone is
  -2.707374/11.604919 = -0.2332954
which is quite close to the above. It is also interesting to do the
discrimination using only log(WT) and log(ID):

  lda(SEX ~ lWT + lID)
  # [...]
  # Coefficients of linear discriminants:
  #            LD1
  # lWT -11.352949
  # lID  -2.673019

So *very little change* compared with using all three variables;
and the slope of this discriminant is -2.673019/11.352949 =  -0.2354471,
almost unchanged compared with the three variables.
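
As a quick check on the arithmetic, the three slopes can be recomputed
directly from the coefficients quoted above. This is only a sketch: the
numbers are copied by hand from the printed output, and the helper name
contour_slope is my own, not anything from MASS. It treats each slope
as that of a line of constant LD1, with log(WT) taken as a function of
log(ID), i.e. -(coefficient of lID)/(coefficient of lWT).

```r
# Coefficients copied by hand from the lm() and lda() output quoted above
reg_slope <- 4.901037                                  # slope on lWT in lm(lID ~ lWT)
ld3 <- c(lWG = 5.304967, lWT = -11.604919, lID = -2.707374)  # 3-variable LD1
ld2 <- c(lWT = -11.352949, lID = -2.673019)                  # 2-variable LD1

# Hypothetical helper: slope d(lWT)/d(lID) along a line of constant LD1
contour_slope <- function(cf) unname(-cf["lID"] / cf["lWT"])

slopes <- c(regression = -1 / reg_slope,
            lda3       = contour_slope(ld3),
            lda2       = contour_slope(ld2))
print(round(slopes, 7))
```

The two lda-based values differ from each other by only about 0.002,
and both sit within about 0.03 of the regression-based value.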

You can see the performance of the discriminator by plotting
histograms of it (here I'll use the 2-variable one):

  ix.M <- (SEX=="M") ; ix.F <- (SEX=="F")
  ## 2-variable discriminant score (sign flipped relative to LD1)
  LD <- 11.352949*lWT + 2.673019*lID
  hist(LD[ix.M], breaks=0.5*(40:80), col="blue")
  hist(LD[ix.F], breaks=0.5*(40:80), col="red", add=TRUE)
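
Separately, rather than typing the coefficients in by hand, the scores
can be taken from the fitted object with predict(). The sketch below
runs on simulated stand-in data, NOT the measurements in this thread;
the means, SDs and sample size are invented purely so that the code is
self-contained. Note that predict() returns centred LD1 scores, so the
scale differs from the hand-built LD by a shift and a sign.

```r
library(MASS)  # lda() and its predict() method

# Hypothetical stand-in data (invented parameters, not the real measurements)
set.seed(42)
n   <- 100
SEX <- factor(rep(c("F", "M"), each = n))
lWT <- rnorm(2 * n, mean = ifelse(SEX == "M", 2.50, 2.40), sd = 0.08)
lID <- -10.7 + 4.9 * lWT + ifelse(SEX == "M", 0.25, 0) + rnorm(2 * n, sd = 0.2)
dat <- data.frame(SEX, lWT, lID)

fit  <- lda(SEX ~ lWT + lID, data = dat)
pred <- predict(fit)           # scores in pred$x, predicted classes in pred$class
LD1  <- pred$x[, "LD1"]

# Overlaid histograms of the scores by sex, on a shared set of breaks
brk <- pretty(range(LD1), n = 30)
hist(LD1[dat$SEX == "M"], breaks = brk, col = "blue",
     main = "LD1 by sex (simulated data)", xlab = "LD1")
hist(LD1[dat$SEX == "F"], breaks = brk, col = "red", add = TRUE)

# Resubstitution table: optimistic, since the same data fit and test the rule
conf_tab <- table(observed = dat$SEX, predicted = pred$class)
print(conf_tab)
```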

Inspection of these histograms, however, raises some interesting questions
which I'd prefer to discuss with you off-list (also your queries
relating to efficacy of ID).

Ted.
[But see just one short comment below]
As pointed out in my correction, if you work with logs it looks OK
on that front! More later.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 24-May-09                                       Time: 22:46:12
------------------------------ XFMail ------------------------------