
Animal Morphology: Deriving a Classification Equation with Linear Discriminant Analysis (lda)

6 messages · cdm, (Ted Harding)

cdm
#
Fellow R Users:

I'm not extremely familiar with lda or R programming, but a recent editorial
review of a manuscript submission has prompted a crash course. I am posting
to this forum hoping to solicit some much-needed advice for deriving a
classification equation.

I have used three basic measurements in lda to predict two groups: male and
female. I have a working model, a low Wilks' lambda, graphs, coefficients,
eigenvalues, etc. (see below). I adapted the sample analysis of Fisher's
(Anderson's) iris data provided in the MASS library to my own data.

My final step is to form the classification equation, which uses the
classification-function coefficients to assign each case to a group, in this
case male or female. A more thorough explanation is provided:

"For cases with an equal sample size for each group the classification
function coefficient (Cj) is expressed by the following equation:

Cj = cj0 + cj1x1 + cj2x2 + ... + cjpxp

where Cj is the score for the jth group, j = 1, ..., k, cj0 is the constant
for the jth group, and x = raw scores of each predictor. If W = within-group
variance-covariance matrix, and M = column matrix of means for group j, then
the constant cj0 = (-1/2)CjMj" (Julia Barfield, John Poulsen, and Aaron
French,
http://userwww.sfsu.edu/~efc/classes/biol710/discrim/discriminant.htm).

I am unable to navigate this last step based on the R output I have: of the
quantities needed to complete this equation, I only have the linear
discriminant coefficients for each predictor.
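For the record, the classification-function coefficients in the quoted
formula can be computed from first principles, using the group means and the
pooled within-group covariance matrix rather than lda()'s discriminant
coefficients. A sketch only; the function name classify.coef is made up and
not part of the original analysis:

```r
## Classification-function coefficients: for each group j,
## cj = W^{-1} Mj and the constant cj0 = -(1/2) cj' Mj,
## with W the pooled within-group covariance matrix.
classify.coef <- function(X, grp) {
  X <- as.matrix(X)
  grp <- factor(grp)
  lev <- levels(grp)
  ## pooled within-group covariance matrix W
  W <- Reduce(`+`, lapply(lev, function(g) {
    Xg <- X[grp == g, , drop = FALSE]
    (nrow(Xg) - 1) * cov(Xg)
  })) / (nrow(X) - length(lev))
  Winv <- solve(W)
  ## one column of coefficients (constant first) per group
  sapply(lev, function(g) {
    Mj <- colMeans(X[grp == g, , drop = FALSE])
    cj <- drop(Winv %*% Mj)
    c(constant = -0.5 * sum(cj * Mj), cj)
  })
}
```

A case is then scored against each column (constant + sum of
coefficient * predictor) and assigned to the group with the larger score;
on the data below that would be classify.coef(workWRMA[1:3],
workWRMA$WRMA_SEX).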

Please, if anybody is familiar or able to help, let me know. There
is a spot in the acknowledgments for you.

All the best,

Chase Mendenhall


Below are the R Commander windows (data file:
http://www.nabble.com/file/p23693355/LDA-WRMA.csv LDA-WRMA.csv):

Script Window:

#Dataset
workWRMA = read.csv("C:\\Users\\Chase\\Documents\\Interpubic Distance\\LDA\\LDA-WRMA.csv")
workWRMA

#Linear Discriminant Function
model<-lda(WRMA_SEX~WRMA_WG+WRMA_WT+WRMA_ID,data=workWRMA)
model
plot(model)
predict(model)

#Wilks' lambda. FYI: sqrt(1 - Wilks' lambda) = canonical correlation
X<-as.matrix(workWRMA[-4])
Y<-workWRMA$WRMA_SEX
workWRMA.manova<-manova(X~Y)
workWRMA.wilks<-summary(workWRMA.manova,test="Wilks")
workWRMA.wilks

#Group Centroids
LD1<-predict(model)$x[,1]
sum(LD1*(workWRMA$WRMA_SEX=="F"))/sum(workWRMA$WRMA_SEX=="F")
sum(LD1*(workWRMA$WRMA_SEX=="M"))/sum(workWRMA$WRMA_SEX=="M")

#Eigenvalue/Canonical Correlation
model$svd

Output Window:
WRMA_WG   WRMA_WT   WRMA_ID WRMA_SEX
1   57.50000 11.980000  5.300000        F
2   60.50000 12.250000  8.100000        F
3   59.10000 13.280000  6.650000        F
4   61.00000 12.200000  7.300000        F
5   59.42857 13.042857  7.700000        F
6   59.20000 13.340000 10.200000        F
7   60.20000 12.600000  5.000000        F
8   61.00000 13.250000  8.000000        F
9   59.66667 12.160000  5.300000        F
10  59.00000 12.425000  8.000000        F
11  59.71429 12.333333  6.000000        F
12  60.16667 12.500000  4.400000        F
13  60.20000 13.700000  7.600000        F
14  61.00000 12.100000  6.900000        F
15  57.88889 12.100000  6.550000        F
16  58.40000 12.740000  6.000000        F
17  60.50000 12.900000  7.500000        F
18  60.00000 13.850000  9.400000        F
19  56.50000 12.600000  5.600000        F
20  59.00000 11.700000  6.500000        F
21  60.00000 12.800000  9.100000        F
22  59.00000 12.000000  6.300000        F
23  56.00000 11.900000  4.000000        F
24  60.00000 11.800000  6.200000        F
25  60.00000 13.150000  3.800000        F
26  61.00000 12.300000  6.100000        F
27  61.25000 12.050000  4.000000        F
28  57.00000 11.950000  5.700000        F
29  59.00000 12.700000  5.200000        F
30  58.50000 13.350000  4.050000        F
31  54.00000 12.000000  5.100000        F
32  60.00000 12.000000  4.100000        F
33  58.00000 12.300000  6.500000        F
34  57.00000 12.600000  4.600000        F
35  57.00000 11.500000  6.300000        F
36  60.00000 12.200000  6.700000        F
37  60.60000 12.525000  8.766667        F
38  58.33333 13.533333 11.500000        F
39  59.25000 12.575000  5.700000        F
40  61.12500 12.733333  7.000000        F
41  60.00000 12.200000  5.900000        F
42  60.00000 11.850000  4.900000        F
43  59.25000 12.600000  8.100000        F
44  60.00000 12.300000  7.900000        F
45  61.00000 12.400000  4.500000        F
46  57.00000 12.600000  5.800000        F
47  59.00000 12.900000  6.400000        F
48  57.00000 12.000000  7.150000        F
49  55.00000 12.700000  5.200000        F
50  61.00000 12.200000  5.700000        F
51  60.00000 11.700000  7.000000        F
52  59.00000 14.400000  5.000000        F
53  60.50000 13.000000  6.300000        F
54  60.00000 12.800000  8.000000        F
55  56.50000 12.100000  6.300000        F
56  56.33333 12.800000  3.450000        F
57  59.00000 11.850000  4.000000        F
58  57.00000 12.000000  4.200000        F
59  61.50000 12.700000  5.200000        F
60  58.00000 12.600000  4.000000        F
61  57.00000 12.800000  6.800000        F
62  56.00000 11.900000  4.100000        F
63  61.00000 14.100000  7.000000        F
64  59.00000 12.300000  5.900000        F
65  60.00000 13.300000  8.100000        F
66  59.50000 12.750000  6.400000        F
67  62.00000 11.700000  6.000000        F
68  60.00000 12.600000  6.500000        F
69  60.00000 13.100000  9.200000        F
70  58.12500 12.462500  9.000000        F
71  60.16667 12.650000 10.250000        F
72  59.50000 12.866667  7.300000        F
73  58.25000 12.675000  8.800000        F
74  60.00000 12.220000  8.100000        F
75  58.77778 12.955556  4.700000        F
76  59.00000 12.800000  9.600000        F
77  60.60000 12.257143  7.500000        F
78  59.62500 13.428571  5.700000        F
79  60.00000 13.025000  6.000000        F
80  57.60000 12.025000  5.450000        F
81  60.25000 12.675000  6.100000        F
82  58.66667 12.781818  7.000000        F
83  59.42857 13.242857  8.900000        F
84  58.40000 12.980000  6.800000        F
85  60.50000 12.600000  6.200000        F
86  60.00000 10.350000  3.050000        M
87  59.00000 11.100000  1.900000        M
88  58.50000 10.083333  1.600000        M
89  60.50000 10.700000  3.000000        M
90  58.50000 11.400000  2.200000        M
91  58.60000 10.320000  2.500000        M
92  61.75000 10.580000  1.400000        M
93  60.80000 11.300000  2.800000        M
94  59.33333 10.150000  2.200000        M
95  59.60000 10.066667  2.100000        M
96  58.60000 10.466667  1.800000        M
97  59.33333 11.300000  3.000000        M
98  58.50000 11.400000  2.000000        M
99  60.20000 10.416667  1.800000        M
100 58.66667 10.050000  2.200000        M
101 62.00000 10.980000  2.400000        M
102 61.00000 11.550000  2.150000        M
103 55.00000 10.433333  2.200000        M
104 57.00000 10.450000  2.500000        M
105 59.25000 10.800000  2.800000        M
106 59.00000 10.500000  1.300000        M
107 60.00000 10.300000  2.000000        M
108 57.00000 11.450000  3.250000        M
109 56.00000  9.900000  2.400000        M
110 57.00000 10.900000  2.600000        M
111 59.83333 10.500000  2.300000        M
112 59.25000 11.075000  3.600000        M
113 57.50000 10.725000  2.300000        M
114 60.33333 10.766667  2.400000        M
115 58.00000 10.800000  2.250000        M
116 60.50000 10.850000  2.700000        M
117 60.00000 11.900000  2.700000        M
118 57.00000 10.000000  2.500000        M
119 58.00000 11.000000  1.800000        M
120 61.00000 10.500000  2.400000        M
121 59.33333  8.700000  1.200000        M
122 60.20000 10.625000  2.400000        M
123 55.00000 10.000000  2.300000        M
124 58.50000  9.800000  1.800000        M
125 59.00000  9.866667  2.050000        M
126 58.00000 10.300000  2.000000        M
127 58.50000  9.475000  1.950000        M
128 58.00000  9.750000  2.500000        M
129 58.66667 10.100000  2.000000        M
130 61.00000 10.560000  3.700000        M
131 59.00000 10.050000  1.600000        M
132 56.00000 10.400000  3.000000        M
133 59.00000 10.550000  1.600000        M
134 60.00000 10.600000  2.000000        M
135 57.00000 10.400000  2.700000        M
136 58.00000 10.500000  2.700000        M
137 57.00000 10.300000  3.400000        M
138 61.00000 11.000000  2.200000        M
139 57.00000 11.100000  2.200000        M
140 57.00000 10.600000  2.300000        M
141 59.00000 10.800000  2.700000        M
142 58.25000 10.800000  2.700000        M
143 59.50000 10.700000  1.800000        M
144 58.00000 11.000000  1.200000        M
145 58.00000 11.800000  2.500000        M
146 58.00000 10.766667  2.900000        M
147 58.40000 10.360000  2.700000        M
148 57.25000 12.275000  1.300000        M
149 59.66667 10.933333  1.500000        M
150 59.66667 10.085714  2.400000        M
151 57.08333 10.040000  1.200000        M
152 60.00000 10.480000  2.300000        M
153 59.66667 10.250000  1.500000        M
Call:
lda(WRMA_SEX ~ WRMA_WG + WRMA_WT + WRMA_ID, data = workWRMA)

Prior probabilities of groups:
        F         M 
0.5555556 0.4444444 

Group means:
   WRMA_WG  WRMA_WT  WRMA_ID
F 59.14702 12.57545 6.483725
M 58.76814 10.58869 2.270588

Coefficients of linear discriminants:
               LD1
WRMA_WG  0.1013304
WRMA_WT -1.1900916
WRMA_ID -0.4610512
$class
  [1] F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F
F F
 [38] F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F
F F
 [75] F F F F F F F F F F F M M M M M M M M M M M M M M M M M M M M M M M M
M M
[112] M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M
M M
[149] M M M M M
Levels: F M

$posterior
               F            M
1   9.909078e-01 9.092155e-03
2   9.999655e-01 3.446277e-05
3   9.999983e-01 1.743091e-06
4   9.997338e-01 2.662160e-04
5   9.999992e-01 8.487788e-07
6   1.000000e+00 1.240453e-09
7   9.977228e-01 2.277207e-03
8   9.999997e-01 3.240250e-07
9   9.907005e-01 9.299547e-03
10  9.999910e-01 9.015325e-06
11  9.989992e-01 1.000786e-03
12  9.879689e-01 1.203115e-02
13  9.999999e-01 5.121521e-08
14  9.990285e-01 9.714536e-04
15  9.994961e-01 5.039181e-04
16  9.999281e-01 7.189251e-05
17  9.999959e-01 4.132020e-06
18  1.000000e+00 6.345395e-10
19  9.998586e-01 1.414129e-04
20  9.931840e-01 6.816027e-03
21  9.999998e-01 2.373302e-07
22  9.977888e-01 2.211185e-03
23  9.149174e-01 8.508259e-02
24  9.886453e-01 1.135473e-02
25  9.986434e-01 1.356643e-03
26  9.983032e-01 1.696775e-03
27  7.039998e-01 2.960002e-01
28  9.960977e-01 3.902309e-03
29  9.994490e-01 5.510397e-04
30  9.998429e-01 1.571487e-04
31  9.973034e-01 2.696605e-03
32  7.941107e-01 2.058893e-01
33  9.997887e-01 2.112797e-04
34  9.987451e-01 1.254929e-03
35  9.883083e-01 1.169170e-02
36  9.994376e-01 5.624386e-04
37  9.999976e-01 2.396930e-06
38  1.000000e+00 2.472603e-11
39  9.995669e-01 4.331337e-04
40  9.999662e-01 3.377496e-05
41  9.972905e-01 2.709532e-03
42  8.968123e-01 1.031877e-01
43  9.999966e-01 3.391523e-06
44  9.999681e-01 3.191964e-05
45  9.767181e-01 2.328195e-02
46  9.998816e-01 1.184359e-04
47  9.999812e-01 1.881693e-05
48  9.998249e-01 1.751186e-04
49  9.999023e-01 9.772348e-05
50  9.938317e-01 6.168348e-03
51  9.960619e-01 3.938106e-03
52  9.999999e-01 1.451635e-07
53  9.999736e-01 2.637331e-05
54  9.999979e-01 2.067820e-06
55  9.995480e-01 4.520216e-04
56  9.967328e-01 3.267176e-03
57  6.950057e-01 3.049943e-01
58  9.450219e-01 5.497811e-02
59  9.983770e-01 1.623042e-03
60  9.937325e-01 6.267495e-03
61  9.999940e-01 5.992460e-06
62  9.290384e-01 7.096159e-02
63  1.000000e+00 3.090428e-08
64  9.989404e-01 1.059617e-03
65  9.999999e-01 1.339539e-07
66  9.999500e-01 5.004774e-05
67  9.370296e-01 6.297037e-02
68  9.998907e-01 1.093299e-04
69  1.000000e+00 4.246494e-08
70  9.999993e-01 7.131770e-07
71  9.999999e-01 5.684500e-08
72  9.999953e-01 4.707545e-06
73  9.999996e-01 3.791434e-07
74  9.999677e-01 3.233061e-05
75  9.996344e-01 3.656492e-04
76  9.999999e-01 5.756549e-08
77  9.998870e-01 1.130204e-04
78  9.999933e-01 6.669888e-06
79  9.999662e-01 3.376694e-05
80  9.943574e-01 5.642595e-03
81  9.998172e-01 1.828443e-04
82  9.999909e-01 9.117026e-06
83  1.000000e+00 2.896873e-08
84  9.999956e-01 4.400247e-06
85  9.997551e-01 2.449120e-04
86  1.118353e-04 9.998882e-01
87  8.088344e-04 9.991912e-01
88  3.182563e-06 9.999968e-01
89  4.829728e-04 9.995170e-01
90  8.256460e-03 9.917435e-01
91  5.961000e-05 9.999404e-01
92  6.562593e-06 9.999934e-01
93  5.997017e-03 9.940030e-01
94  1.014214e-05 9.999899e-01
95  4.860967e-06 9.999951e-01
96  3.166796e-05 9.999683e-01
97  1.658551e-02 9.834145e-01
98  5.585009e-03 9.944150e-01
99  1.229610e-05 9.999877e-01
100 8.142222e-06 9.999919e-01
101 3.214697e-04 9.996785e-01
102 5.452544e-03 9.945475e-01
103 2.786717e-04 9.997213e-01
104 2.304626e-04 9.997695e-01
105 9.294258e-04 9.990706e-01
106 1.179463e-05 9.999882e-01
107 1.098722e-05 9.999890e-01
108 1.395204e-01 8.604796e-01
109 1.785134e-05 9.999821e-01
110 2.752661e-03 9.972473e-01
111 5.885962e-05 9.999411e-01
112 1.783427e-02 9.821657e-01
113 5.061960e-04 9.994938e-01
114 2.236694e-04 9.997763e-01
115 5.408911e-04 9.994591e-01
116 5.733275e-04 9.994267e-01
117 1.286098e-01 8.713902e-01
118 2.343744e-05 9.999766e-01
119 6.161715e-04 9.993838e-01
120 4.326539e-05 9.999567e-01
121 8.963640e-10 1.000000e+00
122 1.153872e-04 9.998846e-01
123 3.755400e-05 9.999624e-01
124 1.118491e-06 9.999989e-01
125 2.067573e-06 9.999979e-01
126 2.609586e-05 9.999739e-01
127 2.882893e-07 9.999997e-01
128 4.270992e-06 9.999957e-01
129 7.081317e-06 9.999929e-01
130 7.573557e-04 9.992426e-01
131 2.164291e-06 9.999978e-01
132 7.366522e-04 9.992633e-01
133 2.744061e-05 9.999726e-01
134 5.043407e-05 9.999496e-01
135 2.649810e-04 9.997350e-01
136 2.857448e-04 9.997143e-01
137 6.320064e-04 9.993680e-01
138 3.699555e-04 9.996300e-01
139 3.457837e-03 9.965422e-01
140 3.330767e-04 9.996669e-01
141 8.506309e-04 9.991494e-01
142 1.176206e-03 9.988238e-01
143 7.019684e-05 9.999298e-01
144 1.892658e-04 9.998107e-01
145 1.245709e-01 8.754291e-01
146 1.639240e-03 9.983608e-01
147 1.180452e-04 9.998820e-01
148 1.716352e-01 8.283648e-01
149 1.184013e-04 9.998816e-01
150 9.389177e-06 9.999906e-01
151 2.144934e-06 9.999979e-01
152 4.947553e-05 9.999505e-01
153 3.680029e-06 9.999963e-01

$x
            LD1
1   -0.80961254
2   -2.11788926
3   -2.81702199
4   -1.63887851
5   -2.98560972
6   -4.51502614
7   -1.13556177
8   -3.21121050
9   -0.80427973
10  -2.43204583
11  -1.32847284
12  -0.74329959
13  -3.64339559
14  -1.33544888
15  -1.48933123
16  -1.94562057
17  -2.61481807
18  -4.67206755
19  -1.78711512
20  -0.87765265
21  -3.28415603
22  -1.14246989
23  -0.26703434
24  -0.75701601
25  -1.25711681
26  -1.20462625
27   0.08643675
28  -1.00899549
29  -1.46837770
30  -1.76239359
31  -1.09586068
32  -0.02682685
33  -1.69303805
34  -1.27539872
35  -0.75008498
36  -1.46357824
37  -2.74239885
38  -5.43229677
39  -1.52480923
40  -2.12261236
41  -1.09473729
42  -0.21715406
43  -2.66108436
44  -2.13584882
45  -0.58595351
46  -1.82866014
47  -2.25965744
48  -1.73702428
49  -1.87369947
50  -0.90119661
51  -1.00684780
52  -3.39932316
53  -2.18056581
54  -2.77699973
55  -1.51480516
56  -1.05076180
57   0.09646157
58  -0.37692329
59  -1.21505159
60  -0.89743756
61  -2.52772964
62  -0.31313946
63  -3.76173716
64  -1.31507689
65  -3.41815064
66  -2.03047848
67  -0.34313573
68  -1.84740464
69  -3.68728863
70  -3.02638958
71  -3.61896275
72  -2.58426857
73  -3.17440750
74  -2.13285173
75  -1.56450450
76  -3.61601207
77  -1.83962616
78  -2.50263849
79  -2.12266797
80  -0.92219130
81  -1.72690842
82  -2.42941777
83  -3.77688946
84  -2.60008350
85  -1.65842406
86   2.42092801
87   1.95723774
88   3.25481432
89   2.07811374
90   1.41122969
91   2.56834629
92   3.08526967
93   1.48666816
94   2.98328621
95   3.15558707
96   2.71653535
97   1.24583994
98   1.50343993
99   2.93816864
100  3.03474174
101  2.17351447
102  1.50909462
103  2.20699501
104  2.25150568
105  1.92465176
106  2.94792340
107  2.96453633
108  0.71562570
109  2.85083072
110  1.66985935
111  2.57131425
112  1.22853563
113  2.06710595
114  2.25851660
115  2.05156686
116  2.03791535
117  0.73765397
118  2.78704689
119  2.02102158
120  2.64342798
121  5.16997019
122  2.41360218
123  2.67659624
124  3.49979670
125  3.35585968
126  2.76187545
127  3.81741879
128  3.18590023
129  3.06744739
130  1.97265595
131  3.34514926
132  1.97915422
133  2.75010347
134  2.60750886
135  2.21880002
136  2.20112130
137  2.01507335
138  2.14059243
139  1.61626150
140  2.16520218
141  1.94542427
142  1.86942644
143  2.53004471
144  2.29765229
145  0.74621248
146  1.79155331
147  2.40826630
148  0.65818257
149  2.40756044
150  3.00135867
151  3.34725397
152  2.61200449
153  3.22078969
Df  Wilks approx F num Df den Df    Pr(>F)    
Y           1   0.18   226.40      3    149 < 2.2e-16 ***
Residuals 151                                            
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] -1.897114
[1] 2.371392
[1] 26.23579
(Ted Harding)
#
[Your data and output listings removed. For comments, see at end]
On 24-May-09 13:01:26, cdm wrote:
The first thing I did was to plot your data. This indicates in the
first place that a perfect discrimination can be obtained on the
basis of your variables WRMA_WT and WRMA_ID alone (names abbreviated
to WG, WT, ID, SEX):

  D0<-read.csv("horsesLDA.csv")
  # names(D0) # "WRMA_WG"  "WRMA_WT"  "WRMA_ID"  "WRMA_SEX"
  WG<-D0$WRMA_WG; WT<-D0$WRMA_WT;
  ID<-D0$WRMA_ID; SEX<-D0$WRMA_SEX

  ix.M<-(SEX=="M"); ix.F<-(SEX=="F")

  ## Plot WT vs ID (M & F)
  plot(ID,WT,xlim=c(0,12),ylim=c(8,15))
  points(ID[ix.M],WT[ix.M],pch="+",col="blue")
  points(ID[ix.F],WT[ix.F],pch="+",col="red")
  lines(ID,15.5-1.0*(ID))

and that there is a lot of possible variation in the choice of
discriminating line WT = 15.5 - 1.0*ID.

Also, it is apparent that the covariance between WT and ID for Females
is different from the covariance between WT and ID for Males. Hence
the assumption (of common covariance matrix in the two groups) for
standard LDA (which you have been applying) does not hold.

Given that the sexes can be perfectly discriminated within the data
on the basis of the linear discriminator (WT + ID) (and others),
the variable WG is in effect a close approximation to noise.

However, to the extent that there was a common covariance matrix
to the two groups (in all three variables WG, WT, ID), and this
was well estimated from the data, then inclusion of the third
variable WG could yield a slightly improved discriminator in that
the probability of misclassification (a rare event for such data)
could be minimised. But it would not make much difference!

However, since that assumption does not hold, this analysis would
not be valid.

If you plot WT vs WG, a common covariance is more plausible; but
there is considerable overlap for these two variables:

  plot(WG,WT)
  points(WG[ix.M],WT[ix.M],pch="+",col="blue")
  points(WG[ix.F],WT[ix.F],pch="+",col="red")

If you plot WG vs ID, there is perhaps not much overlap, but a
considerable difference in covariance between the two groups:

  plot(ID,WG)
  points(ID[ix.M],WG[ix.M],pch="+",col="blue")
  points(ID[ix.F],WG[ix.F],pch="+",col="red")

This looks better on a log scale, however:

  lWG <- log(WG) ; lWT <- log(WT) ; lID <- log(ID)
## Plot log(WG) vs log(ID) (M & F)
  plot(lID,lWG)
  points(lID[ix.M],lWG[ix.M],pch="+",col="blue")
  points(lID[ix.F],lWG[ix.F],pch="+",col="red")

and common covariance still looks good for WG vs WT:

  ## Plot log(WT) vs log(WG) (M & F)
  plot(lWG,lWT)
  points(lWG[ix.M],lWT[ix.M],pch="+",col="blue")
  points(lWG[ix.F],lWT[ix.F],pch="+",col="red")

but there is no improvement for WT vs ID:

  ## Plot log(WT) vs log(ID) (M & F)
  plot(ID,WT,xlim=c(0,12),ylim=c(8,15))
  points(ID[ix.M],WT[ix.M],pch="+",col="blue")
  points(ID[ix.F],WT[ix.F],pch="+",col="red")

So there is no simple road to applying a routine LDA to your data.

To take account of different covariances between the two groups,
you would normally be looking at a quadratic discriminator. However,
as indicated above, the fact that a linear discriminator using
the variables ID & WT alone works so well would leave considerable
imprecision in conclusions to be drawn from its results.
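The quadratic discriminator mentioned here is available as qda() in the same
MASS package used for lda(). A minimal sketch, shown on two species of the
built-in iris data as a stand-in (the thread's formula would carry over as
qda(WRMA_SEX ~ WRMA_WG + WRMA_WT + WRMA_ID, data = workWRMA)):

```r
library(MASS)

## qda() fits a separate covariance matrix for each group, so the
## common-covariance assumption required by lda() is dropped
d <- subset(iris, Species != "virginica")
d$Species <- droplevels(d$Species)
qmodel <- qda(Species ~ Sepal.Length + Sepal.Width + Petal.Length, data = d)

## resubstitution confusion table: predicted vs actual
table(predicted = predict(qmodel)$class, actual = d$Species)
```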

Sorry this is not the straightforward answer you were hoping for
(which I confess I have not sought); it is simply a reaction to
what your data say.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 24-May-09                                       Time: 20:07:43
------------------------------ XFMail ------------------------------
cdm
#
Dear Ted,

Thank you for taking the time out to help me with this analysis. I'm seeing
that I may have left out a crucial detail concerning this analysis. The ID
measurement (interpubic distance) is a new measurement that has never been
used in the field of ornithology (to my knowledge). The objective of the
paper is to demonstrate the usefulness of ID. The paper compared ID with
plumage criterion, a categorical variable at best, but under peer-review
there is a request to use other morphological data to compare/contrast ID.
Unfortunately, wing (WG) and weight (WT) were the only measurements taken in
addition to ID in this study.

The purpose of the LDA is to demonstrate the power of ID in the context of
WG and WT. I agree that WG is a terrible metric for discrimination; WT is
good, but there is significant overlap between groups; ID, though, is a good
discriminator on its own (it classified 97-100% of all individuals based on
a 92.5% CI).
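Rates like the 97-100% quoted here are resubstitution rates and can be
optimistic; lda() will produce leave-one-out cross-validated classifications
if called with CV=TRUE. A sketch on two-species iris data standing in for
the workWRMA frame:

```r
library(MASS)

## CV = TRUE returns leave-one-out predictions: each case is
## classified by a model fitted with that case held out
d <- subset(iris, Species != "virginica")
d$Species <- droplevels(d$Species)
cv <- lda(Species ~ Sepal.Length + Petal.Length, data = d, CV = TRUE)

## held-out classification rate
mean(cv$class == d$Species)
```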

You pointed out that I am violating assumptions with LDA based on different
covariances between sexes (thank you... I never would have caught it). I'm
wondering how to proceed.

Should I:

1) Perform linear discrimination with WT and ID, and then determine a
classification equation? And, if I do, how do I derive the classification
equation (e.g. Cj = cj0 + cjWT*xWT + cjID*xID; Cj > x = male, Cj < x = female)?

2) Demonstrate that ID is important based on the linear discriminant
coefficients and structure coefficients from this WG, WT, and ID LDA;
discuss the assumption violation and argue for its use as a demonstration
of variable predicting power; and NOT provide a classification equation,
because we already have ID ranges and it would be inappropriate.

3) Both #1 and #2, because WT and ID provide such a good discriminating
function, and use the WG, WT, and ID LDA as a demonstration of variable
prediction value.

4) ??? better suggestions.
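On option 1: with only two groups, lda()'s single discriminant already
defines a classification rule of the form asked for; the cutpoint on LD1 is
the midpoint of the two group centroids, shifted by the log prior ratio when
the priors differ. A sketch on two-species iris data standing in for the
WT/ID model (the same recipe applies to the centroids computed in the
original script):

```r
library(MASS)

d <- subset(iris, Species != "virginica")
d$Species <- droplevels(d$Species)
m <- lda(Species ~ Sepal.Length + Petal.Length, data = d)

z   <- predict(m)$x[, 1]           # discriminant scores (unit within-group variance)
cen <- tapply(z, d$Species, mean)  # group centroids on LD1

## equal-covariance cutpoint: centroid midpoint, shifted by the log
## prior ratio (the shift is zero here because the priors are equal)
cutpt <- mean(cen) - log(m$prior[2] / m$prior[1]) / (cen[2] - cen[1])

## assign to group 2 when the score falls on group 2's side of the cut
rule <- ifelse((z - cutpt) * (cen[2] - cen[1]) > 0,
               levels(d$Species)[2], levels(d$Species)[1])

## this hand-made rule agrees with predict()'s own classifications
table(rule, predict(m)$class)
```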




THANK YOU so much for responding and for all of your insight. I'm humbled by
your R skills... that code nearly took me all day to write (little by little
I'm learning).

Chase
Ted.Harding-2 wrote:
(Ted Harding)
#
[Apologies -- I made an error (see at [***] near the end)]
On 24-May-09 19:07:46, Ted Harding wrote:
[***]
The above is incorrect! Apologies. I plotted the raw WT and ID
instead of their logs. In fact, if you do plot the logs:

  ## Plot log(WT) vs log(ID) (M & F)
  plot(lID,lWT)
  points(lID[ix.M],lWT[ix.M],pch="+",col="blue")
  points(lID[ix.F],lWT[ix.F],pch="+",col="red")

you now get what looks like much closer agreement between the
covariances cov(lID,lWT) than before. Hence, I would now suggest
that you do your linear discrimination on the logarithms of the
variables (since you also get agreement for the other pairs on
the log scale).

In fact:

[Raw]:
  [Male]:
  cov(cbind(WG,WT,ID)[ix.M,])
  #            WG         WT          ID
  # WG  2.2552465 0.11074710 -0.02202080
  # WT  0.1107471 0.33853450  0.06601287
  # ID -0.0220208 0.06601287  0.31979368

  [Female]:
  cov(cbind(WG,WT,ID)[ix.F,])
  #           WG        WT        ID
  # WG  2.4716912 0.1577307   0.6670657
  # WT  0.1577307 0.3183928   0.2973335
  # ID  0.6670657 0.2973335   2.8326520

[log]:
  [Male]:
  cov(cbind(lWG,lWT,lID)[ix.M,])
  #               lWG          lWT           lID
  # lWG  0.0006584465 0.0001813315 -0.0002133576
  # lWT  0.0001813315 0.0030368382  0.0030442356
  # lID -0.0002133576 0.0030442356  0.0693965979

  [Female]:
  cov(cbind(lWG,lWT,lID)[ix.F,])
  #              lWG          lWT         lID
  # lWG  0.0007244826 0.0002171885  0.001951343
  # lWT  0.0002171885 0.0019640076  0.003305884
  # lID  0.0019513428 0.0033058841  0.068406840
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 24-May-09                                       Time: 21:49:50
------------------------------ XFMail ------------------------------
cdm
#
Ted,

I just ran everything using the log of all variables. Much better analysis
and it doesn't violate the assumptions.

I'm still in the dark concerning the classification equation - other than
the fact that it will now contain log terms.

Thank you for your help,

Chase
Ted.Harding-2 wrote:
(Ted Harding)
#
On 24-May-09 20:32:06, cdm wrote:
Many thanks for the above additional explanation, Chase. It leads to
an interpretation of the log(WT) vs log(ID) plot which could be fruitful.
Namely, ID is a linear dimension, and WT could be considered
as closely reflecting a (linear dimension)^3. If you look at the plot
of log(WT) vs log(ID):

  ## Plot log(WT) vs log(ID) (M & F)
  plot(lID,lWT)
  points(lID[ix.M],lWT[ix.M],pch="+",col="blue")
  points(lID[ix.F],lWT[ix.F],pch="+",col="red")

it is apparent that a linear increase in log(ID) as log(WT) increases
is a very good description of what is happening. Also, that the
scatter about the linear relationship is very uniform. Therefore,
a linear regression of log(ID) on log(WT) should be closely related
to the linear discrimination. First, the linear regression:

    lLM <- lm(lID ~ lWT)
    summary(lLM)$coef
  #               Estimate Std. Error   t value     Pr(>|t|)
  # (Intercept) -10.657775  0.6562166 -16.24125 5.971407e-35
  # lWT           4.901037  0.2671783  18.34369 2.899008e-40

so the slope is 4.901037, and the slope of a linear discriminant
is likely to be close to -1/4.901037 = -0.2040385. So:

    library(MASS)
    lda(SEX ~ lWG + lWT + lID)
  # [...]
  # Coefficients of linear discriminants:
  #            LD1
  # lWG   5.304967
  # lWT -11.604919
  # lID  -2.707374

so the slope of a linear discriminant (based on all 3 variables)
with respect to variation in log(WT) and log(ID) alone is
  -2.707374/11.604919 = -0.2332954
which is quite close to the above. It is also interesting to do the
discrimination using only log(WT) and log(ID):

  lda(SEX ~ lWT + lID)
  # [...]
  # Coefficients of linear discriminants:
  #            LD1
  # lWT -11.352949
  # lID  -2.673019

So *very little change* compared with using all three variables;
and the slope of this discriminant is -2.673019/11.352949 = -0.2354471,
almost unchanged compared with the three-variable fit.

You can see the performance of the discriminator by plotting
histograms of it (here I'll use the 2-variable one):

  ix.M <- (SEX=="M") ; ix.F <- (SEX=="F")
  LD <- 11.352949*lWT + 2.673019*lID
  hist(LD[ix.M], breaks=0.5*(40:80), col="blue")
  hist(LD[ix.F], breaks=0.5*(40:80), col="red", add=TRUE)

Inspection of this, however, raises some interesting questions
which I'd prefer to discuss with you off-list (also your queries
relating to efficacy of ID).

Ted.
[But see just one short comment below]
As pointed out in my correction, if you work with logs it looks OK
on that front! More later.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 24-May-09                                       Time: 22:46:12
------------------------------ XFMail ------------------------------