
two cols in a data frame are the same factor

Hi,
I am afraid you misunderstood me. I do not have repeated records, but
for every record I have two simultaneously present, possibly different,
instantiations of an explanatory variable.

My data is as follows:

yield haplo1 haplo2
  100      A      B
  151      B      A
  212      A      A

So I have one effect (haplo), but two copies of it in each record, and each copy affects "yield".
If I use lm() I get:
Call:
lm(formula = yield ~ -1 + haplo1 + haplo2, data = a)

Coefficients:
haplo1A haplo1B haplo2B
    212     151    -112


But I get different coefficients for the two "A"s (in fact, one of them
was set to 0) and for the two "B"s. That is, the model has four
unknowns, but in my example I have just two!
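Written out, and assuming each copy of a haplotype adds its effect once (an additive assumption; n_{iA} and n_{iB} are notation introduced here for the number of copies of A and B in record i), the model has only the two unknowns beta_A and beta_B. A worked least-squares solution for the three records above:

```latex
\begin{aligned}
\text{yield}_i &= \beta_A\, n_{iA} + \beta_B\, n_{iB} + \varepsilon_i,
  \qquad n_{iA} + n_{iB} = 2,\\[4pt]
100 &= \beta_A + \beta_B + \varepsilon_1 \qquad (\text{A, B}),\\
151 &= \beta_A + \beta_B + \varepsilon_2 \qquad (\text{B, A}),\\
212 &= 2\beta_A + \varepsilon_3 \qquad (\text{A, A}),\\[4pt]
\hat\beta_A &= \tfrac{212}{2} = 106, \qquad
\hat\beta_B = \tfrac{100 + 151}{2} - \hat\beta_A = 125.5 - 106 = 19.5.
\end{aligned}
```

Records 1 and 2 share the same design row (one A, one B), so their fitted value is their mean, 125.5; record 3 alone fixes 2 beta_A = 212.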

A least-squares solution is simple to do by hand:

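> X <- matrix(c(1, 1, 1, 1, 2, 0), ncol = 2)  # the incidence matrix
> X
     [,1] [,2]
[1,]    1    1
[2,]    1    2
[3,]    1    0
> y <- c(100, 151, 212)
> solve(t(X) %*% X, t(X) %*% y)  # normal equations X'X b = X'y
         [,1]
[1,] 184.8333
[2,] -30.5000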

where [1,] is the solution for A and [2,] is the solution for B

This is not difficult to do by hand for a simple case like this one, but
then I miss all the machinery in lm().
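A minimal sketch of one way to keep lm() for this model, assuming the additive per-copy effect (each copy of a haplotype contributes its effect once): build a numeric matrix that counts the copies of each haplotype in each record and pass it to lm() as a single matrix term. The objects `lev`, `H` and `fit` are illustrative names, not from the original post.

```r
# Example data from the post; factor columns made explicit.
a <- data.frame(yield  = c(100, 151, 212),
                haplo1 = factor(c("A", "B", "A")),
                haplo2 = factor(c("B", "A", "A")))

# Haplotypes that can appear in either copy.
lev <- union(levels(a$haplo1), levels(a$haplo2))

# Copy-count matrix: H[i, l] = number of copies of level l in record i
# (0, 1 or 2). sapply() returns one column per level, named by `lev`.
H <- sapply(lev, function(l) (a$haplo1 == l) + (a$haplo2 == l))

# No intercept: every row of H sums to 2, so an intercept column would be
# a linear combination of the count columns and would be aliased.
fit <- lm(yield ~ 0 + H, data = a)

coef(fit)      # HA = 106, HB = 19.5
summary(fit)   # standard errors, t tests, etc. from the usual lm() machinery
```

Because both haplotype columns feed the same count matrix, an "A" in haplo1 and an "A" in haplo2 share one coefficient, which is the constraint the two-factor formula yield ~ -1 + haplo1 + haplo2 cannot express. Note that this count matrix (columns (1, 1, 2) and (1, 1, 0)) is not the same as the hand-built X above, so its estimates differ from 184.8333 and -30.5.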

Thank you
Andres
On Wed, Mar 19, 2008 at 6:57 PM, Michael Dewey <info at aghmed.fsnet.co.uk> wrote: