why NA coefficients

8 messages · array chip, David Winsemius


array chip

Mon, Nov 7, 2011 4:33 PM

Hi, I am trying to run ANOVA with an interaction term on 2 factors (treat has 7 levels, group has 2 levels). I found that the coefficient for the last interaction term is always NA; see the attached dataset and the code below:

> test<-read.table("test.txt",sep='\t',header=T,row.names=NULL)
> lm(y~factor(treat)*factor(group),test)

Call:
lm(formula = y ~ factor(treat) * factor(group), data = test)

Coefficients:
                  (Intercept)                 factor(treat)2                 factor(treat)3
                     0.429244                       0.499982                       0.352971
               factor(treat)4                 factor(treat)5                 factor(treat)6
                    -0.204752                       0.142042                       0.044155
               factor(treat)7                 factor(group)2  factor(treat)2:factor(group)2
                    -0.007775                      -0.337907                      -0.208734
factor(treat)3:factor(group)2  factor(treat)4:factor(group)2  factor(treat)5:factor(group)2
                    -0.195138                       0.800029                       0.227514
factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
                     0.331548                             NA


I guess this is due to the model matrix being singular, or to collinearity among its columns, but I can't figure out how the matrix is singular in this case. Can someone show me why?

Thanks

John
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111107/430e48a8/attachment.txt>

David Winsemius

Mon, Nov 7, 2011 5:13 PM

On Nov 7, 2011, at 7:33 PM, array chip wrote:

Because you have no cases in one of the crossed categories.

David Winsemius, MD
West Hartford, CT
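Here is a minimal sketch of how to find the empty crossed category. The scrubbed test.txt attachment is not available, so this uses a hypothetical simulated data set instead: 7 treat levels, 2 group levels, and 3 replicates per cell, with the treat = 1, group = 2 cell deliberately left empty. That choice of empty cell is an assumption, though it is consistent with the NA patterns shown in this thread. The objects `tab` and `fit` are illustrative names only.

```r
# Hypothetical stand-in for test.txt: 7 x 2 design, 3 replicates per cell
set.seed(1)
test <- expand.grid(treat = 1:7, group = 1:2, rep = 1:3)
# Remove every row of the treat = 1, group = 2 cell to create an empty crossed category
test <- test[!(test$treat == 1 & test$group == 2), ]
test$y <- rnorm(nrow(test))

# A zero in the treat-by-group cross-tabulation marks the empty cell
tab <- table(test$treat, test$group)
print(tab)

fit <- lm(y ~ factor(treat) * factor(group), data = test)
# Coefficients that lm() could not estimate are reported as NA
print(names(coef(fit))[is.na(coef(fit))])
```

Cross-tabulating the factors before fitting is a cheap way to catch this: any zero count means the full interaction model has more columns than the data can support.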

array chip

Mon, Nov 7, 2011 7:07 PM

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111107/51cc252a/attachment.pl>

David Winsemius

Mon, Nov 7, 2011 10:19 PM

On Nov 7, 2011, at 10:07 PM, array chip wrote:

Well, it had to omit one of them, didn't it?

(But I don't know why that level was chosen.)

David.

David Winsemius, MD
West Hartford, CT
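This sketch shows why that particular level ends up as the NA. It uses a hypothetical simulated data set with 7 treat levels, 2 group levels, and 3 replicates per cell, with the treat = 1, group = 2 cell left empty; this is not the real attachment.

`lm()` fits through a QR decomposition with limited column pivoting. The decomposition scans the model-matrix columns from left to right. Any column that is (numerically) a linear combination of the columns before it is moved to the end, and its coefficient is reported as NA. With the treat = 1, group = 2 cell empty, every group-2 row has exactly one nonzero interaction dummy. The `factor(group)2` dummy therefore equals the sum of the six interaction dummies. The last of those, `factor(treat)7:factor(group)2`, is the first column found to be redundant.

```r
# Hypothetical data: treat = 1, group = 2 cell left empty
set.seed(1)
test <- expand.grid(treat = 1:7, group = 1:2, rep = 1:3)
test <- test[!(test$treat == 1 & test$group == 2), ]
test$y <- rnorm(nrow(test))
fit <- lm(y ~ factor(treat) * factor(group), data = test)

X <- model.matrix(fit)
int_cols <- paste0("factor(treat)", 2:7, ":factor(group)2")

# The group-2 dummy is exactly the sum of the interaction dummies (collinearity)
print(all.equal(unname(X[, "factor(group)2"]), unname(rowSums(X[, int_cols]))))

# 14 columns but only rank 13; the redundant column sits last in the pivot order
print(c(columns = ncol(X), rank = fit$rank))
print(colnames(X)[fit$qr$pivot[ncol(X)]])
```

Changing the factor's level order changes which column is found redundant. The releveling later in the thread uses this idea.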

David Winsemius

Mon, Nov 7, 2011 10:27 PM

On Nov 8, 2011, at 1:19 AM, David Winsemius wrote:

But this output suggests there may be alligators in the swamp:

 > predict(lmod, newdata=data.frame(treat=1, group=2))
          1
0.09133691
Warning message:
In predict.lm(lmod, newdata = data.frame(treat = 1, group = 2)) :
   prediction from a rank-deficient fit may be misleading

David.

David Winsemius, MD
West Hartford, CT
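You can check the predict() result above by hand, using only the coefficients printed in the original post. This assumes `lmod` is that same default-coding fit, which the thread does not actually show. For treat = 1 (the reference level) and group = 2, every treat dummy and every interaction dummy is zero. The prediction therefore reduces to the intercept plus the `factor(group)2` coefficient.

```r
# Coefficients copied from the default-coding lm() output in the original post
b0     <- 0.429244    # (Intercept): treat = 1, group = 1
group2 <- -0.337907   # factor(group)2
# Cell treat = 1, group = 2: all treat and interaction dummies are 0,
# so the NA coefficient never enters the calculation
pred_empty_cell <- b0 + group2
print(pred_empty_cell)   # 0.091337, vs 0.09133691 from predict() (rounding)
```

The `factor(treat)7:factor(group)2` term was forced to zero. As a result, the `factor(group)2` coefficient is really the group 2 minus group 1 difference at treat 7. This "prediction" for the empty cell is the treat 1, group 1 mean shifted by a group effect borrowed from treat 7. Nothing in the data supports that choice.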

array chip

Tue, Nov 8, 2011 9:36 AM

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111108/43bcf351/attachment.pl>

array chip

Tue, Nov 8, 2011 9:38 AM

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111108/d6a2d1bb/attachment.pl>

David Winsemius

Tue, Nov 8, 2011 10:07 AM

On Nov 8, 2011, at 12:36 PM, array chip wrote:

Have you considered redefining the implicit base level for "treat" so that
it does not create the missing crossed category?

 > test$treat2_ <- factor(test$treat, levels=c(2:7, 1) )
 > lm(y~treat2_*factor(group),test)

Call:
lm(formula = y ~ treat2_ * factor(group), data = test)

Coefficients:
            (Intercept)                 treat2_3                 treat2_4
              0.9292256               -0.1470106               -0.7047343
               treat2_5                 treat2_6                 treat2_7
             -0.3579398               -0.4558269               -0.5077571
               treat2_1           factor(group)2  treat2_3:factor(group)2
             -0.4999820               -0.5466405                0.0135963
treat2_4:factor(group)2  treat2_5:factor(group)2  treat2_6:factor(group)2
              1.0087628                0.4362479                0.5402821
treat2_7:factor(group)2  treat2_1:factor(group)2
              0.2087338                       NA

All the "group-less" coefficients are for group 1, so you now get a
prediction for group=1:treat=2 == "Intercept", group=1:treat=3, ...,
a total of 7 values.

And there are 6 predictions for group2.
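The same arithmetic applied to the releveled coefficients above (reference level treat = 2) shows how arbitrary a prediction for the empty treat = 1, group = 2 cell is. This sketch uses only the printed numbers. That cell's model-matrix row turns on the intercept, `treat2_1`, `factor(group)2`, and the NA interaction `treat2_1:factor(group)2`; the NA term is dropped.

```r
# Coefficients copied from the releveled fit above (reference level treat = 2)
b0       <- 0.9292256    # (Intercept): treat = 2, group = 1
treat2_1 <- -0.4999820   # treat2_1
group2   <- -0.5466405   # factor(group)2

# Non-empty cell treat = 1, group = 1: agrees with the default-coding intercept 0.429244
print(b0 + treat2_1)            # 0.4292436
# Empty cell treat = 1, group = 2: the NA interaction term is simply left out
print(b0 + treat2_1 + group2)   # -0.1173969
```

This differs from the 0.09133691 that predict() gave for the same cell under the default coding. Both fits reproduce the 13 non-empty cells identically. Only the empty cell's "prediction" depends on the parameterization.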

The onus is obviously on you to check the predictions against the  
data. 'aggregate' is a good function for that purpose.

david.




David Winsemius, MD
West Hartford, CT
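David suggests checking the fit against the data with `aggregate`. Here is a sketch of that check on a hypothetical simulated data set, not the real attachment. It has 7 treat levels, 2 group levels, and 3 replicates per cell, with the treat = 1, group = 2 cell left empty. The model has one free coefficient per non-empty cell, so the fitted values should reproduce the observed cell means exactly. The empty cell simply has no row to check. The column name `fit_val` is illustrative.

```r
# Hypothetical data: treat = 1, group = 2 cell left empty
set.seed(1)
test <- expand.grid(treat = 1:7, group = 1:2, rep = 1:3)
test <- test[!(test$treat == 1 & test$group == 2), ]
test$y <- rnorm(nrow(test))

fit <- lm(y ~ factor(treat) * factor(group), data = test)

# Observed cell means; the empty cell does not appear at all
cell_means <- aggregate(y ~ treat + group, data = test, FUN = mean)

# Mean fitted value per cell, grouped the same way so the rows line up
test$fit_val <- fitted(fit)
fit_means <- aggregate(fit_val ~ treat + group, data = test, FUN = mean)

print(nrow(cell_means))   # number of non-empty cells
print(all.equal(cell_means$y, fit_means$fit_val))
```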