
linear regression

2 messages · Donald Lehmann, Peter Dalgaard

#
Dear Consultant

I've done linear regression successfully in R a few times before, but this
time it keeps telling me:

"Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
         0 (non-NA) cases"

The model is:

fm1 <- lm(TS.CM ~ AGE + SEX + HFE.Y.01 + TFC2B.01 + HFE.Y.01*TFC2B.01,
          data = IRONresults, subset = DIAG2.1D == 0)
summary(fm1)

TS.CM is a continuous variable (a percentage), SEX is coded 0 = women,
1 = men, DIAG2.1D is coded 0 = non-demented, 1 = Alzheimer's disease, and
the genes, HFE.Y.01 and TFC2B.01, are coded 0 = non-carrier, 1 = carrier.

I've tried recoding the data to use 1 and 2 instead of 0 and 1, and I've
removed the rows with missing data. I've also tried writing "...lm(formula
= TS.CM ~ ...", but I always get the same error message.

What am I doing wrong?

A related question: what's the minimum number of data points for regression
analysis to work? We have only 23 cases carrying both genes out of 447, and
only 8 out of 264 in the above subset (i.e. non-demented). I seem to
remember hearing somewhere that you need a minimum of ~30 (?), so this
probably wouldn't work anyway. Still, I'd like to know what I was
doing wrong!

Many thanks

Donald (Lehmann)
#
Donald Lehmann <donald.lehmann at pharmacology.oxford.ac.uk> writes:
You don't need to give the main effects when there's a "*" term.
(Listing them is a SASism: the R equivalent of SAS's a*b is ":", and
a*b == a+b+a:b by definition.) But that is hardly the main problem.

Could you have a look at this?

with(IRONresults, complete.cases(TS.CM, AGE, SEX, HFE.Y.01, TFC2B.01))

If you get all FALSE, you'll know what hit you...
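To illustrate how an all-FALSE result leads to exactly this error, here is a small sketch with made-up data. The data frame toy and its columns are hypothetical, not Donald's: one column is NA in every row, so lm()'s default na.action (na.omit) drops all rows and lm.fit() has nothing left to fit. colSums(is.na(...)) then points at the offending variable. The same checks could presumably be run on IRONresults, restricted to the DIAG2.1D == 0 subset.

```r
set.seed(1)
# Hypothetical data: 'gene' is (say) mis-imported and NA in every row
toy <- data.frame(
  y    = rnorm(10),
  age  = round(runif(10, 60, 90)),
  gene = rep(NA_real_, 10),
  diag = rep(0:1, 5)
)

# Rows usable by lm(): every model variable non-missing
ok <- with(toy, complete.cases(y, age, gene))
print(sum(ok))              # 0 usable rows

# Per-column NA counts: gene has 10, the others 0
print(colSums(is.na(toy)))

# Fitting reproduces the error message from the original post
msg <- tryCatch(
  lm(y ~ age + gene, data = toy, subset = diag == 0),
  error = function(e) conditionMessage(e)
)
print(msg)                  # "0 (non-NA) cases"
```

Note that a column can also end up all NA for reasons that are not obvious from a printout, for instance after an import or recoding step, so counting NAs per column is usually quicker than inspecting the data by eye.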

On the minimum number of cases: technically, you just need linearly independent predictors and more
observations than parameters (incl. the intercept). Other bounds get
bandied about on what should be required for a *meaningful* analysis
(like "10 observations per parameter"), but these are quite heuristic
and empirical in nature.
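A rough way to check the "more observations than parameters" condition is to count the columns of the model matrix. The sketch below uses simulated data (all values hypothetical) with the 0/1 codings described in the post; for Donald's model there are 6 parameters: intercept, AGE, SEX, the two gene indicators and their interaction. It also shows what lm() does when predictors are not linearly independent: with its default singular.ok = TRUE, the aliased coefficient is reported as NA rather than estimated. The recoded column SEX12 is a hypothetical illustration.

```r
set.seed(3)
n <- 20
# Simulated stand-in for the real data (deterministic design, random response)
sim <- data.frame(
  TS.CM    = rnorm(n, mean = 30, sd = 8),
  AGE      = 60:79,
  SEX      = rep(c(0, 0, 1, 1), 5),
  HFE.Y.01 = rep(0:1, each = 10),
  TFC2B.01 = rep(0:1, times = 10)
)
f <- TS.CM ~ AGE + SEX + HFE.Y.01 * TFC2B.01

X <- model.matrix(f, data = sim)
p <- ncol(X)                    # 6 parameters, including the intercept
print(qr(X)$rank == p)          # TRUE: columns are linearly independent

fit <- lm(f, data = sim)
print(df.residual(fit))         # n - p = 20 - 6 = 14

# Hypothetical redundant predictor: SEX recoded as 1/2 instead of 0/1
sim$SEX12 <- sim$SEX + 1
fit2 <- lm(TS.CM ~ AGE + SEX + SEX12 + HFE.Y.01 * TFC2B.01, data = sim)
print(coef(fit2)[["SEX12"]])    # NA: aliased with the intercept and SEX
```

With 264 cases and 6 parameters the bare technical condition is easily met; whether 8 double carriers are enough to say anything useful about the interaction is the separate, heuristic question.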