Interpreting Multiple Linear Regression Summary

15 messages · David Winsemius, Daniel Nordlund, Marc Schwartz +4 more

#
I would appreciate pointers on what I should read to understand this
output:

  summary(lm(TDS ~ Cond + Ca + Cl + Mg + Na + SO4))

Call:
lm(formula = TDS ~ Cond + Ca + Cl + Mg + Na + SO4)

Residuals:
ALL 1 residuals are 0: no residual degrees of freedom!

Coefficients: (6 not defined because of singularities)
             Estimate Std. Error t value Pr(>|t|)
(Intercept)      125         NA      NA       NA
Cond              NA         NA      NA       NA
Ca                NA         NA      NA       NA
Cl                NA         NA      NA       NA
Mg                NA         NA      NA       NA
Na                NA         NA      NA       NA
SO4               NA         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
   (63 observations deleted due to missingness)

   When I look at the summary for the data frame used for this model I do not
see an excessive number of missing values or indications why there are no
residual degrees of freedom. The same model applied to 8 other data frames
did not produce similar results.

Puzzled,

Rich
#
Please see ?dput

use dput(your data) and paste the output into a reply, thanks.

This way we know what you are working with.
Rich Shepard wrote:
--
View this message in context: http://r.789695.n4.nabble.com/Interpreting-Multiple-Linear-Regression-Summary-tp4020516p4020567.html
Sent from the R help mailing list archive at Nabble.com.
#
Rich,

I don't see a 'data=' parameter in your call to lm().  How does lm() know where to find the variables referenced in the model parameter?

If that is not the problem, then we need to see str() output for the data frame that you are analyzing.
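
A minimal sketch of the two ways lm() can locate its variables, using a small hypothetical data frame (`toy`, `x`, and `y` are made-up names and values, not the thread's data). When `data=` is omitted, lm() falls back to the environment of the formula, which is typically the workspace:

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(2.1, 3.9, 6.2, 7.8, 10.1))

# Explicit: variables are looked up in the data frame first
fit_data <- lm(y ~ x, data = toy)

# Implicit: no data=, so lm() needs vectors named x and y to be
# visible from the formula's environment (here, the workspace)
x <- toy$x
y <- toy$y
fit_env <- lm(y ~ x)

# Both calls fit the same model
same_fit <- isTRUE(all.equal(coef(fit_data), coef(fit_env)))
print(same_fit)
```

Passing `data=` makes the call self-describing, which is why readers of a posted lm() call tend to ask for it.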

Dan

Daniel Nordlund
Bothell, WA USA
#
On Nov 9, 2011, at 12:04 PM, Rich Shepard wrote:

            
I don't see a data= argument specified, so you are telling lm() that
your workspace has individual vectors by the names used in the formula.
That is not what is implied by the rest of your message.
David Winsemius, MD
West Hartford, CT
#
On Wed, 9 Nov 2011, David Winsemius wrote:

            
David,

   That's because I attached the data frame before running the model.

   However, looking again at the scatter plots of the individual predictor variables
with the response variable answered my question after I posted it. There are
no patterns to the relationships in these scatter plots so there's nothing
to model. I became caught up in the repetitive processing for all these data
and stopped really seeing what was in front of me.

My apologies to the list,

Rich
#
Rich,

The problem is not just that there was 'nothing to model.' If that were the case, you would have gotten non-significant parameter estimates, not NAs. I would guess that there is something problematic with how the data frame is structured relative to what lm() is expecting. So, I would not give up looking for a solution just yet. Can you show us the result of str() on the data frame that you attached?

Dan

Daniel Nordlund
Bothell, WA USA
#
On Wed, 9 Nov 2011, Daniel Nordlund wrote:

            
Dan,

   I was not comfortable with my explanation, but the formula (and data
frame) was equivalent to those of the other 8 streams.
OK. I'm always up for learning more about R and its processes.
Sure. I subset the original data frame to select only the 6 predictor
variables and the response variable. Same lm() results. I'll provide the
data frame, too.

summary(lm(formula = TDS ~ Cond + Ca + Cl + Mg + Na + SO4, data =
mod.stump.cast))

Call:
lm(formula = TDS ~ Cond + Ca + Cl + Mg + Na + SO4, data = mod.stump.cast)

Residuals:
ALL 1 residuals are 0: no residual degrees of freedom!

Coefficients: (6 not defined because of singularities)
             Estimate Std. Error t value Pr(>|t|)
(Intercept)      125         NA      NA       NA
Cond              NA         NA      NA       NA
Ca                NA         NA      NA       NA
Cl                NA         NA      NA       NA
Mg                NA         NA      NA       NA
Na                NA         NA      NA       NA
SO4               NA         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
   (63 observations deleted due to missingness)

  str(mod.stump.cast)
'data.frame':	64 obs. of  7 variables:
  $ Ca  : num  NA NA 24.4 NA 21.4 NA NA NA NA NA ...
  $ Cl  : num  1.58 5.6 3 NA 1 5 1.2 4 4 8.4 ...
  $ Cond: num  NA NA 190 187 184 NA NA NA NA NA ...
  $ Mg  : num  NA NA 10 NA 9.1 NA NA NA NA NA ...
  $ Na  : num  NA NA NA NA NA NA NA NA NA NA ...
  $ SO4 : num  9.4 6.5 9 NA 7 55 6.8 105 15.6 8.4 ...
  $ TDS : num  105 181 112 144 114 308 96 430 108 108 ...

summary(mod.stump.cast)
        Ca              Cl              Cond             Mg              Na
  Min.   : 0.60   Min.   : 1.000   Min.   :  2.2   Min.   : 9.10   Min.   : 4
  1st Qu.:23.35   1st Qu.: 2.000   1st Qu.:214.8   1st Qu.:11.00   1st Qu.: 4
  Median :28.35   Median : 4.000   Median :282.5   Median :17.40   Median : 4
  Mean   :32.77   Mean   : 4.076   Mean   :294.6   Mean   :17.85   Mean   : 4
  3rd Qu.:40.55   3rd Qu.: 5.600   3rd Qu.:372.0   3rd Qu.:22.10   3rd Qu.: 4
  Max.   :64.30   Max.   :13.000   Max.   :636.0   Max.   :32.40   Max.   : 4
  NA's   :50.00   NA's   :11.000   NA's   : 42.0   NA's   :51.00   NA's   :62
       SO4              TDS
  Min.   :  4.00   Min.   : 14.0
  1st Qu.:  7.00   1st Qu.:131.2
  Median :  9.40   Median :174.0
  Mean   : 16.31   Mean   :176.9
  3rd Qu.: 17.00   3rd Qu.:195.5
  Max.   :105.00   Max.   :430.0
  NA's   :  3.00   NA's   :  2.0

  mod.stump.cast
      Ca    Cl  Cond   Mg Na   SO4 TDS
1    NA  1.58    NA   NA NA   9.4 105
2    NA  5.60    NA   NA NA   6.5 181
3  24.4  3.00 190.0 10.0 NA   9.0 112
4    NA    NA 187.0   NA NA    NA 144
5  21.4  1.00 184.0  9.1 NA   7.0 114
6    NA  5.00    NA   NA NA  55.0 308
7    NA  1.20    NA   NA NA   6.8  96
8    NA  4.00    NA   NA NA 105.0 430
9    NA  4.00    NA   NA NA  15.6 108
10   NA  8.40    NA   NA NA   8.4 108
11   NA  1.00    NA   NA NA   8.8 125
12   NA  1.40    NA   NA NA  19.4 129
13   NA  4.90    NA   NA NA  37.0 360
14   NA  1.70    NA   NA NA  12.0 140
15   NA  2.00    NA   NA NA  10.0  95
16   NA  1.60    NA   NA NA   9.1 120
17   NA  3.30    NA   NA NA  34.0 280
18   NA  2.20    NA   NA NA  11.0 130
19   NA  9.00    NA   NA NA  69.0 352
20   NA  1.00    NA   NA NA  18.0 148
21   NA  2.00    NA   NA NA   9.0 107
22 28.0  1.00 248.0 11.0  4  13.0 125
23 32.0  1.00    NA 12.0  4   9.0 139
24   NA  5.00    NA   NA NA   7.0 188
25   NA  4.00    NA   NA NA   6.0 201
26   NA  3.00    NA   NA NA   5.0 178
27   NA  2.27    NA   NA NA   7.8 197
28   NA  1.76    NA   NA NA   7.8 187
29   NA  5.81    NA   NA NA   7.5 182
30   NA  4.23    NA   NA NA   6.0 165
31   NA  4.23    NA   NA NA   7.3 186
32   NA  6.25    NA   NA NA   7.0 191
33   NA  6.72    NA   NA NA   7.5 190
34 34.7  4.00 304.0 17.4 NA   6.0 176
35   NA    NA 354.0   NA NA   7.0 175
36 42.5  5.00 379.0 21.1 NA   7.0 220
37   NA  5.80    NA   NA NA   5.6 163
38 26.0  5.80 300.0 24.0 NA   5.6 163
39   NA  2.20    NA   NA NA   5.4 152
40   NA  5.40    NA   NA NA  11.0 221
41   NA  5.40    NA   NA NA  10.5 171
42   NA  4.80    NA   NA NA   9.9 204
43   NA  8.00    NA   NA NA  11.7 174
44   NA  1.00    NA   NA NA   8.4 190
45   NA  4.80    NA   NA NA  12.1 174
46   NA  5.90    NA   NA NA  16.0 210
47   NA  5.90    NA   NA NA  20.0 190
48   NA 13.00    NA   NA NA   7.6 180
49   NA  5.60    NA   NA NA  17.0 200
50   NA  1.20    NA   NA NA   6.5 180
51  0.6    NA   2.2   NA NA    NA  NA
52 21.4    NA 187.0  9.5 NA   8.0 120
53   NA    NA 285.0   NA NA  22.0 135
54 48.3  3.00 378.0 22.1 NA  24.0 228
55 63.5  7.00 533.0 29.9 NA  44.0  14
56   NA    NA 207.0   NA NA    NA  NA
57   NA    NA 262.0   NA NA  13.0 156
58 28.7  2.00 244.0 12.6 NA  13.0 140
59   NA    NA 238.0   NA NA  12.0 128
60   NA    NA 280.0   NA NA  18.0 160
61   NA    NA 380.0   NA NA  23.0 215
62   NA    NA 402.0   NA NA  23.0 230
63 64.3  7.00 636.0 32.4 NA  73.0 316
64 23.0  4.10 300.0 21.0 NA   4.0 163

Thanks,

Rich
#
On Nov 9, 2011, at 1:17 PM, Rich Shepard wrote:

            
Here is your problem:

# 'DF' is the result of copying your data above from the
# clipboard on OSX
DF <- read.table(pipe("pbpaste"), header = TRUE)
str(DF)
'data.frame':	64 obs. of  7 variables:
 $ Ca  : num  NA NA 24.4 NA 21.4 NA NA NA NA NA ...
 $ Cl  : num  1.58 5.6 3 NA 1 5 1.2 4 4 8.4 ...
 $ Cond: num  NA NA 190 187 184 NA NA NA NA NA ...
 $ Mg  : num  NA NA 10 NA 9.1 NA NA NA NA NA ...
 $ Na  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ SO4 : num  9.4 6.5 9 NA 7 55 6.8 105 15.6 8.4 ...
 $ TDS : int  105 181 112 144 114 308 96 430 108 108 ...

na.omit(DF)
   Ca Cl Cond Mg Na SO4 TDS
22 28  1  248 11  4  13 125


After removing incomplete records (any records with NA values), which is the default behavior for R model functions, you have only one record left to fit the model to.
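
A minimal sketch of how to count the rows a default lm() call would keep, assuming the default `na.action` of `na.omit`. The data frame `toy` and its values are hypothetical and much smaller than the data in this thread:

```r
# Hypothetical toy data: the sparse 'Na' column leaves one complete row
toy <- data.frame(
  TDS  = c(105, 181, 112, 125, 139, 176),
  Cond = c(NA, NA, 190, 248, NA, 304),
  Na   = c(NA, NA, NA, 4, 4, NA)
)

# complete.cases() flags rows with no NA in any column
n_complete <- sum(complete.cases(toy))

# na.omit() returns only those rows
kept <- na.omit(toy)

print(n_complete)
print(kept)
```

Running a check like this before fitting shows how many observations actually reach the model.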

HTH,

Marc Schwartz
#
As far as I know, if there is an NA in any variable in an observation,
the default is to drop the entire observation.  Thus there are no
observations in your calculation.

Best Regards

John
On 9 November 2011 19:17, Rich Shepard <rshepard at appl-ecosys.com> wrote:

  
    
#
On Wed, 9 Nov 2011, Marc Schwartz wrote:

            
Marc,

   Oh? I don't do Apple so there's no OSX here.
That's what I saw from the scatter plots.

Thanks,

Rich
#
On Nov 9, 2011, at 2:17 PM, Rich Shepard wrote:

            
I count exactly 1 line in the data.frame below that has all columns
with non-NA values. It should be no surprise that its 'TDS' value
(=125) is the same as the estimated Intercept. I cannot understand why
you misled us to such an extent about the degree of missing-ness in
that data.
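
A minimal sketch of why the intercept equals the lone surviving response value, using a hypothetical data frame (`toy` and its values are made up for illustration). With one usable row, lm() can estimate only one coefficient and has no residual degrees of freedom:

```r
# Hypothetical toy data: only the row with TDS = 125 is complete
toy <- data.frame(
  TDS  = c(105, 181, 112, 125, 139, 176),
  Cond = c(NA, NA, 190, 248, NA, 304),
  Na   = c(NA, NA, NA, 4, 4, NA)
)

fit <- lm(TDS ~ Cond + Na, data = toy)

est      <- coef(fit)         # intercept estimated, slopes NA
n_used   <- nobs(fit)         # observations that reached the fit
resid_df <- df.residual(fit)  # residual degrees of freedom

print(est)
print(n_used)
print(resid_df)
```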

(Failing to indicate that you have attached a dataframe is also very  
discourteous.)
#
On Wed, 9 Nov 2011, John C Frain wrote:

            
John,

   Hadn't realized that. I know there are NA's in other data frames that
yield model results. Perhaps it is the excessive numbers in this set that
are the problem.

Thanks,

Rich
#
On 09-Nov-11 19:39:54, Rich Shepard wrote:
It is not so much the number of NAs, as the number of observations
that get dropped through having at least 1 NA. Provided enough
observations remain to get a meaningful fit, you will be OK
(though interpretation may be dubious).
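
A minimal sketch of that distinction, on a hypothetical data frame (`toy` and its values are made up for illustration): the NA count per column is separate from the number of rows lost, and a single sparse column can remove most rows. Leaving a predictor out changes the model, so this is a diagnostic, not a recommendation:

```r
# Hypothetical toy data: 'Na' is almost entirely missing
toy <- data.frame(
  TDS  = c(105, 181, 112, 125, 139, 176, 220, 163),
  Cond = c(NA, 187, 190, 248, NA, 304, 379, 300),
  Na   = c(NA, NA, NA, 4, 4, NA, NA, NA)
)

# Missing values counted column by column
na_per_column <- colSums(is.na(toy))

# Rows usable when every variable is in the model
rows_all <- sum(complete.cases(toy))

# Rows usable if the sparse 'Na' column is left out
rows_without_na <- sum(complete.cases(toy[, c("TDS", "Cond")]))

print(na_per_column)
print(rows_all)
print(rows_without_na)
```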

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 09-Nov-11                                       Time: 20:06:24
------------------------------ XFMail ------------------------------
#
There is only one row with a complete set of observations; I think lm() is
throwing out the rest.
#
This is the output of dput(your data)


structure(list(Ca = c(NA, NA, 24.4, NA, 21.4, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 28, 32, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, 34.7, NA, 42.5, NA, 26, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.6, 21.4, NA, 48.3, 
63.5, NA, NA, 28.7, NA, NA, NA, NA, 64.3, 23), Cl = c(1.58, 5.6, 
3, NA, 1, 5, 1.2, 4, 4, 8.4, 1, 1.4, 4.9, 1.7, 2, 1.6, 3.3, 2.2, 
9, 1, 2, 1, 1, 5, 4, 3, 2.27, 1.76, 5.81, 4.23, 4.23, 6.25, 6.72, 
4, NA, 5, 5.8, 5.8, 2.2, 5.4, 5.4, 4.8, 8, 1, 4.8, 5.9, 5.9, 
13, 5.6, 1.2, NA, NA, NA, 3, 7, NA, NA, 2, NA, NA, NA, NA, 7, 
4.1), Cond = c(NA, NA, 190, 187, 184, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 248, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, 304, 354, 379, NA, 300, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 2.2, 187, 285, 378, 533, 
207, 262, 244, 238, 280, 380, 402, 636, 300), Mg = c(NA, NA, 
10, NA, 9.1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, 11, 12, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
17.4, NA, 21.1, NA, 24, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, 9.5, NA, 22.1, 29.9, NA, NA, 12.6, NA, NA, NA, NA, 
32.4, 21), Na = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4L, 4L, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA), SO4 = c(9.4, 6.5, 9, NA, 7, 55, 6.8, 105, 
15.6, 8.4, 8.8, 19.4, 37, 12, 10, 9.1, 34, 11, 69, 18, 9, 13, 
9, 7, 6, 5, 7.8, 7.8, 7.5, 6, 7.3, 7, 7.5, 6, 7, 7, 5.6, 5.6, 
5.4, 11, 10.5, 9.9, 11.7, 8.4, 12.1, 16, 20, 7.6, 17, 6.5, NA, 
8, 22, 24, 44, NA, 13, 13, 12, 18, 23, 23, 73, 4), TDS = c(105L, 
181L, 112L, 144L, 114L, 308L, 96L, 430L, 108L, 108L, 125L, 129L, 
360L, 140L, 95L, 120L, 280L, 130L, 352L, 148L, 107L, 125L, 139L, 
188L, 201L, 178L, 197L, 187L, 182L, 165L, 186L, 191L, 190L, 176L, 
175L, 220L, 163L, 163L, 152L, 221L, 171L, 204L, 174L, 190L, 174L, 
210L, 190L, 180L, 200L, 180L, NA, 120L, 135L, 228L, 14L, NA, 
156L, 140L, 128L, 160L, 215L, 230L, 316L, 163L)), .Names = c("Ca", 
"Cl", "Cond", "Mg", "Na", "SO4", "TDS"), class = "data.frame",
row.names = c(NA, -64L))