difference between linear model & scatterplot matrix

3 messages · Francesco Nutini, Jonathan Christensen

#
Dear R-users,
I'm studying a database structured like this (just a small part of my dataset):
_______________________________________________________________________________

Site         Latitude    Longitude    Year   Tot-Prod   Total_Density   dmp
Dendoudi-1   15.441964   -13.540179   2005   3271.16    1007            16993.25
Dendoudi-2   15.397321   -13.611607   2005   1616.84    250             25376.67
...          ...         ...          ...    ...        ...             ...

_______________________________________________________________________________

If I make a scatterplot matrix with the command shown below, I obtain a matrix (visible in the image) that shows which variables are most correlated with the dmp data (violet color).
But if I fit a linear model with dmp as the dependent variable and many independent variables,
I get different information about the significance of the variables.
I mean, variables that appear correlated with the dependent variable in the matrix turn out not to be significant in the summary of the linear model, and vice versa. Have I made a mistake in interpreting the results?

Thank you in advance,
Francesco



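#command for matrix-plot
library(gclus)  # provides dmat.color(), order.single() and cpairs()
dta <- senegal5[c(2,4,5,6,7,8,9,13,15,17,21,39,44,45)]
dta.r <- abs(cor(dta))         # absolute pairwise correlations
dta.col <- dmat.color(dta.r)   # panel colors by correlation strength
dta.o <- order.single(dta.r)   # place highly correlated variables next to each other
cpairs(dta, dta.o, panel.colors=dta.col, gap=.5,
       main="... Correlation")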
#command for linear model and summary()
summary(lm(dmp ~ Latitude + Longitude + Year + Tot.Prod + Herbaceous.Prod.kg.ha. +
           Leaf.Prod + Tree.bio + Total_Density + X1st.SpecieDensity.trunk.ha. +
           X2nd.SpecieDensity.trunk.ha. + Herb_Specie_Index1 + iNDVI.JASO. +
           RFE.Cum.JASO., data=senegal5))
Call:
lm(formula = dmp ~ Latitude + Longitude + Year + Tot.Prod + Herbaceous.Prod.kg.ha. + 
    Leaf.Prod + Tree.bio + Total_Density + X1st.SpecieDensity.trunk.ha. + 
    X2nd.SpecieDensity.trunk.ha. + Herb_Specie_Index1 + iNDVI.JASO. + 
    RFE.Cum.JASO., data = senegal5)

Residuals:
    Min      1Q  Median      3Q     Max 
-676.49 -195.77  -33.06  113.34  816.17 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  -3.283e+05  4.505e+04  -7.288 4.41e-11 ***
Latitude                     -6.100e+01  1.990e+02  -0.307   0.7598    
Longitude                    -3.617e+02  8.639e+01  -4.187 5.60e-05 ***
Year                          1.604e+02  2.300e+01   6.973 2.15e-10 ***
Tot.Prod                     -4.893e+00  1.565e+02  -0.031   0.9751    
Herbaceous.Prod.kg.ha.        4.905e+00  1.565e+02   0.031   0.9751    
Leaf.Prod                     4.842e+00  1.565e+02   0.031   0.9754    
Tree.bio                     -4.241e+01  2.771e+02  -0.153   0.8786    
Total_Density                -1.930e+00  8.933e-01  -2.160   0.0329 *  
X1st.SpecieDensity.trunk.ha.  1.992e+00  9.246e-01   2.154   0.0333 *  
X2nd.SpecieDensity.trunk.ha.  3.416e+00  1.642e+00   2.080   0.0398 *  
Herb_Specie_Index1           -1.091e+00  1.844e+00  -0.592   0.5552    
iNDVI.JASO.                   8.914e+02  6.076e+01  14.670  < 2e-16 ***
RFE.Cum.JASO.                 2.525e+00  4.529e-01   5.575 1.68e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 295.3 on 114 degrees of freedom
Multiple R-squared: 0.9206,     Adjusted R-squared: 0.9116 
F-statistic: 101.7 on 13 and 114 DF,  p-value: < 2.2e-16
#
Francesco,

My guess would be collinearity of the predictors. The linear model
gives you the best fit to all of the predictors at once; unless the
predictors are orthogonal (which they almost certainly are not here),
there is no guarantee that the parameter estimates which give the best
overall fit will be similar to the regression coefficients you would
get by regressing the response on each predictor individually.
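
To make the point concrete, here is a small simulated sketch. It does not use the senegal5 data; the variables x1, x2 and y are made up for illustration. Two predictors are nearly copies of each other, so both look strongly correlated with the response in a scatterplot-matrix sense, yet in the joint model the standard error of x1 is much larger than in a regression on x1 alone.

```r
# Simulated data (hypothetical; not the senegal5 dataset)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # x2 is almost a copy of x1
y  <- 2 * x1 + rnorm(n)         # y is generated from x1 only

# Marginal view (what a scatterplot matrix shows): correlation of each
# predictor with y, ignoring the other predictor
marginal <- cor(cbind(x1, x2), y)
print(round(marginal, 2))

# Joint view: each coefficient is adjusted for the other predictor
fit <- lm(y ~ x1 + x2)
print(round(summary(fit)$coefficients, 3))

# Standard error of the x1 coefficient, alone versus alongside x2
se_single <- summary(lm(y ~ x1))$coefficients["x1", "Std. Error"]
se_joint  <- summary(fit)$coefficients["x1", "Std. Error"]
print(c(single = se_single, joint = se_joint))
```

The marginal correlations and the joint coefficients answer different questions, so disagreement between them is not by itself a sign of a mistake.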

There are various ways to check collinearity, such as variance
inflation factors (VIF). You may want to look into them. It's very
dangerous to try to interpret your parameter estimates in the presence
of collinearity.
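
As a rough sketch of what a VIF check does, the snippet below computes it in base R: each predictor is regressed on all the others and the VIF is 1 / (1 - R^2). The helper name vif_by_hand and the simulated columns x1, x2, x3 are assumptions for illustration, not part of the original analysis. A common rule of thumb treats values above roughly 5 or 10 as a warning sign, though the cutoff is a convention rather than a strict test.

```r
# Hypothetical helper: VIF for every column of a data frame of numeric predictors
vif_by_hand <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    # Regress predictor v on all the other predictors
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

# Simulated predictors (not the senegal5 data): x3 is nearly x1 + x2
set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
X$x3 <- X$x1 + X$x2 + rnorm(n, sd = 0.1)

vifs <- vif_by_hand(X)
print(round(vifs, 1))
```

On the real data the same helper could be applied to the predictor columns of senegal5, or vif() from the car package could be called on the fitted model. The nearly identical standard errors (1.565e+02) and the near-cancelling coefficients of Tot.Prod, Herbaceous.Prod.kg.ha. and Leaf.Prod in the summary above may be a hint that one of them is close to a sum of the others, which is the kind of pattern this check is meant to flag.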

Jonathan


On Fri, Dec 3, 2010 at 7:42 AM, Francesco Nutini
<nutini.francesco at gmail.com> wrote: