difference between linear model & scatterplot matrix
Francesco,

My guess would be collinearity among the predictors. The linear model finds the best fit using all of the predictors at once. Unless the predictors are orthogonal (which is almost certainly not true here), there is no guarantee that the parameter estimates giving the best overall fit will resemble the coefficients you would get by regressing the response on each predictor individually.

There are various ways to check for collinearity, such as variance inflation factors (VIFs). You may want to look into them. It is very dangerous to try to interpret your parameter estimates in the presence of collinearity.

Jonathan

On Fri, Dec 3, 2010 at 7:42 AM, Francesco Nutini <nutini.francesco at gmail.com> wrote:
Dear R-users,

I'm studying a database structured like this (just a small part of my dataset):
Site        Latitude   Longitude   Year  Tot-Prod  Total_Density  dmp
Dendoudi-1  15.441964  -13.540179  2005  3271.16   1007           16993.25
Dendoudi-2  15.397321  -13.611607  2005  1616.84   250            25376.67
...         ...        ...         ...   ...       ...            ...

If I make a scatterplot matrix with the command shown below, I obtain a matrix (visible in the image) that shows which variables are most correlated with the dmp data (violet color). But if I fit a linear model between the dependent variable (dmp) and many independent variables, I get different information about the significance of the variables. I mean, variables that appear correlated with the dependent variable in the matrix do not appear significant in the summary of the linear model, and vice versa.

Have I made a mistake in interpreting the results?

Thank you in advance,
Francesco

#command for matrix-plot
library(gclus)  # provides dmat.color(), order.single() and cpairs()
dta <- senegal5[c(2,4,5,6,7,8,9,13,15,17,21,39,44,45)]
dta.r <- abs(cor(dta))
dta.col <- dmat.color(dta.r)
dta.o <- order.single(dta.r)
cpairs(dta, dta.o, panel.colors=dta.col, gap=.5,
       main="Variables Ordered and Colored by Correlation")

#command for linear model and summary()
a <- lm(dmp ~ Latitude + Longitude + Year + Tot.Prod +
        Herbaceous.Prod.kg.ha. + Leaf.Prod + Tree.bio + Total_Density +
        X1st.SpecieDensity.trunk.ha. + X2nd.SpecieDensity.trunk.ha. +
        Herb_Specie_Index1 + iNDVI.JASO. + RFE.Cum.JASO.,
        data=senegal5)
summary(a)

Call:
lm(formula = dmp ~ Latitude + Longitude + Year + Tot.Prod + Herbaceous.Prod.kg.ha. +
    Leaf.Prod + Tree.bio + Total_Density + X1st.SpecieDensity.trunk.ha. +
    X2nd.SpecieDensity.trunk.ha. + Herb_Specie_Index1 + iNDVI.JASO. +
    RFE.Cum.JASO., data = senegal5)

Residuals:
    Min      1Q  Median      3Q     Max
-676.49 -195.77  -33.06  113.34  816.17

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)
(Intercept)                  -3.283e+05  4.505e+04  -7.288 4.41e-11 ***
Latitude                     -6.100e+01  1.990e+02  -0.307   0.7598
Longitude                    -3.617e+02  8.639e+01  -4.187 5.60e-05 ***
Year                          1.604e+02  2.300e+01   6.973 2.15e-10 ***
Tot.Prod                     -4.893e+00  1.565e+02  -0.031   0.9751
Herbaceous.Prod.kg.ha.        4.905e+00  1.565e+02   0.031   0.9751
Leaf.Prod                     4.842e+00  1.565e+02   0.031   0.9754
Tree.bio                     -4.241e+01  2.771e+02  -0.153   0.8786
Total_Density                -1.930e+00  8.933e-01  -2.160   0.0329 *
X1st.SpecieDensity.trunk.ha.  1.992e+00  9.246e-01   2.154   0.0333 *
X2nd.SpecieDensity.trunk.ha.  3.416e+00  1.642e+00   2.080   0.0398 *
Herb_Specie_Index1           -1.091e+00  1.844e+00  -0.592   0.5552
iNDVI.JASO.                   8.914e+02  6.076e+01  14.670  < 2e-16 ***
RFE.Cum.JASO.                 2.525e+00  4.529e-01   5.575 1.68e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 295.3 on 114 degrees of freedom
Multiple R-squared: 0.9206,     Adjusted R-squared: 0.9116
F-statistic: 101.7 on 13 and 114 DF,  p-value: < 2.2e-16
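
For anyone following the thread, here is a minimal sketch of the collinearity check suggested in the reply. It uses simulated data, not the senegal5 dataset, so the variable names (x1, x2, x3, y) and the helper vif_manual() are hypothetical and only illustrate the idea. The VIF of predictor j is computed as 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. In the summary output above, Tot.Prod, Herbaceous.Prod.kg.ha. and Leaf.Prod share the same standard error (1.565e+02), which would be consistent with one of them being close to a linear combination of the others; that is a guess from the printed output, not something checked against the data.

```r
# Simulated data (hypothetical; NOT the senegal5 dataset from the thread)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly a copy of x1, so strongly collinear with it
x3 <- rnorm(n)                  # generated independently of x1 and x2
y  <- 2 * x1 + x3 + rnorm(n)    # y depends on x1 and x3 only

dat <- data.frame(y, x1, x2, x3)

# Marginal view (what a scatterplot matrix shows): pairwise correlations with y
marginal_cor <- cor(dat)["y", c("x1", "x2", "x3")]
print(round(marginal_cor, 2))

# Joint view: each coefficient is adjusted for all the other predictors
fit <- lm(y ~ x1 + x2 + x3, data = dat)
print(summary(fit)$coefficients)

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), with R_j^2 taken from
# the regression of predictor j on the remaining predictors
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(dat, c("x1", "x2", "x3"))
print(round(vifs, 1))
```

In this setup x2 is correlated with y marginally only because it tracks x1, and the VIFs for x1 and x2 should come out much larger than the one for x3. If the car package is installed, its vif() function applied to the fitted model is a common alternative to the hand-rolled helper.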