Resource selection: correlation between variables
Teresa:

There probably are no simple shortcuts here; you need to investigate the correlation structure for each of your possible comparisons. For GLMs you can use the variance inflation factor function vif() in the car package, which includes an extension (generalized VIFs) for categorical predictors. I recommend VIFs over pairwise correlations because the problems come from linear correlation among multiple predictors, and pairwise correlations can miss that.

Note that really large VIFs (e.g., > 10 or so) are likely to indicate unstable standard errors for the regression coefficient estimates. Smaller VIFs (1-5) mainly affect how regression coefficients should be interpreted as partial effects. VIFs close to 1 indicate no linear correlation.

You don't necessarily need to eliminate predictor variables from a model just because there is some multicollinearity, e.g., VIFs in the range 1-5. You just need to understand that a regression coefficient, read as the partial effect of a unit change in a predictor variable, really describes a unit change in the part of that predictor that is not linearly related to the other predictors (see Cade 2015. Ecology 96:2370-2382). This, of course, is why it is so wonderful to have perfectly uncorrelated predictors: the interpretation of partial effects is simpler. But that is not realistic for most resource selection analyses.

Brian

Brian S. Cade, PhD
U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818
email: cadeb at usgs.gov <brian_cade at usgs.gov>
tel: 970 226-9326

On Sat, Jun 4, 2016 at 9:54 AM, Teresa Oliveira <mteresaoliveira92 at gmail.com>
wrote:
Dear all,
I have a question about correlation between variables, and I would like to
hear your opinions on this.
Background:
I am working with telemetry data on a single species, collected by several
researchers in five study areas. I aim to analyse resource selection with
resource selection functions (RSFs), applying Design II (individual
locations as used resource units, compared against the study area as
available resource units) and Design III (individual locations as used
resource units, compared against the home range as available resource
units). I have 13 variables in total: 10 binary variables (land cover
characteristics) and 3 continuous variables (roughness, distance to water
and distance to human settlements).
I want to fit a model for each study area and also a global model
including all five study areas, because I want to see whether a single
global model can be applied to all areas or whether the areas are very
different from each other. I'm planning to use resampling with replacement
to understand the effect of each area on the global model.
Question:
Before fitting the RSFs, I want to check whether my variables are
independent, and I'm not sure how to conduct this check. Should I use all
the data (from all individuals in all study areas) to test the correlation
between all variables? Or should I run separate analyses for each study
area (or even each individual)?
Does anyone have any suggestions?
Thank you very much in advance,
Teresa
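A minimal sketch of the VIF check Brian describes, written in base R so it runs without extra packages. It computes VIF_j = 1 / (1 - R_j^2) by regressing each predictor on all the others, once on the pooled data and once within each study area, which is one way to look at "the correlation structure for each of your possible comparisons". The data, the three areas and the variable names (rough, dwater, dsettle) are simulated and hypothetical. For a fitted glm, car::vif() computes (generalized) VIFs from the model's coefficient covariance matrix, so its numbers can differ somewhat from this unweighted, predictor-only version.

```r
# Sketch only: hypothetical simulated predictors, not real telemetry data.
set.seed(1)

# Hypothetical stand-ins for roughness, distance to water and distance to
# human settlements in three hypothetical study areas.
n <- 300
area <- factor(rep(c("A", "B", "C"), each = n / 3))
rough <- rnorm(n)
dwater <- 0.6 * rough + rnorm(n, sd = 0.8)  # built to correlate with rough
dsettle <- rnorm(n)                          # built to be unrelated
dat <- data.frame(area, rough, dwater, dsettle)

# VIF by definition: regress each predictor on all the others and
# convert the R^2 of that fit into 1 / (1 - R^2).
vif_by_hand <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    f <- reformulate(setdiff(names(X), v), response = v)
    r2 <- summary(lm(f, data = X))$r.squared
    1 / (1 - r2)
  })
}

preds <- c("rough", "dwater", "dsettle")

# Pooled over all areas (relevant to a global model) ...
pooled <- vif_by_hand(dat[preds])

# ... and separately within each area (relevant to per-area models);
# the result is a predictors-by-areas matrix.
per_area <- sapply(split(dat[preds], dat$area), vif_by_hand)

print(round(pooled, 2))
print(round(per_area, 2))
```

The same function accepts 0/1 land-cover columns as numeric predictors. One caveat: if the binary variables are dummy codes for mutually exclusive and exhaustive land-cover classes, they sum to 1 and are perfectly collinear with the intercept. In that case one class has to be dropped before any VIF is meaningful.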
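Brian's point about interpreting a coefficient as the effect of a unit change in "the part of the predictor that is not linearly related to the other predictors" can be checked numerically. The sketch below uses an ordinary least-squares fit, where this holds exactly (the Frisch-Waugh-Lovell result). For a logistic RSF glm the interpretation carries over, but the numerical identity is not exact. All variables are simulated and hypothetical.

```r
# Sketch only: simulated, hypothetical data; exact identity holds for OLS.
set.seed(2)
n <- 200
x2 <- rnorm(n)
x1 <- 0.7 * x2 + rnorm(n, sd = 0.5)   # x1 is correlated with x2
y <- 1 + 2 * x1 - 1 * x2 + rnorm(n)

# Coefficient for x1 from the full model with both predictors.
full <- lm(y ~ x1 + x2)
b_full <- unname(coef(full)["x1"])

# Residualize x1 on x2: what remains is the part of x1 that is not
# linearly related to x2. Regressing y on it gives the same slope.
x1_resid <- resid(lm(x1 ~ x2))
partial <- lm(y ~ x1_resid)
b_resid <- unname(coef(partial)["x1_resid"])

print(c(full_model = b_full, residualized = b_resid))

# The VIF for x1 is the ratio of the total variance of x1 to the variance
# of the residualized part, i.e. how much of x1's variation is "used up"
# by the other predictor.
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2))$r.squared)
print(c(vif = vif_x1, variance_ratio = var(x1) / var(x1_resid)))
```

Only the point estimates match. The standard errors from the two fits differ, so this is a check on interpretation, not a substitute for the full model.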
_______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology