Principal Components in a Linear Model
1. Probably not, depending on what you expect to gain from this. R's numerical procedures can almost certainly handle the correlations.
2. Search on "R package for principal components regression" instead of rolling your own. There are several (e.g. "chemometrics", "pls", etc.).
-- Bert
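For reference, a minimal sketch of what the packaged route might look like with pcr() from "pls". The data are the 14 rows quoted below. Treating V5 as the response and V1..V4 as the predictors is an assumption made purely for illustration, since the post does not say which column is y. The call is guarded so that nothing runs if "pls" is not installed.

```r
# Data copied from the quoted post (T8-5.DAT, columns V1..V5).
census <- read.table(header = TRUE, text = "
V1 V2 V3 V4 V5
5.935 14.2 2.265 2.27 2.91
1.523 13.1 0.597 0.75 2.62
2.599 12.7 1.237 1.11 1.72
4.009 15.2 1.649 0.81 3.02
4.687 14.7 2.312 2.50 2.22
8.044 15.6 3.641 4.51 2.36
2.766 13.3 1.244 1.03 1.97
6.538 17.0 2.618 2.39 1.85
6.451 12.9 3.147 5.52 2.01
3.314 12.2 1.606 2.18 1.82
3.777 13.0 2.119 2.83 1.80
1.530 13.8 0.798 0.84 4.25
2.768 13.6 1.336 1.75 2.64
6.585 14.9 2.763 1.91 3.17
")

# ASSUMPTION: V5 is the response; V1..V4 are the predictors.
if (requireNamespace("pls", quietly = TRUE)) {
  # Principal components regression keeping two components,
  # with centered but unscaled predictors.
  fit_pcr <- pls::pcr(V5 ~ V1 + V2 + V3 + V4, ncomp = 2,
                      data = census, scale = FALSE)
  summary(fit_pcr)

  # Coefficients expressed on the original predictor scale.
  coef(fit_pcr, ncomp = 2)
}
```

The number of components (two here) is an arbitrary choice for the sketch; pcr() also accepts a validation argument that can help pick it.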
On Fri, Nov 22, 2013 at 8:47 AM, Chris Wilkinson <kinsham at verizon.net> wrote:
My data has correlations between predictors so I think it would be advantageous to rotate the axes with prcomp().
census <- read.table(paste("http://www.stat.wisc.edu/~rich/JWMULT02dat", "T8-5.DAT", sep = "/"),
                     header = FALSE)
census
      V1   V2    V3   V4   V5
1  5.935 14.2 2.265 2.27 2.91
2  1.523 13.1 0.597 0.75 2.62
3  2.599 12.7 1.237 1.11 1.72
4  4.009 15.2 1.649 0.81 3.02
5  4.687 14.7 2.312 2.50 2.22
6  8.044 15.6 3.641 4.51 2.36
7  2.766 13.3 1.244 1.03 1.97
8  6.538 17.0 2.618 2.39 1.85
9  6.451 12.9 3.147 5.52 2.01
10 3.314 12.2 1.606 2.18 1.82
11 3.777 13.0 2.119 2.83 1.80
12 1.530 13.8 0.798 0.84 4.25
13 2.768 13.6 1.336 1.75 2.64
14 6.585 14.9 2.763 1.91 3.17
pca1 <- prcomp(census)
summary(pca1)
Importance of components:
PC1 PC2 PC3 PC4 PC5
Standard deviation 2.6327 1.3361 0.62422 0.47909 0.11897
Proportion of Variance 0.7413 0.1909 0.04168 0.02455 0.00151
Cumulative Proportion 0.7413 0.9323 0.97394 0.99849 1.00000
pca1$rotation # eigenvectors
           PC1         PC2          PC3         PC4          PC5
V1 -0.78120807  0.07087183 -0.003656607  0.54171007  0.302039670
V2 -0.30564856  0.76387277  0.161817438 -0.54479937  0.009279632
V3 -0.33444840 -0.08290788 -0.014841008  0.05101636 -0.937255367
V4 -0.42600795 -0.57945799 -0.220453468 -0.63601254  0.172145212
V5  0.05435431  0.26235528 -0.961759720  0.05127599 -0.024583093

I'd like to create a linear model based on the rotated axes.
linmod <- lm(y~a+b+....)
Could someone be kind enough to suggest how to code a, b...?

Chris
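One way the a, b, ... terms could be coded in base R is sketched below: the rotated coordinates are the columns of the x component returned by prcomp(), so they can be placed in a data frame and named in the formula. This is only an illustration under assumptions. The post does not identify a response, so V5 is treated as y here, and the rotation is applied to V1..V4 only (the call in the post rotates all five columns, which would mix the response into the axes).

```r
# Data copied from the post (T8-5.DAT, columns V1..V5).
census <- read.table(header = TRUE, text = "
V1 V2 V3 V4 V5
5.935 14.2 2.265 2.27 2.91
1.523 13.1 0.597 0.75 2.62
2.599 12.7 1.237 1.11 1.72
4.009 15.2 1.649 0.81 3.02
4.687 14.7 2.312 2.50 2.22
8.044 15.6 3.641 4.51 2.36
2.766 13.3 1.244 1.03 1.97
6.538 17.0 2.618 2.39 1.85
6.451 12.9 3.147 5.52 2.01
3.314 12.2 1.606 2.18 1.82
3.777 13.0 2.119 2.83 1.80
1.530 13.8 0.798 0.84 4.25
2.768 13.6 1.336 1.75 2.64
6.585 14.9 2.763 1.91 3.17
")

# ASSUMPTION: V5 is the response; V1..V4 are the predictors.
X <- census[, c("V1", "V2", "V3", "V4")]

# Rotate the predictors only (centered, unscaled, as prcomp() does by default).
pca <- prcomp(X)

# pca$x holds the scores: one column per rotated axis (PC1, PC2, ...).
dat <- data.frame(y = census$V5, pca$x)

# Regress y on the first two rotated axes; these play the role of a and b.
linmod <- lm(y ~ PC1 + PC2, data = dat)
summary(linmod)

# Keeping every component is just a re-expression of the original predictors,
# so the fitted values can be compared with the ordinary regression.
full_pc  <- lm(y ~ ., data = dat)
full_raw <- lm(V5 ~ V1 + V2 + V3 + V4, data = census)
all.equal(unname(fitted(full_pc)), unname(fitted(full_raw)))
```

The comparison at the end is there to show that the rotation by itself does not change the fit; any difference comes only from dropping components.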
Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374