Regarding Principal Component Analysis result Interpretation
First, see the example at https://isezen.github.io/PCA/
On 15 Sep 2017, at 13:43, Shylashree U.R <shylashivashree at gmail.com> wrote:
Dear Sir/Madam,
I am trying to run PCA on the "iris" dataset and to interpret the result.
The dataset contains 150 observations of 5 variables:
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1            5.1         3.5          1.4         0.2    setosa
2            4.9         3.0          1.4         0.2    setosa
...
150          5.9         3.0          5.1         1.8 virginica
I then applied the 'prcomp' function to the dataset and got the following result:
print(pc)
Standard deviations (1, .., p=4):
[1] 1.7083611 0.9560494 0.3830886 0.1439265

Rotation (n x k) = (4 x 4):
                    PC1         PC2        PC3        PC4
Sepal.Length  0.5210659 -0.37741762  0.7195664  0.2612863
Sepal.Width  -0.2693474 -0.92329566 -0.2443818 -0.1235096
Petal.Length  0.5804131 -0.02449161 -0.1421264 -0.8014492
Petal.Width   0.5648565 -0.06694199 -0.6342727  0.5235971
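For anyone who wants to reproduce the output above, here is a minimal sketch. It assumes the four numeric columns were centred and scaled before the PCA (`scale. = TRUE`); the standard deviations printed above match that setting, while an unscaled PCA of iris gives different values. The signs of the loadings can flip between platforms without changing the interpretation.

```r
# Sketch: reproduce the PCA above on the four numeric iris columns.
# Assumption: the variables were centred and scaled (scale. = TRUE).
X <- iris[, 1:4]  # Species is a factor and is left out

pc <- prcomp(X, center = TRUE, scale. = TRUE)
print(pc)

# Proportion of the total variance carried by each component
var_explained <- pc$sdev^2 / sum(pc$sdev^2)
round(var_explained, 4)

# summary() reports the same proportions plus the cumulative proportion
summary(pc)
```

From the standard deviations above, PC1 and PC2 together account for roughly 96% of the total variance (about 0.7296 + 0.2285).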
In my project I plan to use PCA as a feature selection step and to remove
variables that are correlated. I have interpreted the PCA result, but I am
not sure whether my interpretation is correct.
You want to "remove variables which are correlated". Correlated among themselves? If so, why don't you create a Pearson correlation matrix (see ?cor), define a threshold, and remove the variables whose correlation exceeds that threshold? Perhaps I did not understand you correctly; excuse me.

For the iris dataset, each variable will be correlated with PC1 to some degree, and the part of it that PC1 does not capture will be correlated with PC2, and so on. Hence, you can identify which variables are similar in terms of VARIANCE. You will see this if you examine the example I linked above.

In PCA you can also calculate the correlations between the variables and the PCs, but this only shows you how the PCs are affected by the variables.

I don't know how you plan to carry out the feature selection, so I hope this helps. Also note the resources section at the end of the example.

isezen
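A minimal base-R sketch of the two ideas in the reply above: filtering variables with a Pearson correlation threshold, and computing the correlations between the original variables and the PCs. The 0.9 cut-off and the greedy rule (keep the earlier variable of a highly correlated pair, drop the later one) are illustrative assumptions, not a recommended setting. The PCA is assumed to use scaled inputs, as in the output quoted above; in that case the variable/PC correlations equal the loadings multiplied by each component's standard deviation.

```r
X <- iris[, 1:4]

# Pairwise Pearson correlations between the four measurements
r <- cor(X)
round(r, 2)

# Greedy filter (illustrative): walk the columns in order and keep a column
# only if its absolute correlation with every already-kept column is at most
# the threshold. The 0.9 cut-off is an assumption, not a rule.
threshold <- 0.9
keep <- character(0)
for (v in colnames(r)) {
  if (length(keep) == 0 || all(abs(r[v, keep]) <= threshold)) {
    keep <- c(keep, v)
  }
}
dropped <- setdiff(colnames(r), keep)
keep     # variables retained under this threshold
dropped  # variables removed

# Correlations between the original variables and the principal components
pc <- prcomp(X, center = TRUE, scale. = TRUE)
var_pc_cor <- cor(X, pc$x)
round(var_pc_cor, 3)

# With scaled inputs these equal the loadings times each PC's standard deviation
all.equal(unname(var_pc_cor), unname(pc$rotation %*% diag(pc$sdev)))
```

With this threshold only Petal.Width is dropped, because its correlation with Petal.Length is about 0.96; a different cut-off or column order can give a different result, so the choice should follow from the project rather than from the PCA itself.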