How to plot PCA output?

Tue, May 8, 2012 6:02 AM
Steve is probably looking for answers from others, but if the variables are relatively few, I plot the loadings vs variables for each of the first few PCs, using something like a bar plot whose bars can go positive or negative (not the best data-ink ratio, however).  So PC1 loadings vs variable names, PC2 loadings vs variable names, etc.
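Here is a rough ggplot2 sketch of that "one bar chart per PC" layout, using iris as in the code later in this post. The long-format reshaping and the choice of three PCs (`n.pc`) are my own assumptions, not anything prescribed.

```r
library("ggplot2")

# PCA on the four numeric iris columns (centred, not scaled)
i.pca <- prcomp(iris[, 1:4])

# Reshape the rotation (loadings) matrix to long form: one row per (variable, PC).
n.pc <- 3   # how many leading PCs to show; an arbitrary choice
rot <- i.pca$rotation[, 1:n.pc]
long <- data.frame(
  var     = rep(rownames(rot), times = n.pc),   # variable names repeat per PC
  PC      = rep(colnames(rot), each = nrow(rot)),
  loading = as.vector(rot)                      # column-major: PC1, then PC2, ...
)

# One panel per PC; bars can go above or below the zero line
p <- ggplot(long, aes(x = var, y = loading)) +
  geom_col() +
  geom_hline(yintercept = 0) +
  facet_wrap(~ PC, ncol = 1)
print(p)
```

Stacking the panels in one column keeps the variable names aligned, so a variable's loadings on different PCs are easy to compare by eye.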

If there are many variables, I use a dot rather than a bar or, more generally, a line (for instance, with spectroscopic data, where PC1 loadings are plotted against thousands of frequencies).
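A minimal sketch of the line version follows. There is no real spectroscopic data here: the "spectra" are simulated purely for illustration (two Gaussian peaks at arbitrary positions plus noise, on a hypothetical frequency axis `freq`), so only the plotting idea carries over.

```r
library("ggplot2")

set.seed(1)
# Simulated spectra, for illustration only: 30 samples x 500 frequencies.
freq <- seq(400, 900, length.out = 500)               # hypothetical frequency axis
peak <- function(centre, width) exp(-((freq - centre) / width)^2)
n <- 30
spectra <- outer(runif(n, 0.5, 1.5), peak(550, 20)) +  # peak 1, height varies by sample
           outer(runif(n, 0.5, 1.5), peak(750, 30)) +  # peak 2, height varies by sample
           matrix(rnorm(n * length(freq), sd = 0.01), nrow = n)  # small noise

s.pca <- prcomp(spectra)

# One row per (frequency, PC) so each PC is drawn as its own line
long <- data.frame(
  freq    = rep(freq, times = 2),
  loading = c(s.pca$rotation[, 1], s.pca$rotation[, 2]),
  PC      = rep(c("PC1", "PC2"), each = length(freq))
)

p <- ggplot(long, aes(x = freq, y = loading)) +
  geom_line() +
  geom_hline(yintercept = 0, colour = "grey50") +
  facet_wrap(~ PC, ncol = 1)
print(p)
```

With this many variables the loadings read like a spectrum themselves, so peaks in the loading line point to the frequency regions that drive each PC.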

The magnitude and sign of the loading for each variable gives you a sense of the contribution of that variable to the given PC.
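One way to put numbers on that is sketched below, again using iris as an assumed example. Each column of `prcomp`'s `rotation` is a unit vector, so the squared loadings in a column sum to 1 and can be read as each variable's share of that PC. The sign of a whole PC is arbitrary (flipping every loading gives an equivalent solution), so it is the relative signs within a PC that carry meaning.

```r
i.pca <- prcomp(iris[, 1:4])
rot <- i.pca$rotation

# Squared loadings: each column sums to 1, so entries are "shares" of the PC
share <- rot^2
print(round(colSums(share), 10))

# Variables ordered by absolute PC1 loading; within PC1, variables with the
# same sign move together and those with opposite signs move in opposition.
pc1 <- rot[, "PC1"]
print(pc1[order(abs(pc1), decreasing = TRUE)])
```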

I suspect this is not what Steve had in mind (he no doubt knows these things well already) but I'm also always on the lookout for good displays.  Share 'em if you got 'em.

Bryan

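library("ggplot2")

# PCA on the four numeric iris columns (centred, not scaled)
i.pca <- prcomp(iris[, 1:4])

# plot scores, coloured by species
scores <- as.data.frame(i.pca$x)
scores$Species <- iris$Species
ggplot(scores, aes(x = PC1, y = PC2, colour = Species)) + geom_point()

# Loadings on PC1 (few variables)
loadings <- as.data.frame(i.pca$rotation)
loadings$var <- rownames(i.pca$rotation)
# geom_col() uses y as the bar height; geom_bar() would try to count rows instead
ggplot(loadings, aes(x = var, y = PC1)) + geom_col()

# Could also use geom_point(), but when there are many variables you may
# wish to connect the points too, e.g. with geom_line(aes(group = 1)).
# Compare to
biplot(i.pca)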

You can see that the biplot carries some additional information compared with the simple loading plot, but I'd have to dig out exactly what it is and whether it is especially useful.
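For what it's worth, here is a sketch of the coordinates I believe `biplot(i.pca)` draws, based on my reading of `biplot.prcomp` in the stats package with its default `scale = 1` (treat the exact scaling as an assumption and check `?biplot.prcomp`). Observations become points, variables become arrows, and the scaling is chosen so that a point's inner product with an arrow approximates the centred data value using only the two plotted PCs. A plain loading plot shows only the arrow directions.

```r
i.pca <- prcomp(iris[, 1:4])
choices <- 1:2
n <- nrow(i.pca$x)

# Assumed scaling (biplot.prcomp, scale = 1): lambda = sdev * sqrt(n)
lam <- i.pca$sdev[choices] * sqrt(n)

# Points: observation scores divided by lambda, column by column
obs <- sweep(i.pca$x[, choices], 2, lam, "/")
# Arrows: variable loadings multiplied by lambda, column by column
var.arrows <- sweep(i.pca$rotation[, choices], 2, lam, "*")

# lambda cancels in the inner product, leaving the rank-2 approximation
# of the centred data
approx2 <- obs %*% t(var.arrows)
centred <- scale(iris[, 1:4], center = TRUE, scale = FALSE)
print(cor(as.vector(approx2), as.vector(centred)))
```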