Mental Block with PCA of multivariate time series!
Laura Quinn wrote:
Sorry, I don't think I made myself clear enough with my initial query! I am wishing to investigate the temporal evolution of the pca: if we assume that every 50 rows of my data frame is representitive of, for instance, 1 day of data, I am hoping to automate a process whereby a pca is performed on every 50 rows of data and the loading for PC1 and PC2 for each variable (i.e. each column) is represented as a point on a plot - so a years' data will be represented as two lines (representing PC1 and PC2) on a time series plot for each variable. Laura Quinn Institute of Atmospheric Science School of Earth and Environment University of Leeds Leeds LS2 9JT tel: +44 113 343 1596 fax: +44 113 343 6716 mail: laura at env.leeds.ac.uk On Mon, 16 May 2005, Gavin Simpson wrote:
Laura Quinn wrote:
Please could someone point me in the right direction as I appear to be having a total mental block with fairly basic PCA problem! I have a large dataframe where rows represent independent observations and columns are variables. I am wanting to perform PCA sequentially on blocks of nrows at a time and produce a graphical output of the loadings for the first 2 EOFs for each variable. I'm sure I've performed a very similar routine in the past, but the method is currently escaping me. Any help gratefully received!
Hi Laura, data(iris) iris.dat <- iris[,1:4] pca.1 <- prcomp(iris.dat[1:50, ], scale = TRUE) pca.2 <- prcomp(iris.dat[51:100, ], scale = TRUE) pca.3 <- prcomp(iris.dat[100:150, ], scale = TRUE) biplot(pca.1) etc... There is a better way of subsetting this data set as the 5th col of iris is a factor and we could use the subset argument to prcomp to do the subsetting without having to know that there are 50 rows per species. Take a look at that argument if you have a variable that defines the blocks for you. Is this what you were after? All the best, Gav -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [T] +44 (0)20 7679 5522 ENSIS Research Fellow [F] +44 (0)20 7679 7565 ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk UCL Department of Geography [W] http://www.ucl.ac.uk/~ucfagls/cv/ 26 Bedford Way [W] http://www.ucl.ac.uk/~ucfagls/ London. WC1H 0AP. %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Hi Laura,
Sorry for not quite understanding the specifics, does this do what you want?
# generate some random data for this example
dat <- data.frame(var1 = rnorm(1:1000), var2 = runif(1:1000), var3 =
rnorm(1:1000) + runif(1:1000), var4 = as.factor(rep(1:10, rep(100, 10))))
# create a list of pca loadings on axis 1, 2
temp <- by(dat[,1:3], dat$var4, function(x) prcomp(x, scale =
TRUE)$rotation[,1:2])
# plot it
matplot(t(sapply(temp, function(x) x[,1])), type = "n")
# add the lines
matlines(t(sapply(temp, function(x) x[,1])), lty = "solid", col =
c("red", "blue", "green"))
matlines(t(sapply(temp, function(x) x[,2])), lty = "dotted", col =
c("red", "blue", "green"))
It isn't pretty - you'll need to calculate the x/ylims for the matplot
call, but if it is want you are after the plotting should be fairly easy
thing to work out.
HTH
G
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [T] +44 (0)20 7679 5522 ENSIS Research Fellow [F] +44 (0)20 7679 7565 ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk UCL Department of Geography [W] http://www.ucl.ac.uk/~ucfagls/cv/ 26 Bedford Way [W] http://www.ucl.ac.uk/~ucfagls/ London. WC1H 0AP. %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%