Please could someone point me in the right direction as I appear to be having a total mental block with fairly basic PCA problem! I have a large dataframe where rows represent independent observations and columns are variables. I am wanting to perform PCA sequentially on blocks of nrows at a time and produce a graphical output of the loadings for the first 2 EOFs for each variable. I'm sure I've performed a very similar routine in the past, but the method is currently escaping me. Any help gratefully received! Laura Quinn Institute of Atmospheric Science School of Earth and Environment University of Leeds Leeds LS2 9JT tel: +44 113 343 1596 fax: +44 113 343 6716 mail: laura at env.leeds.ac.uk
Mental Block with PCA of multivariate time series!
6 messages · Laura Quinn, Gavin Simpson, Pierre BADY +1 more
Laura Quinn wrote:
Please could someone point me in the right direction as I appear to be having a total mental block with fairly basic PCA problem! I have a large dataframe where rows represent independent observations and columns are variables. I am wanting to perform PCA sequentially on blocks of nrows at a time and produce a graphical output of the loadings for the first 2 EOFs for each variable. I'm sure I've performed a very similar routine in the past, but the method is currently escaping me. Any help gratefully received!
Hi Laura, data(iris) iris.dat <- iris[,1:4] pca.1 <- prcomp(iris.dat[1:50, ], scale = TRUE) pca.2 <- prcomp(iris.dat[51:100, ], scale = TRUE) pca.3 <- prcomp(iris.dat[100:150, ], scale = TRUE) biplot(pca.1) etc... There is a better way of subsetting this data set as the 5th col of iris is a factor and we could use the subset argument to prcomp to do the subsetting without having to know that there are 50 rows per species. Take a look at that argument if you have a variable that defines the blocks for you. Is this what you were after? All the best, Gav
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [T] +44 (0)20 7679 5522 ENSIS Research Fellow [F] +44 (0)20 7679 7565 ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk UCL Department of Geography [W] http://www.ucl.ac.uk/~ucfagls/cv/ 26 Bedford Way [W] http://www.ucl.ac.uk/~ucfagls/ London. WC1H 0AP. %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Sorry, I don't think I made myself clear enough with my initial query! I am wishing to investigate the temporal evolution of the pca: if we assume that every 50 rows of my data frame is representitive of, for instance, 1 day of data, I am hoping to automate a process whereby a pca is performed on every 50 rows of data and the loading for PC1 and PC2 for each variable (i.e. each column) is represented as a point on a plot - so a years' data will be represented as two lines (representing PC1 and PC2) on a time series plot for each variable. Laura Quinn Institute of Atmospheric Science School of Earth and Environment University of Leeds Leeds LS2 9JT tel: +44 113 343 1596 fax: +44 113 343 6716 mail: laura at env.leeds.ac.uk
On Mon, 16 May 2005, Gavin Simpson wrote:
Laura Quinn wrote:
Please could someone point me in the right direction as I appear to be having a total mental block with fairly basic PCA problem! I have a large dataframe where rows represent independent observations and columns are variables. I am wanting to perform PCA sequentially on blocks of nrows at a time and produce a graphical output of the loadings for the first 2 EOFs for each variable. I'm sure I've performed a very similar routine in the past, but the method is currently escaping me. Any help gratefully received!
Hi Laura, data(iris) iris.dat <- iris[,1:4] pca.1 <- prcomp(iris.dat[1:50, ], scale = TRUE) pca.2 <- prcomp(iris.dat[51:100, ], scale = TRUE) pca.3 <- prcomp(iris.dat[100:150, ], scale = TRUE) biplot(pca.1) etc... There is a better way of subsetting this data set as the 5th col of iris is a factor and we could use the subset argument to prcomp to do the subsetting without having to know that there are 50 rows per species. Take a look at that argument if you have a variable that defines the blocks for you. Is this what you were after? All the best, Gav -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [T] +44 (0)20 7679 5522 ENSIS Research Fellow [F] +44 (0)20 7679 7565 ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk UCL Department of Geography [W] http://www.ucl.ac.uk/~ucfagls/cv/ 26 Bedford Way [W] http://www.ucl.ac.uk/~ucfagls/ London. WC1H 0AP. %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Laura Quinn wrote:
Sorry, I don't think I made myself clear enough with my initial query! I am wishing to investigate the temporal evolution of the pca: if we assume that every 50 rows of my data frame is representitive of, for instance, 1 day of data, I am hoping to automate a process whereby a pca is performed on every 50 rows of data and the loading for PC1 and PC2 for each variable (i.e. each column) is represented as a point on a plot - so a years' data will be represented as two lines (representing PC1 and PC2) on a time series plot for each variable. Laura Quinn Institute of Atmospheric Science School of Earth and Environment University of Leeds Leeds LS2 9JT tel: +44 113 343 1596 fax: +44 113 343 6716 mail: laura at env.leeds.ac.uk On Mon, 16 May 2005, Gavin Simpson wrote:
Laura Quinn wrote:
Please could someone point me in the right direction as I appear to be having a total mental block with fairly basic PCA problem! I have a large dataframe where rows represent independent observations and columns are variables. I am wanting to perform PCA sequentially on blocks of nrows at a time and produce a graphical output of the loadings for the first 2 EOFs for each variable. I'm sure I've performed a very similar routine in the past, but the method is currently escaping me. Any help gratefully received!
Hi Laura, data(iris) iris.dat <- iris[,1:4] pca.1 <- prcomp(iris.dat[1:50, ], scale = TRUE) pca.2 <- prcomp(iris.dat[51:100, ], scale = TRUE) pca.3 <- prcomp(iris.dat[100:150, ], scale = TRUE) biplot(pca.1) etc... There is a better way of subsetting this data set as the 5th col of iris is a factor and we could use the subset argument to prcomp to do the subsetting without having to know that there are 50 rows per species. Take a look at that argument if you have a variable that defines the blocks for you. Is this what you were after? All the best, Gav -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [T] +44 (0)20 7679 5522 ENSIS Research Fellow [F] +44 (0)20 7679 7565 ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk UCL Department of Geography [W] http://www.ucl.ac.uk/~ucfagls/cv/ 26 Bedford Way [W] http://www.ucl.ac.uk/~ucfagls/ London. WC1H 0AP. %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Hi Laura,
Sorry for not quite understanding the specifics, does this do what you want?
# generate some random data for this example
dat <- data.frame(var1 = rnorm(1:1000), var2 = runif(1:1000), var3 =
rnorm(1:1000) + runif(1:1000), var4 = as.factor(rep(1:10, rep(100, 10))))
# create a list of pca loadings on axis 1, 2
temp <- by(dat[,1:3], dat$var4, function(x) prcomp(x, scale =
TRUE)$rotation[,1:2])
# plot it
matplot(t(sapply(temp, function(x) x[,1])), type = "n")
# add the lines
matlines(t(sapply(temp, function(x) x[,1])), lty = "solid", col =
c("red", "blue", "green"))
matlines(t(sapply(temp, function(x) x[,2])), lty = "dotted", col =
c("red", "blue", "green"))
It isn't pretty - you'll need to calculate the x/ylims for the matplot
call, but if it is want you are after the plotting should be fairly easy
thing to work out.
HTH
G
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [T] +44 (0)20 7679 5522 ENSIS Research Fellow [F] +44 (0)20 7679 7565 ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk UCL Department of Geography [W] http://www.ucl.ac.uk/~ucfagls/cv/ 26 Bedford Way [W] http://www.ucl.ac.uk/~ucfagls/ London. WC1H 0AP. %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/r-help/attachments/20050516/bc556740/attachment.pl
Is this along the lines of what you are trying to do?
sim.data<-data.frame(matrix(rnorm(350*10),350,10))
day<-seq(1:350)
sim.data<-data.frame(day,sim.data)
pc1.load.all<-NULL
pc2.load.all<-NULL
for (i in seq(0,300,by=50)){
sim.data.i<-subset(sim.data,sim.data$day>i&sim.data$day<(i+50))
pc1.load.i<-princomp(sim.data.i[,2:11])$loadings[,1]
pc2.load.i<-princomp(sim.data.i[,2:11])$loadings[,2]
pc1.load.all<-rbind(pc1.load.all,pc1.load.i)
pc2.load.all<-rbind(pc1.load.all,pc1.load.i)
}
period<-seq(1:7)
pc1.load.all<-cbind(period,pc1.load.all)
pc2.load.all<-cbind(period,pc1.load.all)
# and plot loadings for each each variable vs. the period...
Ignacio
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Laura Quinn
Sent: Monday, May 16, 2005 4:34 AM
To: Gavin Simpson
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Mental Block with PCA of multivariate time series!
Sorry, I don't think I made myself clear enough with my initial query!
I am wishing to investigate the temporal evolution of the pca: if we
assume that every 50 rows of my data frame is representitive of, for
instance, 1 day of data, I am hoping to automate a process whereby a pca
is performed on every 50 rows of data and the loading for PC1 and PC2 for
each variable (i.e. each column) is represented as a point on a plot - so
a years' data will be represented as two lines (representing PC1 and PC2)
on a time series plot for each variable.
Laura Quinn
Institute of Atmospheric Science
School of Earth and Environment
University of Leeds
Leeds
LS2 9JT
tel: +44 113 343 1596
fax: +44 113 343 6716
mail: laura at env.leeds.ac.uk
On Mon, 16 May 2005, Gavin Simpson wrote:
Laura Quinn wrote:
Please could someone point me in the right direction as I appear to be having a total mental block with fairly basic PCA problem! I have a large dataframe where rows represent independent observations and columns are variables. I am wanting to perform PCA sequentially on blocks of nrows at a time and produce a graphical output of the loadings for the first 2 EOFs for each variable. I'm sure I've performed a very similar routine in the past, but the
method
is currently escaping me. Any help gratefully received!
Hi Laura, data(iris) iris.dat <- iris[,1:4] pca.1 <- prcomp(iris.dat[1:50, ], scale = TRUE) pca.2 <- prcomp(iris.dat[51:100, ], scale = TRUE) pca.3 <- prcomp(iris.dat[100:150, ], scale = TRUE) biplot(pca.1) etc... There is a better way of subsetting this data set as the 5th col of iris is a factor and we could use the subset argument to prcomp to do the subsetting without having to know that there are 50 rows per species. Take a look at that argument if you have a variable that defines the blocks for you. Is this what you were after? All the best, Gav -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [T] +44 (0)20 7679 5522 ENSIS Research Fellow [F] +44 (0)20 7679 7565 ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk UCL Department of Geography [W] http://www.ucl.ac.uk/~ucfagls/cv/ 26 Bedford Way [W] http://www.ucl.ac.uk/~ucfagls/ London. WC1H 0AP. %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html