Skip to content

Mental Block with PCA of multivariate time series!

6 messages · Laura Quinn, Gavin Simpson, Pierre BADY +1 more

#
Please could someone point me in the right direction as I appear to be
having a total mental block with fairly basic PCA problem!

I have a large dataframe where rows represent independent
observations and columns are variables. I am wanting to perform PCA
sequentially on blocks of nrows at a time and produce a graphical output
of the loadings for the first 2 EOFs for each variable.

I'm sure I've performed a very similar routine in the past, but the method
is currently escaping me.

Any help gratefully received!

Laura Quinn
Institute of Atmospheric Science
School of Earth and Environment
University of Leeds
Leeds
LS2 9JT

tel: +44 113 343 1596
fax: +44 113 343 6716
mail: laura at env.leeds.ac.uk
#
Laura Quinn wrote:
Hi Laura,

data(iris)
iris.dat <- iris[,1:4]
pca.1 <- prcomp(iris.dat[1:50, ], scale = TRUE)
pca.2 <- prcomp(iris.dat[51:100, ], scale = TRUE)
pca.3 <- prcomp(iris.dat[100:150, ], scale = TRUE)

biplot(pca.1)
etc...

There is a better way of subsetting this data set as the 5th col of iris 
is a factor and we could use the subset argument to prcomp to do the 
subsetting without having to know that there are 50 rows per species. 
Take a look at that argument if you have a variable that defines the 
blocks for you.

Is this what you were after?

All the best,

Gav
#
Sorry, I don't think I made myself clear enough with my initial query!

I am wishing to investigate the temporal evolution of the pca: if we
assume that every 50 rows of my data frame is representitive of, for
instance, 1 day of data, I am hoping to automate a process whereby a pca
is performed on every 50 rows of data and the loading for PC1 and PC2 for
each variable (i.e. each column) is represented as a point on a plot - so
a years' data will be represented as two lines (representing PC1 and PC2)
on a time series plot for each variable.



Laura Quinn
Institute of Atmospheric Science
School of Earth and Environment
University of Leeds
Leeds
LS2 9JT

tel: +44 113 343 1596
fax: +44 113 343 6716
mail: laura at env.leeds.ac.uk
On Mon, 16 May 2005, Gavin Simpson wrote:

            
#
Laura Quinn wrote:
Hi Laura,

Sorry for not quite understanding the specifics, does this do what you want?

# generate some random data for this example
dat <- data.frame(var1 = rnorm(1:1000), var2 = runif(1:1000), var3 = 
rnorm(1:1000) + runif(1:1000), var4 = as.factor(rep(1:10, rep(100, 10))))
# create a list of pca loadings on axis 1, 2
temp <- by(dat[,1:3], dat$var4, function(x) prcomp(x, scale =
	TRUE)$rotation[,1:2])
# plot it
matplot(t(sapply(temp, function(x) x[,1])), type = "n")
# add the lines
matlines(t(sapply(temp, function(x) x[,1])), lty = "solid", col = 
c("red", "blue", "green"))
matlines(t(sapply(temp, function(x) x[,2])), lty = "dotted", col = 
c("red", "blue", "green"))

It isn't pretty - you'll need to calculate the x/ylims for the matplot 
call, but if it is want you are after the plotting should be fairly easy 
thing to work out.

HTH

G
#
Is this along the lines of what you are trying to do?

sim.data<-data.frame(matrix(rnorm(350*10),350,10))
day<-seq(1:350)
sim.data<-data.frame(day,sim.data)
pc1.load.all<-NULL
pc2.load.all<-NULL
 for (i in seq(0,300,by=50)){
 sim.data.i<-subset(sim.data,sim.data$day>i&sim.data$day<(i+50))
pc1.load.i<-princomp(sim.data.i[,2:11])$loadings[,1]
pc2.load.i<-princomp(sim.data.i[,2:11])$loadings[,2]
pc1.load.all<-rbind(pc1.load.all,pc1.load.i)
pc2.load.all<-rbind(pc1.load.all,pc1.load.i)
 }

period<-seq(1:7)
pc1.load.all<-cbind(period,pc1.load.all)
pc2.load.all<-cbind(period,pc1.load.all)

# and plot loadings for each each variable vs. the period...

Ignacio


-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Laura Quinn
Sent: Monday, May 16, 2005 4:34 AM
To: Gavin Simpson
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Mental Block with PCA of multivariate time series!

Sorry, I don't think I made myself clear enough with my initial query!

I am wishing to investigate the temporal evolution of the pca: if we
assume that every 50 rows of my data frame is representitive of, for
instance, 1 day of data, I am hoping to automate a process whereby a pca
is performed on every 50 rows of data and the loading for PC1 and PC2 for
each variable (i.e. each column) is represented as a point on a plot - so
a years' data will be represented as two lines (representing PC1 and PC2)
on a time series plot for each variable.



Laura Quinn
Institute of Atmospheric Science
School of Earth and Environment
University of Leeds
Leeds
LS2 9JT

tel: +44 113 343 1596
fax: +44 113 343 6716
mail: laura at env.leeds.ac.uk
On Mon, 16 May 2005, Gavin Simpson wrote:

            
method
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html