using 'apply' to apply princomp to an array of datasets

Sorry, I just realized I didn't send the message below in plain text.
-David Romano
Hi everyone,

Suppose I have a 3D array of datasets, where say dimension 1 corresponds
to cases, dimension 2 to datasets, and dimension 3 to observations within a
dataset.  As an example, suppose I do the following:

x <- sample(1:20, 24, replace=TRUE)
datasets <- array(x, dim=c(4,3,2))

Here, for each j=1,2,3, I'd like to think of datasets[,j,] as a single
data matrix with four cases and two observations per case.  Now, I'd like
to be able to do the following: apply PCA to each dataset, and create a
matrix of the first principal component scores.

In this example, I could do:

pcl <- apply(datasets, 2, princomp)

which yields a list of princomp output, one for each dataset, so that the
vector of first principal component scores for dataset 1 is obtained by

score1set1 <- pcl[[1]]$scores[,1]

and, after extracting score1set2 and score1set3 in the same way, I could
then obtain the desired matrix by

score1matrix <- cbind(score1set1, score1set2, score1set3)

So my first question is: 1) how could I use *apply to do this?  I'm having
trouble because pcl is a list of lists, so I can't use, say, do.call(cbind,
...) without first having a list of the first component score vectors, which
I'm not sure how to produce.

My second question is: 2) Having answered question 1), now suppose some
datasets may contain NA values -- how could I select the indices along
dimension 2 that correspond to the datasets containing NAs (again using
*apply)?

Thanks in advance for any light you might be able to shed on these
questions!

David Romano
Hello,

As for the first question, try

scoreset <- lapply(pcl, function(x) x$scores[, 1])
do.call(cbind, scoreset)
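
The two steps above can also be collapsed into a single call. What follows is only a minimal sketch, assuming the example array from the original post; the seed and the name score1matrix_direct are illustrative additions, not part of the thread. It relies on vapply() binding equal-length numeric vectors into the columns of a matrix, so no separate do.call(cbind, ...) is needed.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
datasets <- array(sample(1:20, 4 * 3 * 2, replace = TRUE), dim = c(4, 3, 2))

# apply() over margin 2 passes each 4 x 2 slice datasets[, j, ] to princomp()
pcl <- apply(datasets, 2, princomp)

# Two-step version: list of first-component score vectors, then cbind
scoreset <- lapply(pcl, function(p) p$scores[, 1])
score1matrix <- do.call(cbind, scoreset)

# One-step version: vapply() checks that each result is a numeric vector
# of length 4 and returns them as the columns of a 4 x 3 matrix
score1matrix_direct <- vapply(pcl, function(p) p$scores[, 1],
                              numeric(nrow(datasets)))
```

Both versions should give the same 4 x 3 matrix, one column per dataset; vapply() just fails loudly if a result has an unexpected length or type.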

As for the second question, you want to know which datasets (the indices
along dimension 2 of 'datasets') have NAs?

colidx <- apply(datasets, 2, function(x) any(is.na(x)))
datasets[, colidx, , drop = FALSE]  # These have NAs

For the column numbers you can do

colnums <- which(colidx)
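
To tie the two answers together, here is a rough sketch of one way to flag the datasets with NAs and run the PCA only on the complete ones. The injected NA, the seed, and the names has_na and complete are all assumptions made up for illustration. Note that a 3D array needs all three index positions when subsetting, and drop = FALSE keeps it 3D even if only one dataset survives.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
datasets <- array(sample(1:20, 4 * 3 * 2, replace = TRUE), dim = c(4, 3, 2))
datasets[2, 2, 1] <- NA  # hypothetical: inject a missing value into dataset 2

# TRUE for every index along dimension 2 whose 4 x 2 slice contains an NA
has_na <- apply(datasets, 2, anyNA)
which(has_na)  # 2

# Keep only the complete datasets; all three dimensions are indexed
complete <- datasets[, !has_na, , drop = FALSE]

# PCA on each remaining dataset, then collect the first-component scores
pcl <- apply(complete, 2, princomp)
score1matrix <- vapply(pcl, function(p) p$scores[, 1],
                       numeric(nrow(complete)))
colnames(score1matrix) <- paste0("set", which(!has_na))
```

The column names record which of the original datasets each score vector came from, since the positions shift once the datasets with NAs are dropped.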

Hope this helps,

Rui Barradas

On 12-12-2012 17:14, David Romano wrote:
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Thank you, Rui!   This is incredibly helpful -- anonymous functions
are new to me, and I appreciate being shown how useful they are.

Best regards,
David