Apply function to every 20 rows between pairs of columns in a matrix

arun

Mon, Nov 11, 2013 7:40 PM

Hi,
Maybe this is what you wanted.
res2 <- lapply(row.names(res[[1]]),function(x) do.call(rbind,lapply(res,function(y) y[match(x, row.names(y)),])))
length(res2)
#[1] 48
dim(res2[[1]])
#[1] 2325    8
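
To see what the regrouping does, here is a minimal sketch on a toy stand-in for `res`. The object names `res_toy` and `res2_toy`, and the labels S1/S2 and F1/F2, are made up for illustration and are not part of the original data.

```r
# Toy stand-in for `res`: 3 bins, each a 2-sample x 2-founder matrix.
# (S1/S2 and F1/F2 are hypothetical labels.)
res_toy <- lapply(1:3, function(i)
  matrix(i * c(1, 2, 3, 4), nrow = 2,
         dimnames = list(c("S1", "S2"), c("F1", "F2"))))
names(res_toy) <- as.character(1:3)

# For each sample, pull its row out of every bin and stack the rows.
res2_toy <- lapply(row.names(res_toy[[1]]), function(x)
  do.call(rbind, lapply(res_toy, function(y) y[match(x, row.names(y)), ])))
names(res2_toy) <- row.names(res_toy[[1]])

length(res2_toy)       # one matrix per sample
dim(res2_toy[["S1"]])  # bins x founders
res2_toy[["S1"]]
```

Each element of the result has one row per bin and one column per founder, which is the same shape as the 2325 x 8 matrices above.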

A.K.

On Monday, November 11, 2013 10:20 PM, Yu-yu Ren <renyangsu at gmail.com> wrote:

Thank you so much for that script, it works great. One additional request: how can I bind the 2325 matrices for each sample, resulting in 48 matrices of 8 columns by 2325 rows?

On Mon, Nov 11, 2013 at 10:02 PM, arun <smartpink111 at yahoo.com> wrote:

Hi,
I already sent a reply to R-help. I am not sure about the "2342".

set.seed(25)
dat1 <- as.data.frame(matrix(sample(c("A","T","G","C"),46482*56,replace=TRUE),ncol=56,nrow=46482),stringsAsFactors=FALSE)
lst1 <- split(dat1,as.character(gl(nrow(dat1),20,nrow(dat1))))
res <- lapply(lst1,function(x) sapply(x[,1:8],function(y) sapply(x[,9:56], function(z) sum(y==z)/20)))

length(res)
#[1] 2325  ### check here
dim(res[[1]])
#[1] 48  8
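
Two details may be worth checking in the code above. First, 46482 is not a multiple of 20 (46482 = 2324 x 20 + 2), so the last of the 2325 bins holds only 2 rows while the code still divides by 20. Second, `split()` on character labels orders the list elements as text ("1", "10", "100", ...), not in row order. The following is only a sketch of one possible variant on a toy data set; the names `dat_toy`, `bin_id`, `founders` and `samples` are assumptions introduced here.

```r
set.seed(25)
n_rows <- 45   # toy size: two full bins of 20 plus a partial bin of 5
dat_toy <- as.data.frame(
  matrix(sample(c("A", "T", "G", "C"), n_rows * 5, replace = TRUE), ncol = 5),
  stringsAsFactors = FALSE)

# Integer bin ids keep the bins in row order after split().
bin_id <- (seq_len(n_rows) - 1) %/% 20 + 1
lst_toy <- split(dat_toy, bin_id)

founders <- 1:2   # hypothetical: columns 1-2 play the founders
samples  <- 3:5   # hypothetical: columns 3-5 play the samples

# mean(y == z) divides by the actual number of rows in the bin,
# so a short final bin is not scaled by 20.
res_toy <- lapply(lst_toy, function(x)
  sapply(x[, founders], function(y)
    sapply(x[, samples], function(z) mean(y == z))))

names(res_toy)      # bins in row order
dim(res_toy[[1]])   # samples x founders
```

Whether the partial last bin should be kept, dropped, or divided by 20 depends on the analysis; the sketch only shows one option.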

A.K.




On Monday, November 11, 2013 10:00 PM, Yu-yu Ren <renyangsu at gmail.com> wrote:

Thank you, I have uploaded several example files, with intermediate outputs of what I have done and the logic flow.




On Mon, Nov 11, 2013 at 9:37 PM, <smartpink111 at yahoo.com> wrote:

Hi,

It is not clear how the first 8 columns should be compared separately with columns 9-56. Also, please provide a reproducible example (using ?dput) for others to work on.

A.K.
<quote author='Renyulb28'>
Hi all, I have a set of genetic SNP data that looks like

Founder1 Founder2 Founder3 Founder4 Founder5 Founder6 Founder7 Founder8
Sample1 Sample2 Sample3 Sample...
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T

The size of the matrix is 56 columns by 46482 rows. I need to first bin the
matrix by every 20 rows, then compare each of the first 8 columns (founders)
to each of columns 9-56, and divide the total number of matching
letters/alleles by the total number of rows (20). Ultimately I need 48
matrices of 8 columns by 2342 rows, which are essentially similarity matrices. I
have tried to extract each pair separately by something like

"length(cbind(odd[,9],odd[,1])[cbind(odd[,9],cbind(odd[,9],odd[,1])[,1])[,1]=="T"
& cbind(odd[,9],odd[,1])[,2]=="T",])/nrow(cbind(odd[,9],odd[,1]))"

but this is nowhere near efficient, and I do not know of a faster way of
applying the function to every 20 rows and across multiple pairs.

In the example given above, if the rows were all identical like shown across
20 rows, then the first row of the matrix for Sample1 would be

1 1 1 0 0 0 0 0

</quote>
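
For a single founder/sample pair, the fraction of matching alleles described in the question can be written as the mean of an elementwise comparison. The sketch below rebuilds the example rows from the question (founders 1-3 are "A", founders 4-8 are "T", Sample1 is "A") as one bin of 20 identical rows; the object names `founders` and `sample1` are assumptions made for illustration.

```r
# One bin of 20 identical rows, as in the question's example.
founders <- matrix(rep(c("A", "A", "A", "T", "T", "T", "T", "T"), each = 20),
                   nrow = 20,
                   dimnames = list(NULL, paste0("Founder", 1:8)))
sample1 <- rep("A", 20)

# One pair: fraction of rows where both columns carry the same allele.
mean(sample1 == founders[, "Founder1"])

# All eight founders at once: one value per founder.
colMeans(founders == sample1)
```

With eight founders the result has eight values: 1 for the three founders that match Sample1 and 0 for the other five.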
Quoted from:
http://r.789695.n4.nabble.com/Apply-function-to-every-20-rows-between-pairs-of-columns-in-a-matrix-tp4680272.html
