integrating 2 lists and a data frame in R
Simple matrix indexing suffices without any fancier functionality.

## First convert M and N to character vectors -- which they should have been in the first place!
M <- sort(as.character(M[,1]))
N <- sort(as.character(N[,1]))

## This could be a one-liner, but I'll split it up for clarity.
res <- matrix(NA, length(M), length(N), dimnames = list(M, N))
res[as.matrix(C[,2:1])] <- C$I  ## matrix indexing
res

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
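For anyone who wants to run the matrix-indexing idea from a clean session, here is a minimal self-contained sketch. It assumes the N, M and C example data posted elsewhere in this thread; `stringsAsFactors = FALSE` and the helper names `rows`, `cols` and `idx` are assumptions made for illustration, not part of the original reply.

```r
# Example data as posted in the thread. stringsAsFactors = FALSE is an
# assumption so that the columns are plain character vectors.
N <- data.frame(N = c("n1", "n2", "n3", "n4"), stringsAsFactors = FALSE)
M <- data.frame(M = c("m1", "m2", "m3", "m4", "m5"), stringsAsFactors = FALSE)
C <- data.frame(n = c("n1", "n2", "n3"),
                m = c("m1", "m1", "m3"),
                I = c(100, 300, 400),
                stringsAsFactors = FALSE)

rows <- sort(M$M)   # row names of the result
cols <- sort(N$N)   # column names of the result

# Empty M x N matrix, filled with NA.
res <- matrix(NA_real_, nrow = length(rows), ncol = length(cols),
              dimnames = list(rows, cols))

# A two-column character matrix selects cells by (row name, column name).
idx <- cbind(C$m, C$n)
res[idx] <- C$I
res
```

Cells that have no entry in C stay NA; `as.data.frame(res)` should give a data frame if one is needed rather than a matrix.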
On Tue, Jun 6, 2017 at 7:46 AM, Bogdan Tanasa <tanasa at gmail.com> wrote:
Thank you David. Using xtabs operation simplifies the code very much, many thanks ;)

On Tue, Jun 6, 2017 at 7:44 AM, David Winsemius <dwinsemius at comcast.net> wrote:
On Jun 6, 2017, at 4:01 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
Hi Bogdan,
Kinda messy, but:
N <- data.frame(N=c("n1","n2","n3","n4"))
M <- data.frame(M=c("m1","m2","m3","m4","m5"))
C <- data.frame(n=c("n1","n2","n3"), m=c("m1","m1","m3"),
I=c(100,300,400))
MN<-as.data.frame(matrix(NA,nrow=length(N[,1]),ncol=length(M[,1])))
names(MN)<-M[,1]
rownames(MN)<-N[,1]
C[,1]<-as.character(C[,1])
C[,2]<-as.character(C[,2])
for(row in 1:dim(C)[1]) MN[C[row,1],C[row,2]]<-C[row,3]
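Note that the loop above puts N on the rows and M on the columns, which is the transpose of the layout asked for. A minimal sketch of one way to flip it while keeping a data frame, with the data rebuilt so the snippet runs on its own; plain character vectors for N and M and the name `NM` are assumptions for illustration.

```r
# Plain character vectors instead of one-column data frames (an assumption).
N <- c("n1", "n2", "n3", "n4")
M <- c("m1", "m2", "m3", "m4", "m5")
C <- data.frame(n = c("n1", "n2", "n3"),
                m = c("m1", "m1", "m3"),
                I = c(100, 300, 400),
                stringsAsFactors = FALSE)

# Same fill as the loop above: rows are N, columns are M.
MN <- as.data.frame(matrix(NA_real_, nrow = length(N), ncol = length(M),
                           dimnames = list(N, M)))
for (i in seq_len(nrow(C))) MN[C$n[i], C$m[i]] <- C$I[i]

# Transpose so that M gives the row names and N the column names.
NM <- as.data.frame(t(MN))
NM
```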
`xtabs` offers another route:

C$m <- factor(C$m, levels=M$M)
C$n <- factor(C$n, levels=N$N)

Option 1: Zeroes in the empty positions:
(X <- xtabs(I ~ m+n , C, addNA=TRUE))
    n
m     n1  n2  n3  n4
  m1 100 300   0   0
  m2   0   0   0   0
  m3   0   0 400   0
  m4   0   0   0   0
  m5   0   0   0   0

Option 2: Sparse matrix
(X <- xtabs(I ~ m+n , C, sparse=TRUE))
5 x 4 sparse Matrix of class "dgCMatrix"
    n
m     n1  n2  n3 n4
  m1 100 300   .  .
  m2   .   .   .  .
  m3   .   . 400  .
  m4   .   .   .  .
  m5   .   .   .  .
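Both options rely on the factor levels having been set from M and N first; without that step the empty rows and columns drop out. A self-contained sketch of Option 1 run from a clean session, using the thread's example data. The final conversion with `as.data.frame.matrix()` and the name `wide` are assumptions added here, since the original question asked for a data frame.

```r
N <- data.frame(N = c("n1", "n2", "n3", "n4"))
M <- data.frame(M = c("m1", "m2", "m3", "m4", "m5"))
C <- data.frame(n = c("n1", "n2", "n3"),
                m = c("m1", "m1", "m3"),
                I = c(100, 300, 400))

# Give both key columns the full set of levels so unused ones are kept.
C$m <- factor(C$m, levels = M$M)
C$n <- factor(C$n, levels = N$N)

# Sum I over each (m, n) combination; combinations absent from C become 0.
X <- xtabs(I ~ m + n, data = C)

# Keep the 5 x 4 layout when converting to a data frame.
wide <- as.data.frame.matrix(X)
wide
```

Calling `as.data.frame(X)` instead would return the long format (one row per m/n pair), which is why the matrix method is called directly here.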
I wasn't sure if the sparse results of xtabs would make a distinction
between 0 and NA, but happily it does:
C <- data.frame(n=c("n1","n2","n3", "n3", "n4"), m=c("m1","m1","m3",
"m4", "m5"), I=c(100,300,400, NA, 0))
C
   n  m   I
1 n1 m1 100
2 n2 m1 300
3 n3 m3 400
4 n3 m4  NA
5 n4 m5   0
(X <- xtabs(I ~ m+n , C, sparse=TRUE))
4 x 4 sparse Matrix of class "dgCMatrix"
    n
m     n1  n2  n3 n4
  m1 100 300   .  .
  m3   .   . 400  .
  m4   .   .   .  .
  m5   .   .   .  0
(In the example I forgot to repeat the lines that augmented the factor
levels, so m2 is not seen.)
--
David
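For completeness, a sketch of the second example with the factor-level lines repeated, so that m2 shows up again. The dense form of `xtabs` is used here as an assumption, only to avoid depending on the Matrix package; it is meant to show the restored levels, not to reproduce the 0 versus NA display of the sparse result.

```r
N <- data.frame(N = c("n1", "n2", "n3", "n4"))
M <- data.frame(M = c("m1", "m2", "m3", "m4", "m5"))
C <- data.frame(n = c("n1", "n2", "n3", "n3", "n4"),
                m = c("m1", "m1", "m3", "m4", "m5"),
                I = c(100, 300, 400, NA, 0))

# Restore the full level sets so unused rows such as m2 are kept.
C$m <- factor(C$m, levels = M$M)
C$n <- factor(C$n, levels = N$N)

# Dense table; pass sparse = TRUE instead if the Matrix package is available.
X <- xtabs(I ~ m + n, data = C)
rownames(X)
```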
Jim

On Tue, Jun 6, 2017 at 3:51 PM, Bogdan Tanasa <tanasa at gmail.com> wrote:
Dear Bert, thank you for your response. Here is the piece of R code: given the 3 data frames below ---
N <- data.frame(N=c("n1","n2","n3","n4"))
M <- data.frame(M=c("m1","m2","m3","m4","m5"))
C <- data.frame(n=c("n1","n2","n3"), m=c("m1","m1","m3"),
I=c(100,300,400))
how shall I integrate N, M, and C in such a way that at the end we have a data frame with :
- list N as the column names
- list M as the row names
- the values in the cells of N * M, corresponding to the numerical values in the data frame C.
more precisely, the result shall be :
     n1   n2   n3   n4
m1  100  300    -    -
m2    -    -    -    -
m3    -    -  400    -
m4    -    -    -    -
m5    -    -    -    -
thank you !
On Mon, Jun 5, 2017 at 6:57 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
Reproducible example, please. -- In particular, what exactly does C look like? (You should know this by now).

-- Bert

On Mon, Jun 5, 2017 at 6:45 PM, Bogdan Tanasa <tanasa at gmail.com> wrote:
Dear all,

please could you advise on the R code I could use in order to do the following operation :

1 -- I have 2 lists of "genome coordinates" : a list is composed of numbers that represent genome coordinates; let's say list N : n1 n2 n3 n4 and a list M : m1 m2 m3 m4 m5

2 -- and a data frame C, where for some pairs of coordinates (n,m) from the lists above, we have a numerical intensity; for example :

n1; m1; 100
n1; m2; 300

The question would be : what is the most efficient R code I could use in order to integrate the list N, the list M, and the data frame C, in order to obtain a DATA FRAME,

-- list N as the column names
-- list M as the row names
-- the values in the cells of N * M, corresponding to the numerical values in the data frame C.
A little example would be :
     n1   n2   n3   n4
m1  100    -    -    -
m2  300    -    -    -
m3    -    -    -    -
m4    -    -    -    -
m5    -    -    -    -
I wrote a script in Perl, although I would like to do this in R.
Many thanks ;)
-- bogdan
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius Alameda, CA, USA