Hi all,
I get different results when constructing a SummarizedExperiment in Bioc 3.6 and Bioc 3.7. My question is whether this is intentional or a bug.
library(GenomicRanges)
library(SummarizedExperiment)
nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colnames(counts) <- LETTERS[1:6]
rownames(counts) <- 1:nrows
counts2 <- counts-floor(counts)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(50, 150)),
IRanges(floor(runif(200, 1e5, 1e6)), width=100),
strand=sample(c("+", "-"), 200, TRUE),
feature_id=sprintf("ID%03d", 1:200))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
row.names=LETTERS[1:6])
se <- SummarizedExperiment(assays=list(counts=counts),
rowRanges=rowRanges,
colData=colData)
str(assays(se)$counts)
assays(se)$counts2 <- as.data.frame(counts2)
str(assays(se)$counts)
On Windows 10 (R 3.4.2, Bioc 3.6) this produces:
num [1:200, 1:6] 8815 6314 1945 6185 5935 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:200] "1" "2" "3" "4" ...
..$ : chr [1:6] "A" "B" "C" "D" ...
num [1:200, 1:6] 8815 6314 1945 6185 5935 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:200] "1" "2" "3" "4" ...
..$ : chr [1:6] "A" "B" "C" "D" ...
On Ubuntu 17.10 (R-devel r73779, Bioc 3.7) this produces:
num [1:200, 1:6] 8636 7040 9275 4821 2475 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:200] "1" "2" "3" "4" ...
..$ : chr [1:6] "A" "B" "C" "D" ...
num [1:1200] 8636 7040 9275 4821 2475 ...
Somehow the structure is lost.
This happens if I mix matrix and data.frame data, and doesn't if I use only matrices. The man page defines matrix-like objects,
which a data.frame is (isn't it?), and the behavior differs between Bioc 3.6 and Bioc 3.7.
I can rule out that this is a Windows/Linux thing, because the Travis build error, which pointed to a difference in the first place,
didn't occur with bioc-release, just with bioc-devel.
Thanks for any advice and suggestions.
Felix
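A minimal sketch of a possible workaround, based only on the observation above that the problem does not appear when all assays are matrices: convert the data.frame back to a matrix with as.matrix() before storing it. The seed and the variable names counts2_df and counts2_mat are assumptions added for illustration, and the SummarizedExperiment part runs only if the package is installed.

```r
# Rebuild the assay from the example above (seed added only for reproducibility).
set.seed(1)
nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows,
                 dimnames = list(as.character(seq_len(nrows)), LETTERS[1:ncols]))
counts2_df <- as.data.frame(counts - floor(counts))

# Workaround: turn the data.frame back into a plain numeric matrix.
counts2_mat <- as.matrix(counts2_df)
stopifnot(is.matrix(counts2_mat), identical(dim(counts2_mat), dim(counts)))

# Only touch Bioconductor if SummarizedExperiment is available.
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  suppressPackageStartupMessages(library(SummarizedExperiment))
  se <- SummarizedExperiment(assays = list(counts = counts))
  assays(se)$counts2 <- counts2_mat
  str(assays(se)$counts)  # the first assay should still be a 200 x 6 matrix
}
```

This sidesteps the mixed matrix/data.frame case rather than fixing it, so it is only useful on affected versions.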
[Bioc-devel] SummarizedExperiment: structure loss, when mixing matrix and data.frame data
5 messages · Felix Ernst, Vincent Carey, Martin Morgan +1 more
Confirmed with the following sessionInfo(), satisfying biocValid()==TRUE
sessionInfo()
R Under development (unstable) (2017-11-22 r73776)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 18.1
Matrix products: default
BLAS: /home/stvjc/R-35-dist/lib/R/lib/libRblas.so
LAPACK: /home/stvjc/R-35-dist/lib/R/lib/libRlapack.so
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base
other attached packages:
[1] SummarizedExperiment_1.9.2 DelayedArray_0.5.5
[3] matrixStats_0.52.2         Biobase_2.39.0
[5] GenomicRanges_1.31.1       GenomeInfoDb_1.15.1
[7] IRanges_2.13.4             S4Vectors_0.17.10
[9] BiocGenerics_0.25.0
loaded via a namespace (and not attached):
 [1] lattice_0.20-35         bitops_1.0-6            grid_3.5.0
 [4] zlibbioc_1.25.0         XVector_0.19.1          Matrix_1.2-12
 [7] tools_3.5.0             RCurl_1.95-4.8          compiler_3.5.0
[10] GenomeInfoDbData_0.99.2
On Sun, Nov 26, 2017 at 7:09 AM, Felix Ernst <felix.ernst at ulb.ac.be> wrote:
[...]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
It would seem to be a bug in endoapply:
lst <- SimpleList(
m = matrix(0, 2, 2, dimnames=list(letters[1:2], LETTERS[1:2])),
df = data.frame(A=1:2, B=1:2, row.names=letters[1:2])
)
dimnames(lst[[1]]) # list(c("a", "b"), c("A", "B"))
dimnames(endoapply(lst, identity)[[1]]) # NULL
specifically in S4Vectors:::coerceToSimpleList:
lst <- list(
m = matrix(0, 2, 2, dimnames=list(letters[1:2], LETTERS[1:2])),
df = data.frame(A=1:2, B=1:2, row.names=letters[1:2])
)
S4Vectors:::coerceToSimpleList(lst)
Martin
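A small base-R sketch of how one might test for the symptom shown above (an element that has lost its dimensions). The helper is_two_dimensional is hypothetical and not part of S4Vectors; the flattening is imitated by hand with as.vector() rather than by calling endoapply, so this shows the check, not the bug itself.

```r
# Hypothetical helper (not part of S4Vectors): TRUE if x still has two dimensions.
is_two_dimensional <- function(x) length(dim(x)) == 2L

lst <- list(
  m  = matrix(0, 2, 2, dimnames = list(letters[1:2], LETTERS[1:2])),
  df = data.frame(A = 1:2, B = 1:2, row.names = letters[1:2])
)

# Both the matrix and the data.frame are two-dimensional to begin with.
ok_before <- vapply(lst, is_two_dimensional, logical(1))
print(ok_before)

# Imitate the symptom by hand: as.vector() drops dim and dimnames.
flattened <- as.vector(lst$m)
ok_after <- is_two_dimensional(flattened)
print(ok_after)
```

A check like this could be applied to each assay after an assignment to catch the structure loss early.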
On 11/26/2017 07:56 AM, Vincent Carey wrote:
[...]
2 days later
Hi,
Looks like at an even lower level, S4Vectors:::listElementType() is at the origin of the problem:
> S4Vectors:::listElementType(list(matrix(), data.frame()))
[1] "vector"
Should return "ANY" here. Will try to fix.
H.
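To illustrate the intended behaviour described above (a mixed list should be typed "ANY"), here is a rough base-R sketch. The function common_element_type is hypothetical and is not how S4Vectors implements listElementType(); it only compares the first class of each element.

```r
# Hypothetical illustration only, not the S4Vectors implementation:
# report a shared element class, or "ANY" when the elements differ.
common_element_type <- function(x) {
  first_class <- unique(vapply(x, function(el) class(el)[1L], character(1)))
  if (length(first_class) == 1L) first_class else "ANY"
}

# A matrix and a data.frame share no first class, so the result is "ANY".
mixed <- common_element_type(list(matrix(), data.frame()))

# Two matrices share the class "matrix".
uniform <- common_element_type(list(matrix(), matrix()))

print(mixed)
print(uniform)
```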
On 11/26/2017 07:03 AM, Martin Morgan wrote:
[...]
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Hi Felix,
This should be addressed in S4Vectors 0.17.11. Thanks for the catch and for the nice reproducible example.
Best,
H.
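A short sketch of how one could check whether an installation is at or past the version named above. The helper has_fix is hypothetical; the only fact taken from the thread is the version number 0.17.11, and the last step runs only if S4Vectors is installed.

```r
# Version in which the fix is reported to land (from the message above).
fixed_in <- package_version("0.17.11")

# Hypothetical helper: TRUE if version v is at or past the fixed version.
has_fix <- function(v) package_version(v) >= fixed_in

print(has_fix("0.17.10"))  # the version in the sessionInfo() earlier in the thread
print(has_fix("0.17.11"))

# Check the locally installed S4Vectors, if there is one.
if (requireNamespace("S4Vectors", quietly = TRUE)) {
  print(has_fix(as.character(utils::packageVersion("S4Vectors"))))
}
```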