[Bioc-devel] pointer and big matrix in R
3 messages · Servant Nicolas, Vincent Carey, Martin Morgan
For some approaches to working with large matrices in R, please look at http://cran.r-project.org/web/views/HighPerformanceComputing.html and scroll down to "Large memory and out-of-memory data". I suspect the text of this view should be expanded to include rhdf5 and ncdf as additional relevant resources.

On Fri, Apr 12, 2013 at 1:02 PM, Servant Nicolas <Nicolas.Servant at curie.fr> wrote:
Dear all,
I have an S4 object (HTCexp, from the HITC package) composed of one big matrix and two GenomicRanges objects, A and B, which describe the matrix rows and columns.
I am thinking about ways to decrease the memory size of this object.
I also have methods to get/set the matrix and the two GRanges: intdata(), x_intervals() and y_intervals().
For a symmetric matrix the two GRanges can be identical, so in that case I was interested in simply making B a pointer to A. How can I do that in R?
Second, I'm wondering whether there are other matrix-like objects optimized for big matrices (5000 x 5000). I quickly looked at the Matrix package on CRAN, which is useful for sparse matrices.
Any suggestions would be appreciated!
Thank you
Regards
Nicolas
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
On 04/12/2013 10:02 AM, Servant Nicolas wrote:
This is not a super-big object, so perhaps you're running into problems with
R's propensity to copy data? An easy solution might be to re-use the
SummarizedExperiment class, which addresses this issue by placing the 'assays'
data in a reference class.
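A minimal base-R sketch of what holding data in a reference class buys. The class names MatrixRef and RefExp are hypothetical and are not part of SummarizedExperiment or HiTC. Copying the S4 wrapper copies only the reference, so both copies see the same matrix; the same mechanism is one way to let two slots point at a single object, as asked for the symmetric case.

```r
library(methods)

## Hypothetical reference class: a mutable container for a matrix.
## Every object holding the reference shares one copy of the data.
MatrixRef <- setRefClass("MatrixRef", fields = list(data = "matrix"))

## Hypothetical S4 class whose 'intdata' slot holds the reference,
## not the matrix itself.
setClass("RefExp", representation(intdata = "MatrixRef"))

a <- new("RefExp", intdata = MatrixRef$new(data = matrix(0, 3, 3)))
b <- a                        # copies the S4 wrapper, not the matrix

## Modify the matrix through 'b' ...
b@intdata$data[1, 1] <- 42

## ... and the change is visible through 'a': both wrappers point to
## the same reference-class object.
a@intdata$data[1, 1]          # 42
```

The flip side of this design is that changes are no longer local: modifying the data through one handle changes it for every object sharing the reference, so in-place setters need care.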
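library(GenomicRanges)

## HTCexp extends SummarizedExperiment: the matrix goes into the
## (reference-based) 'assays', x_intervals into 'rowData', and one
## extra slot holds y_intervals
.HTCexp <- setClass("HTCexp", contains="SummarizedExperiment",
    representation=representation(y_intervals="GenomicRanges"))

## convenience constructor
HTCexp <-
    function(intdata = matrix(0, 0, 0), x_intervals=GRanges(),
             y_intervals=GRanges(), ...)
{
    .HTCexp(SummarizedExperiment(intdata, rowData=x_intervals),
            y_intervals=y_intervals, ...)
}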
which already gives
> HTCexp()
class: HTCexp
dim: 0 0
exptData(0):
assays(1): ''
rownames: NULL
rowData metadata column names(0):
colnames: NULL
colData names(0):
> m <- matrix(0, 5000, 5000,
+ dimnames=list(seq_len(5000), seq_len(5000)))
> g <- GRanges("A", IRanges(1:5000, width=0))
> HTCexp(m, g, g)
class: HTCexp
dim: 5000 5000
exptData(0):
assays(1): ''
rownames: NULL
rowData metadata column names(0):
colnames(5000): 1 2 ... 4999 5000
colData names(0):
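A quick base-R check of the sizes involved and of the copy-on-modify behaviour mentioned at the start of this reply (variable names are illustrative). Copy-on-modify also means that assigning the same object to two slots, as in HTCexp(m, g, g), does not by itself copy it; a copy is made only when one of them is modified.

```r
## Back-of-envelope size of a dense 5000 x 5000 matrix of doubles:
## 25 million elements at 8 bytes each.
5000 * 5000 * 8 / 1e6                          # 200 (MB)

## object.size() on a smaller matrix confirms ~8 bytes per element
## plus a small fixed header (dim attribute etc.).
small <- matrix(0, 500, 500)
as.numeric(object.size(small)) / (500 * 500)   # a little over 8

## Copy-on-modify: 'copy' shares memory with 'small' until it is changed;
## the assignment below triggers a full duplicate of the data.
copy <- small
copy[1, 1] <- 1
c(small[1, 1], copy[1, 1])                     # 0 1: original untouched
```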
I think you'd need to implement a "[" method and a 'y_intervals' accessor:
## generic and accessor for the extra slot
setGeneric("y_intervals", function(x, ...) standardGeneric("y_intervals"))
setMethod("y_intervals", "HTCexp", function(x, ...) {
    x@y_intervals
})

## subsetting: delegate the SummarizedExperiment part to its own "[",
## then subset y_intervals along the columns
setMethod("[", "HTCexp", function(x, i, j, ..., drop=TRUE) {
    ## not sure that this is complete...
    if (missing(i) && missing(j))
        x
    else {
        se <- as(x, "SummarizedExperiment")
        if (missing(i))
            initialize(x, se[,j], y_intervals=y_intervals(x)[j])
        else if (missing(j))
            initialize(x, se[i,])   # columns unchanged: keep y_intervals
        else
            initialize(x, se[i,j], y_intervals=y_intervals(x)[j])
    }
})
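The same delegation pattern as a self-contained base-R sketch. AnnotBase and AnnotExp are hypothetical stand-ins for SummarizedExperiment and HTCexp, with data.frames in place of GRanges. The subclass method subsets the inherited part with the parent's "[" and passes the result, plus the subsetted extra slot, to initialize(x, ...); the default initialize copies slots from an unnamed argument that is an instance of a superclass.

```r
library(methods)

## Hypothetical stand-ins: AnnotBase ~ SummarizedExperiment (matrix plus
## row annotation), AnnotExp ~ HTCexp (adds a column annotation slot).
setClass("AnnotBase", representation(mat = "matrix", rowAnn = "data.frame"))
setClass("AnnotExp", contains = "AnnotBase",
         representation(colAnn = "data.frame"))

setMethod("[", "AnnotBase", function(x, i, j, ..., drop = TRUE) {
    if (missing(i)) i <- seq_len(nrow(x@mat))
    if (missing(j)) j <- seq_len(ncol(x@mat))
    initialize(x, mat = x@mat[i, j, drop = FALSE],
               rowAnn = x@rowAnn[i, , drop = FALSE])
})

setMethod("[", "AnnotExp", function(x, i, j, ..., drop = TRUE) {
    if (missing(i)) i <- seq_len(nrow(x@mat))
    if (missing(j)) j <- seq_len(ncol(x@mat))
    base <- as(x, "AnnotBase")[i, j]   # subset the inherited part
    ## copy the parent's slots from 'base', then subset the extra slot
    initialize(x, base, colAnn = x@colAnn[j, , drop = FALSE])
})

## usage
m <- matrix(1:12, 3, 4)
e <- new("AnnotExp", mat = m,
         rowAnn = data.frame(start = 1:3),
         colAnn = data.frame(start = 1:4))
s <- e[2:3, c(1, 4)]
dim(s@mat)         # 2 2
s@colAnn$start     # 1 4
```

Unlike the method in the message, this sketch replaces a missing i or j with the full index instead of branching, which keeps a single code path at the cost of always copying the matrix, even for x[].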
Martin
Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793