isSeekable returns F on seekable file
4 messages · Laurens Leerink, Henrik Bengtsson, laurent buffat
Hi there,

First, please excuse me, I'm not fluent in English.

I am trying to manipulate very large objects with R, and I have some problems with memory use and access time because of the "by value" mechanism. I would like to "encapsulate" a large vector in a class and access the vector through a method and a replacement method, but there is a lot of "implicit copying", and so a lot of memory and time is consumed. The data are very large and come from microarray experiments (see http://Bioconductor.org for more details about what a microarray is); a typical vector has 20000 genes * 20 probes * 100 experiments * 2 (mean and variance) elements.

The best way, in terms of speed and memory, is to try to emulate a "by reference" mechanism, but it's not very "in the spirit of R" and a little "dangerous" (see the example). Could you give me some recommendations?

Thanks for your help. The code below is a little "long", sorry.

Laurent B.

```r
setClass("Foo", representation(v = "numeric"))

setMethod("initialize", signature("Foo"), function(.Object, v = numeric()) {
  .Object@v <- v
  .Object
})

setGeneric("v", function(.Object) standardGeneric("v"))
setMethod("v", "Foo", function(.Object) .Object@v)

setGeneric("v<-", function(.Object, value) standardGeneric("v<-"))
setReplaceMethod("v", "Foo", function(.Object, value) {
  .Object@v <- value
  return(.Object)
})

setMethod("[", "Foo", function(x, i, j = NA, ..., drop = FALSE) x@v[i])
setReplaceMethod("[", "Foo", function(x, i, j = NA, ..., value) {
  x@v[i] <- value
  x
})

n <- 2000 * 20 * 100 * 2
# In fact I would like to have
# 20000 genes * 20 measures per gene (probes) * 100 experiments * 2 (mean and variance),
# but that is too much memory for this example, so just try with 2000 "genes".
x <- rep(1, n)          # x, a non-encapsulated vector for the data
y <- new("Foo", v = x)  # y, an encapsulated version

x[1] <- 2
y@v[1] <- 2
v(y)[1] <- 2
y[1] <- 2

nt <- 10  # number of tests
system.time(for (i in 1:nt) x[1] <- 2)
system.time(for (i in 1:nt) y@v[1] <- 2)
system.time(for (i in 1:nt) v(y)[1] <- 2)
system.time(for (i in 1:nt) y[1] <- 2)
## Reported timings in seconds (user, system, elapsed, child user, child system):
## [1] 0 0 0 0 0
## [1] 7.80 3.17 10.97 0.00 0.00
## [1] 10.19 5.39 15.60 0.00 0.00
## [1] 9.00 4.54 13.55 0.00 0.00

x[1:2]
y[1:2]
v(y)[1:2]
y@v[1:2]
system.time(for (i in 1:nt) x[1:2])
system.time(for (i in 1:nt) y[1:2])
system.time(for (i in 1:nt) v(y)[1:2])
system.time(for (i in 1:nt) y@v[1:2])
## [1] 0 0 0 0 0
## [1] 0 0 0 0 0
## [1] 0 0 0 0 0
## [1] 0 0 0 0 0
# No problem for the access methods, only for the replacement methods.

# Class FooPtr: a way to try to bypass the "by value" mechanism of R ...
setClass("FooPtr", representation(p = "environment"))

setMethod("initialize", signature("FooPtr"), function(.Object, v = numeric()) {
  .Object@p <- new.env()
  assign("v", v, envir = .Object@p)
  .Object
})

setMethod("v", "FooPtr", function(.Object) get("v", envir = .Object@p))
setReplaceMethod("v", "FooPtr", function(.Object, value) {
  assign("v", value, envir = .Object@p)
  return(.Object)
})

setMethod("[", "FooPtr", function(x, i, j = NA, ..., drop = FALSE)
  get("v", envir = x@p)[i])

# A first version of "[<-" for FooPtr:
setReplaceMethod("[", "FooPtr", function(x, i, j = NA, ..., value) {
  v <- get("v", envir = x@p)
  v[i] <- value
  assign("v", v, envir = x@p)
  x
})

z <- new("FooPtr", v = x)
x[1] <- 2
v(z)[1] <- 2
z[1] <- 2
system.time(for (i in 1:nt) x[1] <- 2)
system.time(for (i in 1:nt) v(z)[1] <- 2)
system.time(for (i in 1:nt) z[1] <- 2)
## [1] 0.01 0.00 0.01 0.00 0.00
## [1] 0 0 0 0 0
## [1] 1.63 1.18 2.81 0.00 0.00
# v(z)[1] <- is "good", but "[<-" is not.

# A crazier way to try "by reference":
setReplaceMethod("[", "FooPtr", function(x, i, j = NA, ..., value) {
  assign("i", i, envir = x@p)
  assign("value", value, envir = x@p)
  eval(expression(v[i] <- value), envir = x@p)
  rm("i", "value", envir = x@p)
  x
})

system.time(for (i in 1:nt) x[1] <- 2)
system.time(for (i in 1:nt) v(z)[1] <- 2)
system.time(for (i in 1:nt) z[1] <- 2)
## [1] 0 0 0 0 0
## [1] 0 0 0 0 0
## [1] 0.14 0.12 0.26 0.00 0.00
# "[<-" is better, but v(z)[] is the best ... (why???)

# OK, v(z)[i] is the "best" access, but you need to know what you are doing:
v(z)[1] <- 12345
z1 <- z
v(z1)[1]  # z and z1 work with the same environment ...
```

Thanks for your help.
Laurent
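For scale: the full vector described above has 20000 * 20 * 100 * 2 = 80,000,000 doubles, about 640 MB at 8 bytes per double, and the 2000-gene test vector has 8,000,000 doubles (about 64 MB), so every implicit copy of the whole vector is expensive. Later versions of R (2.12.0 onwards) provide reference classes through `setRefClass()` in the methods package, which package the same environment-based idea as `FooPtr`. The sketch below assumes such a version; the class name `FooRef` and its methods `getAt()` and `setAt()` are hypothetical. It reproduces the aliasing the post calls "dangerous" (`z1 <- z` shares storage) and uses the built-in `$copy()` method as the explicit way to obtain an independent object. Whether a single-element update avoids copying the whole vector depends on the R version, so timing it with `system.time()` as in the post is still the way to check.

```r
library(methods)

# Hypothetical reference class: the field `v` lives in the object's
# environment, so every name bound to the object sees the same data.
FooRef <- setRefClass("FooRef",
  fields = list(v = "numeric"),
  methods = list(
    getAt = function(i) {
      v[i]
    },
    setAt = function(i, value) {
      # `<<-` modifies the field in the object's environment
      v[i] <<- value
      invisible(.self)
    }
  )
)

z <- FooRef$new(v = rep(1, 10))
z$setAt(1, 2)
z$getAt(1:2)      # [1] 2 1

z1 <- z           # aliasing: z and z1 are the same object
z1$setAt(1, 12345)
z$getAt(1)        # also changed, as with FooPtr

z2 <- z$copy()    # explicit copy: independent storage
z2$setAt(1, 0)
z$getAt(1)        # unaffected by the change to z2
```

Making copies explicit (`$copy()`) rather than implicit is the trade-off: assignment becomes cheap, but the caller must remember that `<-` no longer duplicates the data.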
Hi Laurent,

this is exactly the problem I had when I started to work on microarray data. Your strategy works, and it does indeed improve memory and time efficiency quite a bit. It is just a matter of the granularity at which you want to emulate references, i.e. a matrix, a column of a matrix, or a single cell. I have stayed with a matrix: when I update the matrix R (50000x20) in a quadruple (R, G, Rb, Gb), it helps, since I do not have to pay the cost of having G, Rb and Gb coupled to the same data structure.

FYI: since 2001 I have developed the R.oo package (http://www.maths.lth.se/help/R/R.classes/), based on a similar idea to what you are suggesting, i.e. using environments or similar functionality to emulate pointers, and providing this in a reusable way. It implements some extra features too, although they are not necessary in this context. Note also that R.oo is more in the spirit of "a method belongs to a class" than "a method belongs to a generic function", which is the idea of R, but this is not a restriction. At the moment R.oo is based on S3, but I intend to upgrade to S4. My microarray package com.braju.sma makes use of R.oo wherever microarray structures are defined.

Best wishes
Henrik Bengtsson
Lund University
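The per-matrix granularity described above can be sketched with plain environments: one environment per matrix, so replacing R never touches G, Rb or Gb. The helpers `newRef()`, `getRef()` and `setRef()` are hypothetical names chosen for illustration, not the R.oo API, and the matrices are toy-sized rather than 50000x20.

```r
# Hypothetical helpers: one environment ("reference") per matrix.
newRef <- function(value) {
  ref <- new.env(parent = emptyenv())
  ref$value <- value
  ref
}
getRef <- function(ref) ref$value
setRef <- function(ref, value) {
  ref$value <- value  # modifies the environment in place
  invisible(ref)
}

nSpots <- 5; nSlides <- 2  # toy size
channels <- list(
  R  = newRef(matrix(1,   nSpots, nSlides)),
  G  = newRef(matrix(2,   nSpots, nSlides)),
  Rb = newRef(matrix(0.1, nSpots, nSlides)),
  Gb = newRef(matrix(0.2, nSpots, nSlides))
)

# Background-correct only the R channel. The list `channels` is not
# reassigned, and the G, Rb and Gb environments are left untouched.
setRef(channels$R, getRef(channels$R) - getRef(channels$Rb))
```

Finer granularity (one reference per column or per cell) would decouple more, but at the cost of many more environments to create and look up, which is why a whole matrix is a reasonable unit here.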
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of laurent buffat
Sent: 23 May 2003 16:01
To: r-help at stat.math.ethz.ch
Subject: [R] replaceMethod time and memory for very large object.
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
2 days later
Hi Henrik,

thanks a lot for your references (R.oo and com.braju.sma). They are a great help.

Best regards,
laurent buffat

-----Original Message-----
From: Henrik Bengtsson [mailto:hb at maths.lth.se]
Sent: Friday 23 May 2003 18:07
To: 'laurent buffat'; r-help at stat.math.ethz.ch
Subject: RE: [R] replaceMethod time and memory for very large object.