
isSeekable returns F on seekable file

4 messages · Laurens Leerink, Henrik Bengtsson, laurent buffat

#
Hi there,

First, please excuse me, I'm not fluent in English.

I am trying to manipulate very large objects in R, and I have problems with
memory use and access time because of the "by value" mechanism.
I would like to "encapsulate" a large vector in a class and access the
vector through a method and a replaceMethod, but there is a lot of "implicit
copying", which costs a lot of memory and time.

The data are very large and come from microarray experiments (see
http://Bioconductor.org for more detail on what a microarray is), and a
"typical" vector has 20000 genes * 20 probes * 100 experiments * 2 (mean
and variance) elements.
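
To put that size in perspective, here is a rough back-of-the-envelope calculation. It assumes the values are stored as R doubles (8 bytes each) and ignores the small per-object header; the variable names are illustrative only.

```r
# Number of elements in the "typical" vector described above.
n_full <- 20000 * 20 * 100 * 2     # 80 million elements

# Assumption: numeric (double) storage, 8 bytes per element.
bytes_full <- n_full * 8
mib_full <- bytes_full / 2^20      # size in MiB, roughly 610

# Every implicit copy made by a replace method needs about this much again.
mib_full
```

So a single extra copy of the full vector costs on the order of 600 MiB, which is why the implicit copies matter here.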

The best way, in terms of speed and memory, is to try to emulate a "by
reference" mechanism, but that is not very much "in the spirit of R" and is
a little "dangerous" (see the example).

Could you give me some recommendations?

Thanks for your help.

The code below is a little "long", sorry.

Laurent B.

////////////////////////////

setClass("Foo", representation(v = "numeric"))

setMethod("initialize", signature("Foo"), function(.Object, v=vector()) {
    .Object@v <- v
    .Object
})


setGeneric("v", function(.Object) standardGeneric("v"))
setMethod("v", "Foo", function(.Object) .Object@v )

setGeneric("v<-",function(.Object,value) standardGeneric("v<-"))
setReplaceMethod("v", "Foo", function(.Object, value) {
    .Object@v <- value
    return(.Object)
})

setMethod("[","Foo", function(x,i,j=NA,...,drop=FALSE) x@v[i] )

setReplaceMethod("[","Foo",function(x,i,j=NA,...,value) {
    x@v[i] <- value
    x
})

n <- 2000 * 20 * 100 * 2

# In fact I would like to have
# 20000 genes * 20 measurements per gene (probes) * 100 experiments * 2 (mean
# and variance),
# but that is too much memory for this example, so just try with 2000 "genes".

x <- rep(1,n)
# x, a non-encapsulated vector for the data
y <- new("Foo",v=x)
# y, an encapsulated version


x[1] <- 2
y@v[1] <- 2
v(y)[1] <- 2
y[1] <- 2

nt <- 10 # number of test

system.time(for(i in 1:nt) x[1] <- 2)
system.time(for(i in 1:nt) y@v[1] <- 2)
system.time(for(i in 1:nt) v(y)[1] <- 2)
system.time(for(i in 1:nt) y[1] <- 2)

[1] 0 0 0 0 0
[1]  7.80  3.17 10.97  0.00  0.00
[1] 10.19  5.39 15.60  0.00  0.00
[1]  9.00  4.54 13.55  0.00  0.00

x[1:2]
y[1:2]
v(y)[1:2]
y@v[1:2]

system.time(for(i in 1:nt) x[1:2])
system.time(for(i in 1:nt) y[1:2])
system.time(for(i in 1:nt) v(y)[1:2])
system.time(for(i in 1:nt) y@v[1:2])


[1] 0 0 0 0 0
[1] 0 0 0 0 0
[1] 0 0 0 0 0
[1] 0 0 0 0 0

# no problem for the access methods, only for the replace methods
# Class FooPtr,
# a way to try to bypass the "by value" mechanism of R ...

setClass("FooPtr", representation(p = "environment"))

setMethod("initialize", signature("FooPtr"), function(.Object, v=vector()) {
    .Object@p <- new("environment")
    assign("v",v,envir=.Object@p)
    .Object
})

setMethod("v", "FooPtr", function(.Object) get("v",envir=.Object@p) )

setReplaceMethod("v", "FooPtr",
                 function(.Object, value) {
                   assign("v",value,envir=.Object@p)
                   return(.Object)
                 })

setMethod("[","FooPtr", function(x,i,j=NA,...,drop=FALSE)
    get("v",envir=x@p)[i] )

# a first version of "[<-" for FooPtr :

setReplaceMethod("[","FooPtr",function(x,i,j=NA,...,value)
    {
    v <- get("v",envir=x@p)
    v[i] <- value
    assign("v",v,envir=x@p)
    x
    })

z <- new("FooPtr",v=x)

x[1] <- 2
v(z)[1] <- 2
z[1] <- 2


system.time(for(i in 1:nt) x[1] <- 2)
system.time(for(i in 1:nt) v(z)[1] <- 2)
system.time(for(i in 1:nt) z[1] <- 2)

[1] 0.01 0.00 0.01 0.00 0.00
[1] 0 0 0 0 0
[1] 1.63 1.18 2.81 0.00 0.00

# v(z)[1] is "good", but not "[<-"
# a crazier way to try "by reference"

setReplaceMethod("[","FooPtr",function(x,i,j=NA,...,value)
    {
    assign("i",i,envir=x@p)
    assign("value",value,envir=x@p)
    eval(expression(v[i] <- value), envir=x@p)
    rm("i","value",envir=x@p)
    x
    })

system.time(for(i in 1:nt) x[1] <- 2)
system.time(for(i in 1:nt) v(z)[1] <- 2)
system.time(for(i in 1:nt) z[1] <- 2)

[1] 0 0 0 0 0
[1] 0 0 0 0 0
[1] 0.14 0.12 0.26 0.00 0.00

# "[<-" is better, but v(z)[] is the best ... (why ???)


# ok, v(z)[i] is the "best" access, but you need to know what you are doing:

v(z)[1] <- 12345
z1 <- z
v(z1)[1]

# z and z1 work with the same environment ...

//////////////////////
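
The "dangerous" part of the last example can be reproduced without any classes. The sketch below uses plain environments; `make_ref` and `copy_ref` are hypothetical helper names introduced only for this illustration, not part of any package. It shows that plain assignment of an environment creates an alias, and that an independent object needs an explicit copy.

```r
# Hypothetical helper: wrap a vector in a fresh environment.
make_ref <- function(v = numeric()) {
  p <- new.env()
  assign("v", v, envir = p)
  p
}

# Hypothetical helper: explicit copy into a new, independent environment.
copy_ref <- function(p) {
  q <- new.env()
  assign("v", get("v", envir = p), envir = q)
  q
}

a <- make_ref(c(1, 2, 3))
b <- a             # alias: a and b are the same environment
d <- copy_ref(a)   # independent copy, taken before the change below

assign("v", c(12345, 2, 3), envir = a)

get("v", envir = b)[1]   # 12345: the change made through a is visible through b
get("v", envir = d)[1]   # 1: the explicit copy is unaffected
```

So something like `copy_ref` has to be called deliberately wherever an independent object is wanted; `z1 <- z` alone is not enough.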

Thanks for your help.

Laurent
#
Hi Laurent, this is exactly the problem I had when I started to
work on microarray data. Your strategy works and it does indeed improve
the memory and time efficiency quite a bit. It is just a matter of what
granularity you want to emulate references at, i.e. a matrix, a column of a
matrix or a single cell. I have stayed with a matrix, and when I update
the matrix R (50000x20) in a quadruple of (R,G,Rb,Gb) it does help since
I do not have to pay the cost of having G, Rb and Gb coupled to the same
data structure.

FYI: Since 2001, I have developed the R.oo package
(http://www.maths.lth.se/help/R/R.classes/) based on a similar idea to what
you are suggesting, i.e. use environments or similar functionalities to
emulate pointers and provide it in a reusable way. It implements some
extra features too, however not necessary in this context. Note also
that R.oo is more in the spirit of "a method belongs to a class" and not
"a method belongs to a generic function", which is the idea of R, but it
is not a restriction. At this moment R.oo is based on S3, but I intend
to upgrade to S4. My microarray package com.braju.sma is then making use
of R.oo wherever microarray structures are defined.
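
As a rough illustration of the matrix-level granularity described above, here is a minimal sketch in base R. It is not the R.oo API; `new_spots` and `set_cell` are hypothetical names, and the four channels are simply stored as separate matrices in one environment so that updating R never touches G, Rb or Gb.

```r
# Sketch only: base R, not R.oo. All function names are hypothetical.
new_spots <- function(nrow, ncol) {
  self <- new.env()
  # One separate matrix per channel, all held in the same environment.
  for (name in c("R", "G", "Rb", "Gb")) {
    assign(name, matrix(0, nrow = nrow, ncol = ncol), envir = self)
  }
  self
}

# Update one cell of one channel; the other channels are not read or written.
# The environment is modified as a side effect, so the caller does not
# need to reassign the result.
set_cell <- function(self, channel, i, j, value) {
  m <- get(channel, envir = self)
  m[i, j] <- value
  assign(channel, m, envir = self)
  invisible(self)
}

spots <- new_spots(5, 2)
set_cell(spots, "R", 1, 1, 42)

get("R", envir = spots)[1, 1]   # 42
get("G", envir = spots)[1, 1]   # 0
```

Choosing a finer granularity (one column or one cell per entry) would follow the same pattern with more, smaller objects in the environment.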

Best wishes

Henrik Bengtsson
Lund University
2 days later
#
Hi Henrik,

thanks a lot for your references (R.oo and com.braju.sma). It's a great
help.

Best regards,

laurent buffat

-----Original Message-----
From: Henrik Bengtsson [mailto:hb at maths.lth.se]
Sent: Friday, 23 May 2003 18:07
To: 'laurent buffat'; r-help at stat.math.ethz.ch
Subject: RE: [R] replaceMethod time and memory for very large object.