Severe memory problem using split()
5 messages · Martin Morgan, cstrato

cstrato wrote on 07/12/2010 01:45 PM:

Dear all,

With great interest I followed the discussion
https://stat.ethz.ch/pipermail/r-devel/2010-July/057901.html
since I currently have a similar problem.

In a new R session (using xterm) I import a simple table, "Hu6800_ann.txt", which is only 754KB in size:

> ann <- read.delim("Hu6800_ann.txt")
> dim(ann)
[1] 7129 11

When I call object.size(ann), the estimated memory used to store "ann" is already 2MB:

> object.size(ann)
2034784 bytes

Now I call split() and check the estimated memory again; it turns out to be 3.3GB:

> u2p <- split(ann[,"ProbesetID"], ann[,"UNIT_ID"])
> object.size(u2p)
3323768120 bytes

During the R session I am running "top" in another xterm and can see R's memory usage grow to about 550MB RSIZE. Now I do:

> object.size(unlist(u2p))
894056 bytes

This call takes about 3 minutes to complete, and R's memory usage grows to about 1.3GB RSIZE. Furthermore, during evaluation the free RAM of my Mac drops to less than 8MB free PhysMem, until the machine has to swap. When the call finishes, free PhysMem is 734MB, but R has grown to 577MB RSIZE.

Doing split(ann[,"ProbesetID"], ann[,"UNIT_ID"], drop=TRUE) did not change the object.size; it only processed faster and used less memory on my Mac.

Do you have any idea what the reason for this behavior is? Why is the list "u2p" so large? Am I making a mistake?

Here is my sessionInfo on a MacBook Pro with 2GB RAM:

> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

Best regards,
Christian

_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
V.i.e.n.n.a A.u.s.t.r.i.a
e.m.a.i.l: cstrato at aon.at
_._._._._._._._._._._._._._._._._._
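On the drop=TRUE point just above, a minimal sketch with toy data (not the Hu6800 table) suggests why drop=TRUE cannot shrink the result when the split values are a factor: it only removes empty groups from the result, while every remaining element still carries the complete levels vector.

> ## Toy values: three levels, only two occur in the data.
> f <- factor(c("a", "b"), levels = c("a", "b", "c"))
> ## drop=TRUE removes the empty group "c" from the result, but each
> ## element is still a factor with all three levels attached:
> str(split(f, f, drop = TRUE))
List of 2
 $ a: Factor w/ 3 levels "a","b","c": 1
 $ b: Factor w/ 3 levels "a","b","c": 2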
On 07/12/2010 01:45 PM, cstrato wrote:
[...]
> u2p <- split(ann[,"ProbesetID"], ann[,"UNIT_ID"])
> object.size(u2p)
3323768120 bytes
I guess things improve with stringsAsFactors=FALSE in read.delim?

Martin
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
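Where the 3.3GB figure comes from can be seen with a minimal sketch using synthetic data (7129 single-row groups, mirroring the dim() above; the probe names are invented). Each element of the split factor carries the full levels vector; object.size() does not detect that this vector is shared, so it counts the levels once per element, whereas top sees the shared levels only once.

> ## Synthetic stand-ins for the two columns (names invented):
> id   <- sprintf("probe%04d", 1:7129)
> unit <- 1:7129
> ## As a factor, every one of the 7129 elements drags the
> ## 7129-element levels vector into the object.size() count:
> object.size(split(factor(id), unit))   # reported in the GB range
> ## As a plain character vector, each element holds only its own string:
> object.size(split(id, unit))           # a couple of MB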
Dear Martin,
Thank you, you are right; now I get:
> ann <- read.delim("Hu6800_ann.txt", stringsAsFactors=FALSE)
> object.size(ann)
2035952 bytes
> u2p <- split(ann[,"ProbesetID"],ann[,"UNIT_ID"])
> object.size(u2p)
1207368 bytes
> object.size(unlist(u2p))
865176 bytes
Nevertheless, a size of 1.2MB for a list representing 2 of 11 columns
of a table of size 754KB still seems pretty large?
Best regards
Christian
On 07/12/2010 03:00 PM, cstrato wrote:
[...]

Nevertheless, a size of 1.2MB for a list representing 2 of 11 columns
of a table of size 754KB still seems pretty large?
But it's a list of length(unique(ann[["UNIT_ID"]])) elements, each of which has a pointer to the element, a pointer to the name of the element, and the element data itself. I'd guess it adds up in a non-mysterious way. For a sense of it (and maybe only understandable if you have a working understanding of how R represents data) see, e.g.,
> .Internal(inspect(list(x=1,y=2)))
@1a4c538 19 VECSXP g0c2 [ATT] (len=2, tl=0)
@191cad8 14 REALSXP g0c1 [] (len=1, tl=0) 1
@191caa8 14 REALSXP g0c1 [] (len=1, tl=0) 2
ATTRIB:
@16fc8d8 02 LISTSXP g0c0 []
TAG: @60cf18 01 SYMSXP g0c0 [MARK,NAM(2),gp=0x4000] "names"
@1a4c500 16 STRSXP g0c2 [] (len=2, tl=0)
@674e88 09 CHARSXP g0c1 [MARK,gp=0x21] "x"
@728c38 09 CHARSXP g0c1 [MARK,gp=0x21] "y"
Martin
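A rough sketch of how that per-element cost adds up (synthetic strings, not the real annotation): a named list of n length-1 vectors pays a full R vector header plus a name for every element, so it comes out a few times larger than the single flat vector holding the same data.

> ## One flat character vector versus a named list of length-1 vectors:
> x <- sprintf("probe%04d", 1:7129)
> object.size(x)    # the strings plus one vector header
> y <- as.list(x)
> names(y) <- x     # names chosen for illustration only
> object.size(y)    # a few times larger: one header and one name per element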
Dear Martin,

Thank you for this explanation.

Best regards
Christian