Memory filling up while looping
5 messages · Peter Meißner, jim holtman, Duncan Murdoch

Hey,
I have a double loop like this:
chunk <- list(1:10, 11:20, 21:30)

for (k in seq_along(chunk)) {
  print(chunk[[k]])                  # '[[' extracts the vector for this chunk
  DummyCatcher <- NULL
  for (i in chunk[[k]]) {            # loop over the elements of the chunk
    print("i load something")
    dummy <- 1
    print("i do something")
    dummy <- dummy + 1
    print("i put it together")
    DummyCatcher <- rbind(DummyCatcher, dummy)
  }
  print("i save a chunk and restart with another chunk of data")
}
The problem is that with each 'chunk' cycle the memory used by R grows and grows until it exceeds my RAM, even though any single chunk cycle on its own needs only about a fifth of what I have overall.
Does somebody have an idea why this behaviour occurs? Note that all the objects (like 'DummyCatcher') are reused in every cycle, so I would expect the RAM used to stay about the same after the first 'chunk' cycle.
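A quick way to watch the growth per cycle, assuming Windows R where memory.size() is available:

## sketch: log R's memory footprint after every chunk cycle (Windows-only)
for (k in seq_along(chunk)) {
  ## ... chunk work as above ...
  cat("after chunk", k, ":", memory.size(), "Mb\n")  # Mb currently allocated by R
}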
Best, Peter
SystemInfo:
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Win7 Enterprise, 8 GB RAM
Have you tried putting calls to 'gc' at the top of the first loop to make sure memory is reclaimed? You can print the call to 'gc' to see how fast memory use is growing.
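For example, something like this at the top of the outer loop (a sketch; gc() returns its report, so print() shows the Mb in use each cycle):

for (k in seq_along(chunk)) {
  print(gc())   # forces a collection and reports Ncells/Vcells used, in Mb
  ## ... rest of the chunk work as above ...
}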
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
Thanks for your answer. Yes, I tried 'gc()'; it did not change the behaviour.
Best, Peter
Peter Meißner
Workgroup 'Comparative Parliamentary Politics'
Department of Politics and Administration
University of Konstanz
Box 216, 78457 Konstanz, Germany
+49 7531 88 5665
http://www.polver.uni-konstanz.de/sieberer/home/
You should pre-allocate your result matrix. By growing it a few rows at a time, R needs to do this: allocate it; allocate a bigger one and copy the old one in; delete the old one, leaving a small hole in memory; allocate a bigger one and copy the old one in; delete the old one, leaving a bigger hole in memory, but still too small to use; and so on.

If you are lucky, R might be able to combine some of those small holes into a bigger one and use that, but chances are other variables will have been created there in the meantime, so the holes will go mostly unused. R never moves an object during garbage collection, so if you have fragmented memory, it's mostly wasted.

If you don't know how big the final result will be, then allocate large, and when you run out, allocate bigger. Not as good as one allocation, but better than hundreds.

Duncan Murdoch
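A minimal sketch of that strategy applied to the toy example above; 'n_guess' is a made-up starting size, and the matrix is doubled only when it runs out of rows:

## allocate generously once, double only when full, fill rows in place
chunk <- list(1:10, 11:20, 21:30)
n_guess <- 100                                   # made-up generous row estimate
DummyCatcher <- matrix(NA_real_, nrow = n_guess, ncol = 1)
row <- 0
for (k in seq_along(chunk)) {
  for (i in chunk[[k]]) {
    dummy <- 1
    dummy <- dummy + 1
    row <- row + 1
    if (row > nrow(DummyCatcher)) {              # out of space: one big doubling
      DummyCatcher <- rbind(DummyCatcher,
                            matrix(NA_real_, nrow = nrow(DummyCatcher), ncol = 1))
    }
    DummyCatcher[row, ] <- dummy                 # fill the pre-allocated row instead of rbind-ing
  }
}
DummyCatcher <- DummyCatcher[seq_len(row), , drop = FALSE]  # trim unused rows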
I'll consider it. But in fact the whole data does not fit into memory at once, with the overhead needed to create it on top, I think. That was one of the reasons I wanted to do it chunk by chunk in the first place.
Thanks, Best, Peter
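A sketch of that chunk-by-chunk plan combined with per-chunk pre-allocation, so only one chunk's result is ever held in memory (the file names and the placeholder work are made up):

## keep one chunk's result in memory, write it to disk, then free it
chunk <- list(1:10, 11:20, 21:30)
for (k in seq_along(chunk)) {
  ids <- chunk[[k]]
  res <- matrix(NA_real_, nrow = length(ids), ncol = 1)  # exact size known per chunk
  for (j in seq_along(ids)) {
    res[j, ] <- 1 + 1                                    # placeholder for the real work
  }
  save(res, file = sprintf("chunk_%02d.RData", k))       # hypothetical output file
  rm(res)
  gc()                                                   # release before the next chunk
}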