Rmpi long vector support
14 messages · Ei-ji Nakama, Xiaochun Sun, Hao Yu +3 more
Hi,

<WARNING> It is not yet completed... </WARNING>
http://prs.ism.ac.jp/~nakama/Rhpc/

2013/8/7, Xiaochun Sun <xiaoch.sun at gmail.com>:
The same code worked fine for a smaller dataset, such as 350x25000. Does that mean the current Rmpi doesn't allow long vectors to be passed to slaves? Any ideas on that, or any alternatives to Rmpi?

It (Rhpc) can handle considerably large data.
EI-JI Nakama <nakama (a) ki.rim.or.jp> "中間栄治" <nakama (a) ki.rim.or.jp>
Jim,
On Aug 7, 2013, at 11:59 AM, Jim Gattiker <j.gattiker at gmail.com> wrote:
To bcast.Robj, Rmpi uses serialize() to pack the object into "raw", which operates as a vector of bytes. R supports vectors up to 2^31 elements.

That is not true. R supports vectors up to 2^52 elements. That is way beyond current RAM sizes and certainly more than 2^31:

    > n = 6e9
    > x = integer(n)
    > a = serialize(x, NULL)
    > length(a)
    [1] 2.4e+10
    > log2(length(a))
    [1] 34.48232

However, Rmpi does not. You have to use XLENGTH and R_xlen_t in the C code if you want to go beyond 2^31.

Cheers,
Simon
Your object, serialized, is close to that. I'm a little puzzled though: a
350x350000 matrix works for me; perhaps there's more to your actual call.
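
For a rough sense of scale, the raw payload of an object can be estimated as element count times bytes per element. The sketch below assumes the 350x350000 matrix holds doubles (8 bytes each), which the thread does not state, and ignores the small header that serialize() adds. The helper name payload_bytes is made up for illustration.

```r
# Rough payload estimate: element count times bytes per element.
# serialize() adds a small header on top, which is ignored here.
int_max <- 2^31 - 1  # largest length that does not need long vectors

# Hypothetical helper, not part of R or Rmpi.
payload_bytes <- function(n_elements, bytes_per_element) {
  n_elements * bytes_per_element
}

# 350 x 350000 matrix, ASSUMING double (8-byte) storage
matrix_bytes <- payload_bytes(350 * 350000, 8)
matrix_bytes / int_max   # fraction of the 2^31 - 1 byte limit

# integer(6e9) from the example above, 4 bytes per element
long_bytes <- payload_bytes(6e9, 4)
long_bytes > int_max     # TRUE: the raw vector must be a long vector
log2(long_bytes)         # about 34.48, matching the transcript above
```

Under that assumption the matrix itself stays below the limit, which would be consistent with it working in some setups; anything else sent along with it in the same call adds to the total.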
Setting up the slave environment explicitly seems to me to be better
practice, i.e. using mpi.bcast.Robj2slave and related calls, so that the
applyLB() call doesn't contain the data. As I'm reading it, the way this
is set up currently will send the data to a slave for each application of
the apply.
If you're facing larger data sizes, a solution is to explicitly cut up the
object, mpi.bcast.Robj2slave the pieces in turn, and then use an
mpi.bcast.cmd to collect them together. Another approach would be to write
the data to a file, and direct the slaves to read it into their environments
with an mpi.bcast.cmd.
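
The cut-up-and-collect idea can be sketched in plain R as follows. This is only an illustration of the splitting and reassembly steps: split_columns and join_columns are hypothetical helper names, the 8-byte element size assumes a double matrix, and the Rmpi broadcast of each block is left out because it needs a running MPI session.

```r
# Split a matrix into column blocks so that each block's raw payload
# stays under a byte budget (default: 2^31 - 1, the non-long-vector limit).
# Hypothetical helper; assumes a non-empty matrix of 8-byte doubles.
split_columns <- function(m, max_bytes = 2^31 - 1, bytes_per_element = 8) {
  cols_per_block <- max(1, floor(max_bytes / (nrow(m) * bytes_per_element)))
  starts <- seq(1, ncol(m), by = cols_per_block)
  lapply(starts, function(s) {
    m[, s:min(s + cols_per_block - 1, ncol(m)), drop = FALSE]
  })
}

# Reassemble the blocks in their original order.
join_columns <- function(blocks) do.call(cbind, blocks)

# Small demonstration with an artificially low budget.
m <- matrix(rnorm(20), nrow = 4)            # 4 x 5 doubles
blocks <- split_columns(m, max_bytes = 64)  # 64 / (4 * 8) = 2 columns per block
length(blocks)                              # 3 blocks: 2 + 2 + 1 columns
identical(join_columns(blocks), m)          # TRUE
```

On a cluster, each element of blocks would be broadcast in turn and the join step would run on the slaves; the budget would normally be left at its default rather than the tiny value used here.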
As a comment: I can't tell your application from the code, but if the
slaves are to be working each on only part of the data, it's better to send
the slave just the part of the data it needs.
cheers,
jim
_______________________________________________ R-sig-hpc mailing list R-sig-hpc at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
Hi Simon,

Thanks for pointing out XLENGTH and R_xlen_t. I will add them in the next release of Rmpi. However, I ran into the following issue. On a Debian system with R 3.0.1 (8 GB RAM), I can run

    > n = 6e8
    > x = integer(n)
    > a = serialize(x, NULL)
    > length(a)
    [1] 2.4e+09

However, on 64-bit Windows 7 with R 3.0.1 (16 GB RAM), I got

    > n = 6e8
    > x = integer(n)
    > a = serialize(x, NULL)
    Error: long vectors not supported yet: ../include/Rinlinefuns.h:100

Is this a bug, or is xlength not implemented in the Windows version of R?

Thanks,
Hao
Department of Statistics & Actuarial Sciences Office Phone#:(519)-661-3622 Fax Phone#:(519)-661-3813 The University of Western Ontario London, Ontario N6A 5B7 http://www.stats.uwo.ca/yu
To all,

It looks like the maximum buffer size in any MPI transfer is restricted to 2^31-1 elements. Whether XLENGTH is used or not will not change that. This is a restriction of MPI-2; MPI-3 will address some of these issues. Until then, we either split the data in R before transferring it or define a user-defined type (e.g. a contiguous type of 10 doubles as one unit, so that we can transfer at most 10*(2^31-1) doubles).

Hao
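
The arithmetic behind the user-defined-type workaround can be sketched in R. The function plan_units is a hypothetical helper, not part of Rmpi or MPI; it only works out how many elements each contiguous unit would have to hold so that the number of units fits in the 2^31-1 limit described above.

```r
# Smallest contiguous unit size such that the unit count fits in max_count.
# Hypothetical helper for illustration; it performs no MPI calls.
plan_units <- function(n_elements, max_count = 2^31 - 1) {
  unit_size <- max(1, ceiling(n_elements / max_count))  # elements per unit
  n_units   <- ceiling(n_elements / unit_size)          # count for the transfer
  padding   <- n_units * unit_size - n_elements         # unused slots in last unit
  list(unit_size = unit_size, n_units = n_units, padding = padding)
}

# Capacity with units of 10 doubles, as in the example above
capacity <- 10 * (2^31 - 1)   # = 21474836470 doubles

# 6e9 elements: units of 3 elements, 2e9 units, no padding
plan <- plan_units(6e9)
```

A non-zero padding means the last unit is only partly filled, so the sending side would need to pad the buffer or send the remainder separately.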
On Aug 8, 2013, at 10:58 AM, Jim Gattiker <j.gattiker at gmail.com> wrote:
Simon, I don't understand this either: It seems that R is commonly indexed by 32-bit pointers, and mine certainly is on my Mac.
Nope, since R 3.0.0 we use 64-bit binaries on Mac OS X:

    $ R
    R version 3.0.1 Patched (2013-07-15 r63328) -- "Good Sport"
    Copyright (C) 2013 The R Foundation for Statistical Computing
    Platform: x86_64-apple-darwin10.8.0 (64-bit)
    [...]
    > .Machine$sizeof.pointer
    [1] 8
I'll get the message above also, as did the OP; that was the direct problem. How do I get "52 bit" R?
http://cran.r-project.org/bin/macosx/ Cheers, Simon
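
A quick way to check a given installation is sketched below. The helper name has_long_vectors is made up for illustration; the check relies on long vectors being available in 64-bit builds (8-byte pointers) from R 3.0.0 onwards, and it says nothing about whether a particular function or front end handles them.

```r
# Long vectors need a 64-bit build (8-byte pointers) of R >= 3.0.0.
# Hypothetical helper; it checks the build, not individual functions.
has_long_vectors <- function() {
  .Machine$sizeof.pointer == 8L && getRversion() >= "3.0.0"
}

has_long_vectors()
R.version$platform   # e.g. "x86_64-apple-darwin10.8.0" in the session above
```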
On Aug 7, 2013, at 9:42 PM, Hao Yu <hyu at stats.uwo.ca> wrote:
[...] However, on a win7 64 with R 3.0.1 (16GB ram), I got "Error: long vectors not supported yet: ../include/Rinlinefuns.h:100". Is this a bug, or is xlength not implemented in the Windows version of R?

I'm currently traveling so I cannot check. It's a good question; maybe try R-devel? Note that long int on Windows is only 32-bit, so I wonder if it is related ...

Cheers,
Simon
On Aug 8, 2013, at 1:44 PM, Jim Gattiker <j.gattiker at gmail.com> wrote:
Aha, thanks again. Two notes: 1) serialize() didn't work over 2^31 until 3.0.1 (so it failed in my 3.0.0, with a message that led me to believe that there was still a 32-bit limit on length).
Ok, thanks, I was checking with 3.0.1 so I didn't notice.
2) more puzzling: Using vectors over 2^31 (2^31+2 ?!) seems to be broken in RStudio, even though it works in command-line R. It looks like the functions that monitor the workspace are not ready yet.
Anyway, the important thing is, this message:
"Error: long vectors not supported yet: ../include/Rinlinefuns.h:100"
is from RStudio, not R.
That's possible: RStudio overrides R functions, which has caused problems before. If in doubt, test with R.

Cheers,
Simon