Dear list,
I have noticed surprisingly large performance differences in runif()
between Windows XP and (Debian) Linux on similar CPUs (Pentium D, 3.0 GHz
under WinXP / 3.2 GHz under Linux), and I wonder whether there is a simple
explanation for the difference.
On a Linux system (with a slightly better CPU and 1 GB more RAM),
execution of runif() seems to consume about 80% more CPU time than on a
Windows XP system.
On a 2.7 GHz Xeon running Debian Linux I have checked that using the
i386 .deb version of R instead of a self-built i686 version has no
noticeable effect on speed.
Measuring CPU time with Rprof() instead of Sys.time() differences yields
similar results.
Any hint is appreciated; please let me know if the given information on
system/OS or the R output below is insufficient.
Regards,
Martin Becker
------------------------ R - Output below ------------------------
Windows XP: (Pentium D, 3.0 GHz)
> version
_
platform i386-pc-mingw32
arch i386
os mingw32
system i386, mingw32
status
major 2
minor 3.1
year 2006
month 06
day 01
svn rev 38247
language R
version.string Version 2.3.1 (2006-06-01)
> RNGkind()
[1] "Mersenne-Twister" "Inversion"
> t1<-Sys.time();for (i in 1:500) ttt<-runif(1000000);print(Sys.time()-t1);
Time difference of 57.969 secs
>
Debian Linux: (Pentium D, 3.2GHz)
> version
_
platform i686-pc-linux-gnu
arch i686
os linux-gnu
system i686, linux-gnu
status
major 2
minor 3.1
year 2006
month 06
day 01
svn rev 38247
language R
version.string Version 2.3.1 (2006-06-01)
> RNGkind()
[1] "Mersenne-Twister" "Inversion"
> t1<-Sys.time();for (i in 1:500) ttt<-runif(1000000);print(Sys.time()-t1);
Time difference of 1.752916 mins
>
------------------------ Thread: Speed of runif() on different Operating Systems ------------------------
8 messages · Brian Ripley, Martin Becker, Duncan Murdoch
Prof Brian Ripley wrote, 1 day later:
No one else seems to have responded to this. Please see `Writing R Extensions' for how to time things in R.

For things like this, the fine details of how well the compiler keeps the pipelines and cache filled are important, as is the cache size and memory speed. Using system.time(for (i in 1:500) ttt <- runif(1000000)), your Linux time looks slow, indeed slower than the only 32-bit Linux box I have left (a 2GHz 512Kb cache Xeon) and 2.5x slower than a 64-bit R on a 2.2GHz Opteron (which is doing a lot of other work and so only giving about 30% of one of its processors to R: the elapsed time was much longer).

The binary distribution of R for Windows is compiled with -O3: for some tasks it makes a lot of difference, and this might just be one.

However, what can you usefully do in R with 5e8 random uniforms in anything like a minute, let alone the aggregate time this list will spend reading your question? If it matters to you, investigate the code your compiler creates. (The ATLAS developers report very poor performance on certain Pentiums for certain versions of gcc4.)
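For concreteness, the timing idiom recommended here can be sketched as follows (the loop is scaled down from the thread's 500 x 1e6 draws so it runs quickly; the scaling is my choice, not from the thread):

```r
## Timing a runif() loop with system.time() rather than Sys.time() differences.
## gc() is called first to keep garbage collection out of the measurement.
gc()
tm <- system.time(
  for (i in 1:5) ttt <- runif(100000)  # scaled-down version of the 500 x 1e6 loop
)
print(tm)  # user, system and elapsed times, in seconds
```

system.time() reports CPU time (user + system) separately from elapsed wall-clock time, which is exactly the distinction that differencing Sys.time() misses.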
On Mon, 28 Aug 2006, Martin Becker wrote:
[...] Measuring CPU time with Rprof() instead of Sys.time()-differences yields similar results.
You are not measuring CPU time at all with Sys.time.
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Prof Brian Ripley wrote:
No one else seems to have responded to this. Please see `Writing R Extensions' for how to time things in R.
Thank you very much for the pointer to system.time(); although I read most of 'Writing R Extensions', I must have overlooked this (very useful) part. Nevertheless, I was aware that Sys.time() does not measure CPU time (that is why I included the information that measuring time with Rprof() yields similar results; I had better have included the Rprof output, indeed), but I was the only user on both (idle) dual-core systems and thus expected a high correlation between Sys.time() differences and the CPU time actually used.
For things like this, the fine details of how well the compiler keeps the pipelines and cache filled are important, as is the cache size and memory speed. [...] The binary distribution of R for Windows is compiled with -O3: for some tasks it makes a lot of difference and this might just be one.
Thank you very much for this valuable piece of information; it explains a big part of the speed difference. Recompiling R on my Linux box with -O3 significantly increases the speed of runif(): now the Linux box needs less than 40% more time than the WinXP system.
However, what can you usefully do in R with 5e8 random uniforms in anything like a minute, let alone the aggregate time this list will spend reading your question?
The standard method for simulating final, minimal, and maximal values of Brownian motion relies on a (discrete) n-step random-walk approximation, where n has to be chosen very large (typically n = 100,000) to keep the bias induced by the approximation "small enough" for certain applications. So if you want to do MC option pricing of, e.g., double-barrier options, 5e8 random uniforms are needed for 5,000 draws of final, minimal, and maximal value, which is still quite a small number of draws in MC applications. I am working on a faster simulation method, and of course I want to compare the speed of the new and the (old) standard method; that is why I needed large numbers of random uniforms. I thought the particular application was not of interest for this list, so I left it out of my initial submission. I apologise if my submission was off-topic for this mailing list.
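For concreteness, the n-step random-walk approximation described above might look like the following sketch (the function name and the scaled-down n are illustrative assumptions, not Martin's actual code):

```r
## Hedged sketch: discretised Brownian motion on [0, 1] via an n-step
## random walk; returns the final, minimal and maximal values of the path.
## n is scaled down here from the typical n = 100000 mentioned above.
sim_bm_extremes <- function(n = 10000) {
  dW <- rnorm(n, mean = 0, sd = sqrt(1 / n))  # i.i.d. Gaussian increments
  W  <- cumsum(dW)                            # the random-walk path
  c(final = W[n], min = min(0, W), max = max(0, W))
}
set.seed(1)
x <- sim_bm_extremes()
```

The discretisation biases the extremes toward the interior (the path can cross a barrier between grid points unnoticed), which is why n must be so large and why 5e8 uniforms disappear so quickly.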
If it matters to you, investigate the code your compiler creates. (The ATLAS developers report very poor performance on certain Pentiums for certain versions of gcc4.)
Thank you again for the useful hints! Regards, Martin Becker
On Wed, 30 Aug 2006, Martin Becker wrote:
[...] I was aware that Sys.time() does not measure CPU time (that's why I included the information that measuring time with Rprof() yields similar results) [...]
Actually, Rprof times elapsed time on Windows. Calling gc() first is important, and is part of what system.time() does.
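A minimal Rprof() session for this kind of loop might look like the sketch below (the interval and loop sizes are my assumptions; summaryRprof() is wrapped in try() because a very fast run may record too few samples):

```r
## Hedged sketch: profiling a runif() loop with Rprof().
## On Windows, Rprof() samples elapsed time, as noted above.
tmp <- tempfile()
gc()                               # as with system.time(), collect first
Rprof(tmp, interval = 0.01)        # start sampling every 10 ms
for (i in 1:20) ttt <- runif(1000000)
Rprof(NULL)                        # stop profiling
out <- try(summaryRprof(tmp), silent = TRUE)  # may fail if no samples were taken
if (!inherits(out, "try-error")) print(out$by.self)
```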
The standard method for simulating final, minimal and maximal values of Brownian Motion relies on a (discrete) n-step random walk approximation [...] I am working on a faster simulation method and of course I want to compare the speed of the new and (old) standard method, that's why I needed large numbers of random uniforms.
Isn't that usually done by adding rnorm()s and not runif()s? There are much better algorithms for simulating Brownian motion barrier-crossing statistics to high accuracy. It's not my field, but one idea is to use scaled Brownian bridge to infill time when the process is near a boundary. Sometimes the R helpers spend a long time answering the wrong question, which is why it always helps to give the real one.
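The bridge idea can be taken further: conditional on the endpoint, the minimum of Brownian motion on [0, T] can be drawn exactly by inverting the known distribution of the bridge minimum, with no discretisation bias at all. The sketch below uses that standard inversion formula (a textbook result from the barrier-option Monte Carlo literature, not something stated in this thread):

```r
## Hedged sketch: exact draw of min(W_t, 0 <= t <= T) given W_0 = 0 and
## W_T = b, by inverting P(min <= x) = exp(-2 x (x - b) / T), x <= min(0, b).
sim_bm_min_given_end <- function(b, T = 1) {
  U <- runif(1)
  (b - sqrt(b^2 - 2 * T * log(U))) / 2
}
set.seed(2)
b <- rnorm(1)                 # endpoint: W_1 ~ N(0, 1)
m <- sim_bm_min_given_end(b)  # always <= min(0, b), as a minimum must be
```

By symmetry, the maximum given endpoint b is minus the minimum given endpoint -b, so the final value together with either extreme needs only a couple of draws per path; sampling all three jointly is the harder part of the problem.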
On 8/30/2006 6:33 AM, Prof Brian Ripley wrote:
Isn't that usually done by adding rnorm()s and not runif()s? There are much better algorithms for simulating Brownian motion barrier-crossing statistics to high accuracy. It's not my field, but one idea is to use scaled Brownian bridge to infill time when the process is near a boundary.
McLeish published algorithms to simulate these directly in a recent issue of CJS. I don't have the reference handy, but I think it's 2004 or 2005. Duncan Murdoch
Prof Brian Ripley wrote:
Isn't that usually done by adding rnorm()s and not runif()s? There are much better algorithms for simulating Brownian motion barrier-crossing statistics to high accuracy. It's not my field, but one idea is to use scaled Brownian bridge to infill time when the process is near a boundary. Sometimes the R helpers spend a long time answering the wrong question, which is why it always helps to give the real one.
As I wrote, I am working on (an implementation of) a faster method, which indeed uses Brownian bridge and related concepts; I generated the huge number of random uniforms (random normals are better, but still slower) just for the speed comparison. So for me, the real question was about the speed difference of runif(). Thanks again, regards, Martin
Duncan Murdoch schrieb:
McLeish published algorithms to simulate these directly in a recent issue of CJS. I don't have the reference handy, but I think it's 2004 or 2005. Duncan Murdoch
Thank you for this reference; I think it is the 2002 article "Highs and lows: Some properties of the extremes of a diffusion and applications in finance". This article covers the simulation of the final and minimal (or final and maximal) value perfectly, and gives some proposals for simulating the third component (the max or min, respectively). In principle, my implementation of the simulation of the first two components coincides with the algorithm given in that paper. Thanks again, Martin
On 8/30/2006 7:44 AM, Martin Becker wrote:
Thank you for this reference, I think it is the 2002 article "Highs and lows: Some properties of the extremes of a diffusion and applications in finance".
Yes, that's it. Duncan Murdoch