runtime on ising model
9 messages · Michael D, Jim Lemon, Mike Marchywka +2 more
On 10/26/2010 04:50 PM, Michael D wrote:
So I'm in a stochastic simulations class and I'm having issues with the amount of time it takes to run the Ising model.

I usually don't like to attach the code I'm running, since it will probably make me look like a fool, but I figure it's the best way I can find any bits where I can speed up run time.

As for the goals of the exercise: I need the state of the system at time = 1, 10k, 100k, 1 million, and 10 million, and the percentage of vertices with positive spin at all t.

Just to be clear, I'm not expecting anyone to tell me how to program this model, cause I know what I have works for this exercise, but it takes far too long to run and I'd like to speed it up by replacing slow operations wherever possible.
Hi Michael,
One bottleneck is probably the sampling. If it doesn't grab too much memory, setting up a vector of the samples (maybe a million at a time if 10 million is too big; you might be able to rewrite your sample vector when you store the state) and using k (and an offset if you don't have one big vector) to index it will give you some speed.
Jim
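A minimal sketch of Jim's suggestion, under stated assumptions: Michael's code is not shown in the thread, so the lattice size `n`, the chunk size `chunk`, the shortened `n_steps`, and the `visited` matrix standing in for the real spin update are all hypothetical. The idea is only to draw the random site indices a chunk at a time and index into them with an offset derived from k.

```r
# Pre-draw random site indices in chunks instead of sampling once per step.
# n, n_steps, chunk and visited are illustrative names, not from the thread.
set.seed(1)
n       <- 20L      # lattice side length (assumed)
n_steps <- 50000L   # 10^7 in the exercise; shortened here
chunk   <- 10000L   # number of draws held in memory at once

visited <- matrix(0L, n, n)   # stand-in for the spin matrix M
for (k in seq_len(n_steps)) {
  off <- (k - 1L) %% chunk    # offset of step k within the current chunk
  if (off == 0L) {
    # refill both index buffers once per chunk
    rows <- sample.int(n, chunk, replace = TRUE)
    cols <- sample.int(n, chunk, replace = TRUE)
  }
  i <- rows[off + 1L]
  j <- cols[off + 1L]
  visited[i, j] <- visited[i, j] + 1L   # the actual Ising update would go here
}
```

Whether this helps in practice depends on how the original loop draws its samples, which the thread does not show.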
On Tue, Oct 26, 2010 at 3:24 PM, Mike Marchywka wrote:
----------------------------------------
> Date: Tue, 26 Oct 2010 12:53:14 -0400
> From: mike409 at gmail.com
> To: jim at bitwrit.com.au
> CC: r-help at r-project.org
> Subject: Re: [R] runtime on ising model
>
> I have an update on where the issue is coming from.
>
> I commented out the code for "pos[k+1] <- M[i,j]" and the if statement for time = 10^4, 10^5, 10^6, 10^7 and the storage, and everything ran fast(er). Next I added back in the "pos" statements and still runtimes were good (around 20 minutes).
>
> So I'm left with something causing problems in:

I haven't looked at this since some passing interest in magnetics decades ago, something about 8-tracks and cassettes, but you have to be careful with conclusions like "I removed foo and problem went away therefore problem was foo." Performance issues are often caused by memory, not CPU limitations. Removing anything with a big memory footprint could speed things up. IO can be a real bottleneck. If you are talking about things on minute timescales, look at task manager and see if you are even CPU limited. Look for page faults or IO etc. If you really need performance and have a task which is relatively simple, don't ignore C++ as a way to generate data points and then import these into R for analysis.

In short, just because you are focusing on math it doesn't mean the computer is limited by that.

> ## Store state at time 10^4, 10^5, 10^6, 10^7
> if( k %in% c(10^4,10^5,10^6,10^7) ){
>   q <- q+1
>   Out[[q]] <- M
> }
>
> Would there be any reason R is executing the statements inside the "if" before getting to the logical check? Maybe R is written to hope for the best outcome (TRUE) and will just throw out its work if the logic comes up FALSE?
>
> I guess I can always break the for loop up into four parts and store the state at the end of each, but that's an unsatisfying solution to me.
>
> Jim, I like the suggestion of just pulling one big sample, but since I can get the runtimes under 30 minutes just by removing the storage piece I doubt I would see any noticeable changes by pulling large sample vectors.
>
> Thanks,
> Michael
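One hedged way to follow up on the memory point, assuming `pos` is a numeric vector with one slot per time step (the thread never shows how it is created): preallocate it once so that `pos[k+1] <- ...` never has to grow the vector, and estimate the full-length footprint from a shorter run. The names `n_steps` and `est_mb` are illustrative only.

```r
# Preallocate the per-step record and estimate its size at full length.
n_steps <- 1e5   # 1e7 in the exercise; shortened so the sketch runs quickly

pos <- numeric(n_steps + 1)        # allocated once, before the loop
for (k in seq_len(n_steps)) {
  pos[k + 1] <- k %% 2             # stand-in for M[i, j]
}

# A double takes about 8 bytes, so scale the measured size up to 10^7 + 1 slots.
bytes_per_slot <- as.numeric(object.size(pos)) / length(pos)
est_mb <- bytes_per_slot * (1e7 + 1) / 2^20
print(round(est_mb, 1))            # rough size of the full vector, in MB
```

If the estimate is small compared with available RAM, paging is less likely to be the cause, though that still has to be confirmed by watching the process while it runs.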
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
1 day later
On Oct 28, 2010, at 11:52 AM, Michael D wrote:
Mike, I'm not sure what you mean about removing foo but I think the method is sound in diagnosing a program issue and the results speak for themselves.
I did invert my if statement at the suggestion of a CS professor (who also suggested recoding in C, but I'm in an applied math program and haven't had the time to take programming courses, which I know would be helpful).
Anyway, with the statement as:
if( !(k %in% c(10^4,10^5,10^6,10^7)) ){
#do nothing
} else {
q <- q+1
Out[[q]] <- M
}
run times were back to around 20 minutes.
Have you tried replacing all of those 10^x operations with their integer equivalents, c(10000L, 100000L, 1000000L)? Each time through the loop you are unnecessarily calling the "^" function 4 times. You could also omit the last one, 10^7, during testing, since M at the last iteration (k=10^7) would be the final value and you could just assign the state of M at the end. So we have eliminated 4*10^7 unnecessary "^" calls and 10^7 unnecessary comparisons. (The CS professor is perhaps used to having the C compiler do all thinking of this sort for him.)
David
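A rough sketch of the suggestion above, with assumptions labeled: the run is shortened to 10^6 steps (so the interim checkpoints are 10^4 and 10^5; for the 10^7 run they would be the three integers David lists), and the 4 x 4 matrix `M` is a hypothetical stand-in for the real lattice. The checkpoint vector is built once outside the loop, and the final state is stored after the loop rather than tested for inside it.

```r
# Checkpoints are built once, as integer literals, outside the loop.
checkpoints <- c(10000L, 100000L)   # interim checkpoints for this shortened run
n_steps     <- 1000000L             # 10^7 in the exercise

M   <- matrix(1L, 4, 4)             # hypothetical stand-in for the spin matrix
Out <- vector("list", length(checkpoints))
q   <- 0L

for (k in seq_len(n_steps)) {
  # ... the spin update on M would go here ...
  if (k %in% checkpoints) {
    q <- q + 1L
    Out[[q]] <- M
  }
}

# The state at the last step is simply M after the loop; no test is needed.
final_state <- M
```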
> So as best I can tell something happens in the if statement causing the computer to work ahead, as the professor suggests. I'm no expert on R (and have no desire to try looking at the R source code (it would only confuse me)) but if anyone can offer guidance on how the if statement works (Does R try to work ahead? Under what conditions does it try to "work ahead" so I can try to exploit this behavior) I would greatly appreciate it.
> If it would require too much knowledge of the computer system to understand I doubt I would be able to make use of it, but maybe someone else could benefit.
David Winsemius, MD
West Hartford, CT
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius
Sent: Thursday, October 28, 2010 9:20 AM
To: Michael D
Cc: r-help at r-project.org
Subject: Re: [R] runtime on ising model

On Oct 28, 2010, at 11:52 AM, Michael D wrote:
> run times were back to around 20 minutes.
Did that one change really make a difference? R does not evaluate anything in the if or else clauses of an if statement before evaluating the condition.
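The evaluation order can be checked directly. The following is a small sketch, not anything from Michael's code: `note()` and `eval_log` are hypothetical helpers that record each expression as R evaluates it.

```r
# Log the order in which the pieces of an if/else are evaluated.
eval_log <- character()
note <- function(label, value) {
  eval_log <<- c(eval_log, label)   # record that this expression ran
  value
}

if (note("condition", FALSE)) {
  note("if-branch", NULL)
} else {
  note("else-branch", NULL)
}

print(eval_log)
# [1] "condition"   "else-branch"
```

The condition is logged first and the untaken branch never appears in the log, so nothing is being computed "ahead" and thrown away.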
%in% is a relatively expensive function. Use == if you can. E.g., compare the following 2 ways of stashing something at times 1e4, 1e5, and 1e6:
set <- c(1e4, 1e5, 1e6)  # assumed definition; 'set' is not defined in the original post
system.time({z <- integer()
  for(k in seq_len(1e6))
    if(k %in% set) z[length(z)+1]<-k
  print(z)})
[1] 10000 100000 1000000
user system elapsed
46.790 0.023 46.844
system.time({z <- integer()
nextCheckPoint <- 10^4
for(k in seq_len(1e6))
if( k == nextCheckPoint ) {
nextCheckPoint <- nextCheckPoint * 10
z[length(z)+1]<-k
}
print(z)})
[1] 10000 100000 1000000
user system elapsed
4.529 0.013 4.545
With such a large number of iterations it pays to remove unneeded function calls in arithmetic expressions. R does not optimize them out - it is up to you to do that. E.g.,
> system.time(for(i in seq_len(1e6)) sign(pi)*(-1))
user system elapsed
6.802 0.014 6.818
> system.time(for(i in seq_len(1e6)) -sign(pi))
user system elapsed
3.896 0.011 3.911
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
On Oct 28, 2010, at 12:20 PM, David Winsemius wrote:
Bill Dunlap's suggestion to use "==" instead of %in% cut the time to 1/3 of what it had been even after the pre-calculation of the integer values (which only improved the looping times by 30%). The combination of the two with:
if (k == 10000L | k == 100000L | k == 1000000L) { ... }
... resulted in a speedup by a factor of 12.006/2.523, or about 4.76 (roughly 475%), for the interim checking and printing operation using Bill's test suite.
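For anyone who wants to repeat the comparison, here is a hedged sketch of the three checks discussed in the thread, wrapped in functions so each can be passed to system.time(). The function names are made up for illustration, and `||` is swapped in for the `|` in the line above because it is the scalar, short-circuiting form; that is a substitution, not what was timed in the thread.

```r
# Three ways to detect the checkpoints; each returns the steps it caught.
check_in <- function(n_steps, set = c(10000L, 100000L, 1000000L)) {
  hits <- integer()
  for (k in seq_len(n_steps)) {
    if (k %in% set) hits[length(hits) + 1L] <- k
  }
  hits
}

check_eq <- function(n_steps) {
  hits <- integer()
  for (k in seq_len(n_steps)) {
    if (k == 10000L || k == 100000L || k == 1000000L) {
      hits[length(hits) + 1L] <- k
    }
  }
  hits
}

check_next <- function(n_steps) {
  hits <- integer()
  next_checkpoint <- 10000L
  for (k in seq_len(n_steps)) {
    if (k == next_checkpoint) {          # one comparison per step
      hits[length(hits) + 1L] <- k
      next_checkpoint <- next_checkpoint * 10L
    }
  }
  hits
}

n_steps <- 200000L   # kept small so the sketch runs quickly
stopifnot(identical(check_in(n_steps), check_eq(n_steps)),
          identical(check_in(n_steps), check_next(n_steps)))
# To compare speed, e.g.: system.time(check_in(1e6)); system.time(check_eq(1e6))
```

Timings depend on the machine and the R version, so none are quoted here; the point of the sketch is only that all three variants catch the same steps.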
-- David
----------------------------------------
Date: Thu, 28 Oct 2010 09:58:40 -0700
From: wdunlap at tibco.com
To: dwinsemius at comcast.net; mike409 at gmail.com
CC: r-help at r-project.org
Subject: Re: [R] runtime on ising model

On Oct 28, 2010, at 11:52 AM, Michael D wrote:
> Mike, I'm not sure what you mean about removing foo but I think the method is sound in diagnosing a program issue and the results speak for themselves.
Agreed on the first part but not the second: empirical debugging rarely produces compelling results in isolation. As a collection of symptoms it is fine, but not conclusive. If you learn C++ you will find out about all kinds of things, like memory corruption, that never make sense :) Here, the big concern is memory, as you never determined whether you are CPU limited, although based on others' comments you likely are in any case.
What is at issue here? That is, the OP claimed inverting polarity sped things up, suggesting that the branch mattered. AFAIK he never actually proved which branch was taken. This could imply many things or nothing: one branch may be slow, or cause a page fault, or the test may fail fast but succeed slowly (testing a huge array for equality, for example).
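One simple way to settle which branch is taken, sketched under assumptions: the loop is shortened to 2e5 steps with two checkpoints, and `taken` is a hypothetical counter added only for diagnosis. It counts how often each arm of the inverted if statement actually runs.

```r
# Count how often each arm of the inverted if statement runs.
checkpoints <- c(1e4, 1e5)
n_steps     <- 2e5                       # 10^7 in the exercise; shortened here
taken       <- c(store = 0L, skip = 0L)  # diagnostic counters (illustrative)

for (k in seq_len(n_steps)) {
  if (!(k %in% checkpoints)) {
    taken["skip"] <- taken["skip"] + 1L    # the "do nothing" arm
  } else {
    taken["store"] <- taken["store"] + 1L  # the arm that would copy M into Out
  }
}

print(taken)
```

With these settings the store arm runs twice and the skip arm on every other step, so in this sketch the cost of the loop is dominated by evaluating the test itself rather than by either branch.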