
runtime on Ising model

9 messages · Michael D, Jim Lemon, Mike Marchywka +2 more

#
On 10/26/2010 04:50 PM, Michael D wrote:
Hi Michael,
One bottleneck is probably the sampling. If it doesn't grab too much 
memory, setting up a vector of the samples (maybe a million at a time if 
10 million is too big - you might be able to overwrite your sample vector 
when you store the state) and using k (plus an offset if you don't have 
one big vector) to index it will give you some speed.

Jim
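Jim's block-sampling idea can be sketched as follows; the sizes and the 1:100 sample space are illustrative stand-ins, scaled down from the 10^7 iterations of the real run:

```r
# Pre-draw samples in blocks and index them with k, instead of
# calling sample() once per loop iteration.
n_total <- 100000L      # total iterations (scaled down for the sketch)
block   <- 10000L       # samples drawn per refill
samples <- sample(1:100, block, replace = TRUE)
offset  <- 0L           # index of the element just before the current block

total <- 0
for (k in seq_len(n_total)) {
    i <- k - offset
    if (i > block) {    # block exhausted: overwrite it with fresh draws
        samples <- sample(1:100, block, replace = TRUE)
        offset  <- offset + block
        i <- 1L
    }
    total <- total + samples[i]   # an indexed lookup replaces a sample() call
}
```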
#
----------------------------------------
I haven't looked at this since some passing interest in magnetics
decades ago, something about 8-tracks and cassettes, but you have
to be careful with conclusions like "I removed foo and the problem
went away, therefore the problem was foo." Performance issues are often
caused by memory, not CPU, limitations. Removing anything with a big
memory footprint could speed things up. IO can be a real bottleneck.
If you are talking about things on minute timescales, look at task
manager and see if you are even CPU limited. Look for page faults,
IO, etc. If you really need performance and have a task which
is relatively simple, don't ignore C++ as a way to generate data
points and then import them into R for analysis.

In short, just because you are focusing on the math, it doesn't mean
the computer is limited by that.
1 day later
#
On Oct 28, 2010, at 11:52 AM, Michael D wrote:

Have you tried replacing all of those 10^x operations with their  
integer equivalents, c(10000L, 100000L, 1000000L)? Each time through  
the loop you are unnecessarily calling the "^" function 4 times. You  
could also omit the last one, 10^7, during testing, since M at the  
last iteration (k=10^7) would be the final value and you could just  
assign the state of M at the end. So we have eliminated 4*10^7  
unnecessary "^" calls and 10^7 unnecessary comparisons. (The CS  
professor is perhaps used to having the C compiler do all the thinking  
of this sort for him.)
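The change David describes looks roughly like this; the loop body here is a stand-in, since the original update of M isn't shown in the thread:

```r
# Compute the checkpoint values once as integer constants, instead of
# evaluating 10^4, 10^5, 10^6 (and 10^7) on every pass through the loop.
checkpoints <- c(10000L, 100000L, 1000000L)

M <- 0
for (k in seq_len(1000000L)) {
    M <- M + 1                      # stand-in for the real update of M
    if (k %in% checkpoints)
        cat("M at k =", k, "is", M, "\n")
}
# No test against 10^7 inside the loop: the final value of M is
# simply whatever M holds when the loop ends.
```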
#
Did that one change really make a difference?
R does not evaluate anything in the if or else
clauses of an if statement before evaluating
the condition.
%in% is a relatively expensive function.  Use == if you
can.  E.g., compare the following 2 ways of stashing
something at times 1e4, 1e5, and 1e6:
  > set <- c(10^4, 10^5, 10^6)
  > system.time({z <- integer(0)
                 for(k in seq_len(1e6))
                     if(k %in% set) z[length(z)+1]<-k
                 print(z)})
  [1]   10000  100000 1000000
     user  system elapsed
   46.790   0.023  46.844
  > system.time({z <- integer(0)
                 nextCheckPoint <- 10^4
                 for(k in seq_len(1e6))
                     if( k == nextCheckPoint ) {
                         nextCheckPoint <- nextCheckPoint * 10
                         z[length(z)+1]<-k
                     }
                 print(z)})
  [1]   10000  100000 1000000
     user  system elapsed
    4.529   0.013   4.545

With such a large number of iterations it pays to
remove unneeded function calls in arithmetic expressions.
R does not optimize them out - it is up to you to
do that.  E.g.,

  > system.time(for(i in seq_len(1e6)) sign(pi)*(-1))
     user  system elapsed
    6.802   0.014   6.818
  > system.time(for(i in seq_len(1e6)) -sign(pi))
     user  system elapsed
    3.896   0.011   3.911

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
On Oct 28, 2010, at 12:20 PM, David Winsemius wrote:

Bill Dunlap's suggestion to use "==" instead of %in% cut the time to  
1/3 of what it had been, even after the pre-calculation of the integer  
values (which only improved the looping times by 30%). The combination  
of the two with:
  if (k == 10000L || k == 100000L || k == 1000000L) { ... }

... resulted in an improvement by a factor of 12.006/2.523, or 475%, for  
the interim checking and printing operation using Bill's test suite.
David Winsemius, MD
West Hartford, CT
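For reference, the two forms can be checked for agreement; this sketch is scaled down to 10^5 iterations (the timings quoted above are from 2010-era hardware):

```r
set <- c(1000L, 10000L, 100000L)

z1 <- integer(0)                       # %in% version
for (k in seq_len(100000L))
    if (k %in% set) z1[length(z1) + 1L] <- k

z2 <- integer(0)                       # chained == version; || short-circuits
for (k in seq_len(100000L))
    if (k == 1000L || k == 10000L || k == 100000L)
        z2[length(z2) + 1L] <- k

identical(z1, z2)                      # TRUE: both find the same checkpoints
```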
#
----------------------------------------
Agreed on the first part but not the second - empirical debugging rarely 
produces compelling results in isolation. As a collection
of symptoms, fine, but not conclusive - if you learn C++ you will
find out about all kinds of things, like memory corruption, that
never make sense :) Here, the big concern is issues with memory,
as you never determined that you are CPU limited, although based on
others' comments you likely are in any case.
What is at issue here? That is, the OP claimed inverting polarity
sped things up, suggesting that the branch mattered. AFAIK he
never actually proved which branch was taken. This could
imply many things or nothing: one branch may be slow, or cause
a page fault, or the test may fail fast but succeed slowly (testing
a huge array for equality, for example).
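That last point - a test can fail fast but succeed slowly - can be seen even in R with identical() on large vectors; the vectors here are made up for illustration:

```r
n <- 1e7
a <- rep(1L, n)
b <- rep(1L, n)        # equal to a: the comparison must examine every element
d <- c(2L, a[-1])      # differs in element 1: the comparison can stop early

system.time(identical(a, b))   # the "succeeds slowly" case
system.time(identical(a, d))   # the "fails fast" case
```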