
Speeding up code

3 messages · Amie Hunter, Jeff Newmiller, MacQueen, Don

#
Hello R experts, 

I'm new to R and would like to know the best way to speed up my code. I've read that you can vectorize code, but I'm unsure how to apply that to mine.


df <- data.frame(31790,31790)

for (i in 1:31790)
{
  for (j in i:31790)
  {
    ken<-cor(cldm[i,3:17],cldm[j,3:17], method="kendall", use="pairwise")
    dis2<-deg.dist(cldm[i,2],cldm[i,1],cldm[j,2],cldm[j,1])

    df[i,j]<-ifelse(dis2<=500,ken,NA)
  }
}
df

Thanks!
#
What is cldm?

We should be able to copy the example from your email and paste it into a newly started instance of R, and you should try that yourself to verify that it works. Not having example data similar to yours puts us at a major disadvantage. It would also help to have a description of what you are trying to accomplish.

You might want to use the str function to understand what each object you are creating really is. I don't know what you want the "df" object to be, but a data frame of two values in default-named columns is unusual. You may be confusing matrices with data frames?

(Note that there is a function called df in the core libraries, so you might want to avoid using that name to avoid confusion.)
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
1 day later
#
Ditto to everything Jeff Newmiller said, but I'll take it a little further.

I'm guessing that with
   df <- data.frame(31790,31790)
you thought you were creating something with 31790 rows and 31790 columns.
You weren't. You were creating a data frame with one row and two columns:
     X31790 X31790.1
   1  31790    31790

Given that in your loop you assign values to df[i,j],
and having started with just one row and two columns, it follows
that every time you assign to df[i,j] you are increasing
the size of your data frame, and that will slow things down.
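A quick check of the dimensions makes the difference visible. This is only a sketch; the small size n is a stand-in chosen for illustration, not the real 31790.

```r
# data.frame(31790, 31790) treats each argument as a one-element column.
d <- data.frame(31790, 31790)
dim(d)   # one row, two columns

# A small stand-in size (assumed, for illustration) keeps the example fast.
n <- 5
m <- matrix(NA_real_, nrow = n, ncol = n)
dim(m)   # n rows, n columns, allocated once up front
```

Note that at the full size, a 31790 x 31790 matrix of doubles holds about 1.01e9 values, roughly 8 GB, so memory is worth checking before allocating it.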

Initialize with a matrix (I'll call it 'res' instead of 'df'):

  res <- matrix(NA, 31790,31790)

Then inside your loop, you can use

   if (dis2<=500) res[i,j] <- ken

No need to deal with 'else', since the matrix is initialized
with NA.

The ifelse() function was a less than ideal choice,
since it is designed for vector arguments, and your value, dis2,
appears to always have length = 1. You could have used
  df[i,j] <- if (dis2 <= 500) ken else NA
but as I mentioned above, if you initialize to NA there's no need
to handle the 'else' case inside the loop.
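To illustrate the difference between the two forms, here is a small sketch. The values of dis2, ken and d_vec are made up for the example.

```r
dis2 <- 320    # hypothetical single distance
ken  <- 0.42   # hypothetical single correlation

# ifelse() is vectorized: it tests element by element and builds a result
# of the same length as the test.
a <- ifelse(dis2 <= 500, ken, NA)

# if/else evaluates a single condition and returns the chosen value.
b <- if (dis2 <= 500) ken else NA

same <- identical(a, b)   # both forms give the same single value here

# Where ifelse() is the natural fit: a whole vector of distances at once.
d_vec <- c(100, 700, 450)
labels <- ifelse(d_vec <= 500, "near", "far")
```

For a length-one test, both give the same answer; if/else simply avoids the extra work ifelse() does to handle vectors.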

It may be possible to vectorize your loop, but I kind of doubt it,
considering that you're calling cor() and then deg.dist()
at every iteration.

However, you could calculate the dis2 value first, and then calculate
ken only when dis2 is <= 500. You're calculating ken even when it's not
needed. Avoiding that should speed things up.
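A rough sketch of that reordering follows, under several assumptions: cldm is simulated here as a numeric matrix with latitude and longitude in columns 1 and 2 and 15 measurements in columns 3 to 17, and dist_km() is a hypothetical haversine stand-in for deg.dist(), whose actual behavior isn't shown in this thread.

```r
set.seed(1)
n <- 20   # small stand-in for 31790

# Hypothetical data shaped like cldm (assumed layout: lat, lon, 15 values).
cldm <- cbind(runif(n, 50, 55), runif(n, -5, 2),
              matrix(rnorm(n * 15), nrow = n))

# Hypothetical stand-in for deg.dist(): haversine distance in km.
dist_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

res <- matrix(NA_real_, n, n)   # preallocated, already filled with NA
for (i in seq_len(n)) {
  for (j in i:n) {
    # Cheap test first ...
    dis2 <- dist_km(cldm[i, 2], cldm[i, 1], cldm[j, 2], cldm[j, 1])
    if (dis2 <= 500) {
      # ... so cor() runs only for pairs that pass it.
      res[i, j] <- cor(cldm[i, 3:17], cldm[j, 3:17],
                       method = "kendall", use = "pairwise.complete.obs")
    }
  }
}
```

How much this saves depends on how many pairs fall within 500; if most do, the gain will be small.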

I don't know what deg.dist() is doing, but if it is calculating distances
between points, there are functions for doing that on whole bunches
of points at once. Perhaps your data could be rearranged to work
with one of those.
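As one possible illustration in base R, all pairwise distances can be computed in a single call with outer(). This assumes the coordinates are latitude and longitude in degrees and that a haversine (great-circle) distance in km is acceptable; haversine_matrix() and the sample points are made up for the example and are not a replacement verified against deg.dist().

```r
# Hypothetical helper: all pairwise great-circle distances (km) at once.
haversine_matrix <- function(lat, lon, r = 6371) {
  lat <- lat * pi / 180
  lon <- lon * pi / 180
  dlat <- outer(lat, lat, "-")   # dlat[i, j] = lat[i] - lat[j]
  dlon <- outer(lon, lon, "-")
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

lat <- c(51.5, 48.9, 40.7)    # hypothetical points
lon <- c(-0.1, 2.35, -74.0)
d <- haversine_matrix(lat, lon)

# Index pairs (i <= j) within 500 km; only these would need cor().
near <- which(d <= 500 & upper.tri(d, diag = TRUE), arr.ind = TRUE)
```

At 31790 points the full distance matrix would itself be about 8 GB, so in practice this would probably have to be done in blocks of rows rather than all at once.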

-Don