Hello R experts,
I'm new to R and I'd like to know the best way to speed up my code. I've read that you can vectorize code, but I'm unsure how to apply this to my code.
df <- data.frame(31790,31790)
for (i in 1:31790)
{
  for (j in i:31790)
  {
    ken<-cor(cldm[i,3:17],cldm[j,3:17], method="kendall", use="pairwise")
    dis2<-deg.dist(cldm[i,2],cldm[i,1],cldm[j,2],cldm[j,1])

    df[i,j]<-ifelse(dis2<=500,ken,NA)
  }
}
df
Thanks!
Speeding up code
3 messages · Amie Hunter, Jeff Newmiller, MacQueen, Don
What is cldm?
We (and therefore you, to verify that we can) should be able to copy the example from the email and paste it into a newly-started instance of R. Not having some example data similar to yours to work with puts us at a major disadvantage. It would also be helpful to know what you are trying to accomplish (description).
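As a rough illustration of what a copy-and-paste example could look like, the sketch below builds a small made-up stand-in for cldm and prints it with dput(). The name cldm_example, its size, and its random contents are assumptions for illustration only; the real cldm is not shown anywhere in this thread.

```r
# Hypothetical stand-in for cldm: 2 coordinate columns plus 15 value columns
# (5 rows of random numbers; NOT the poster's real data)
set.seed(42)
cldm_example <- data.frame(matrix(round(rnorm(5 * 17), 2), nrow = 5))

# dput() prints R code that recreates the object when pasted into a new session
dput(cldm_example)

# Round trip: the deparsed text evaluates back to an equal object
rebuilt <- eval(parse(text = deparse(cldm_example)))
all.equal(rebuilt, cldm_example)
```

Pasting the dput() output into the email body lets readers recreate the same object without needing the original file.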
You might want to use the str function to understand what each object you are creating really is. I don't know what you want the "df" object to be, but a data frame of two values in default-named columns is unusual. You may be confusing matrices with data frames?
(Note that there is a function called df in the core libraries, so you might want to avoid using that name to avoid confusion.)
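A minimal sketch of the str() suggestion, comparing what the original call creates with a matrix of the intended shape. The names d and res and the small size n are placeholders chosen here, not taken from the original post.

```r
# What the original call creates: a data frame with one row and two columns
d <- data.frame(31790, 31790)
str(d)
dim(d)

# A preallocated n-by-n matrix filled with NA (n kept small for illustration)
n <- 5
res <- matrix(NA_real_, nrow = n, ncol = n)
str(res)
dim(res)
```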
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>      Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
1 day later
Ditto to everything Jeff Newmiller said, but I'll take it a little further.

I'm guessing that with
  df <- data.frame(31790,31790)
you thought you were creating something with 31790 rows and 31790 columns. You weren't. You were creating a data frame with one row and two columns:

  data.frame(31790,31790)
    X31790 X31790.1
  1  31790    31790

Given that in your loop you assign values to df[i,j], and having started with just one row and two columns, it follows that every time you assign to df[i,j] you are increasing the size of your data frame, and that will slow things down.

Initialize with a matrix (I'll call it 'res' instead of 'df'):
  res <- matrix(NA, 31790,31790)
Then inside your loop, you can use
  if (dis2<=500) res[i,j] <- ken
No need to deal with 'else', since the matrix is initialized with NA.

The ifelse() function was a less than ideal choice, since it is designed for vector arguments, and your value, dis2, appears to always have length = 1. You could have used
  df[i,j] <- if (dis2 <= 500) ken else NA
but as I mentioned above, if you initialize to NA there's no need to handle the 'else' case inside the loop.

It may be possible to vectorize your loop, but I kind of doubt it, considering that you're using the cor() followed by the deg.dist() function at every iteration. However, you could calculate the dis2 value first, and then calculate ken only when dis2 is <= 500. You're calculating ken even when it's not needed. Avoiding that should speed things up.

I don't know what deg.dist() is doing, but if it is calculating distances between points, there are functions for doing that on whole bunches of points at once. Perhaps your data could be rearranged to work with one of those.

-Don
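The suggestions above might be combined along the following lines. This is only a sketch under several assumptions that the thread does not confirm: that cldm is a numeric matrix with latitude in column 1 and longitude in column 2 (in degrees), and that deg.dist() returns a great-circle distance in kilometres. The helper hav_km() is a hypothetical base-R stand-in for deg.dist(), and the data are random.

```r
# Hypothetical stand-in data: columns 1-2 are latitude/longitude in degrees,
# columns 3-17 are the values to correlate (assumed layout, small n for speed)
set.seed(1)
n <- 20
cldm <- cbind(lat = runif(n, 50, 55), lon = runif(n, -5, 0),
              matrix(rnorm(n * 15), nrow = n))

# Haversine great-circle distance in km; a stand-in for deg.dist(),
# vectorized over its arguments (r is the assumed Earth radius in km)
hav_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

res <- matrix(NA_real_, n, n)   # preallocate; untouched cells stay NA
vals <- cldm[, 3:17]

for (i in seq_len(n)) {
  j <- i:n
  # all distances from row i to rows i..n in one vectorized call
  d <- hav_km(cldm[i, 2], cldm[i, 1], cldm[j, 2], cldm[j, 1])
  # compute the Kendall correlation only for pairs within 500 km
  for (k in j[which(d <= 500)]) {
    res[i, k] <- cor(vals[i, ], vals[k, ], method = "kendall",
                     use = "pairwise.complete.obs")
  }
}
```

The inner loop still calls cor() once per qualifying pair, so the saving comes from skipping pairs that are too far apart and from not growing the result object. How much that helps on the real data depends on how many pairs fall within 500 km.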
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
On 11/23/13 1:39 PM, "Amie Hunter" <amie_hunter at hotmail.com> wrote:
>Hello R experts,
>
>I'm new to R and I'm wanting to know what is the best way to speed up my
>code. I've read that you can vectorize the code but I'm unsure on how to
>implement this into my code.
>
>
>df <- data.frame(31790,31790)
>
>for (i in 1:31790)
>{
> for (j in i:31790)
> {
> ken<-cor(cldm[i,3:17],cldm[j,3:17], method="kendall", use="pairwise")
> dis2<-deg.dist(cldm[i,2],cldm[i,1],cldm[j,2],cldm[j,1])
>
> df[i,j]<-ifelse(dis2<=500,ken,NA)
> }
> }
>df
>
>Thanks!