Dear RHelp-list,
I am trying to use the package comprehenr to replace a for loop with a list
comprehension.
I wrote the code, but I must be missing something because it is much
slower than the for loops. Could you please explain why the
list comprehension is slower in my case?
Here is my example. I compute the sum of squared differences
between the values of two vectors, vec1 and vec2. The sampling ratio
between vec1 and vec2 is ratio_sampling, so I have to take only every
500th value of the first series before taking the difference with the
corresponding value of the second series (vec2).
Thank you
Best regards
Laurent
library(tictoc)
library(comprehenr)
ratio_sampling <- 500
## size of the first serie
N1 <- 70000
## size of the second serie
N2 <- 100
## mock data
set.seed(123)
vec1 <- rnorm(N1)
vec2 <- runif(N2)
## 1. with the "for" loops
## the square differences will be stored in a vector
S_diff2 <- numeric((N1-(N2-1)*ratio_sampling))
tic()
for( j in 1:length(S_diff2)){
  sum_squares <- 0
  for( i in 1:length(vec2)){
    sum_squares = sum_squares + ((vec1[(i-1)*ratio_sampling+j] -
      vec2[i])**2)
  }
  S_diff2[j] <- sum_squares
}
toc()
## 0.22 sec elapsed
which.max(S_diff2)
## 7857
## 2. with the lists comprehension
tic()
S_diff2 <- to_vec(for( j in 1:length(S_diff2)) sum(to_vec(for( i in
1:length(vec2)) ((vec1[(i-1)*ratio_sampling+j] - vec2[i])**2))))
toc()
## 25.09 sec elapsed
which.max(S_diff2)
## 7857
slowness when I use a list comprehension
7 messages · Jeff Newmiller, avi.e.gross at gmail.com, Gabor Grothendieck +1 more
Laurent,
Thank you for introducing me to a package I did not know existed. I use features like list comprehensions in Python all the time and could see using them in R now that I know they are available.
As to why your example is slow, I see you used a fairly complex and nested expression and wonder whether it was the best way to go. As you are dealing with an interpreter doing delayed evaluation, I can imagine reasons it can be slow. But note the package comprehenr may not be designed to be more efficient than loops or than the more built-in functional methods that can be faster. The package is there perhaps more as a compatibility helper that allows you to write closer to the Python style and perhaps re-shapes what you wrote into a set of instructions in more native R.
Just for comparison, in Python, things like comprehensions for lists or dictionaries or tuples often are syntactic sugar, and the interpreter may simply rewrite them more like the first program you typed and evaluate that. The comprehensions are designed more for users who can think another way and write things more compactly as one-liners. Depending on implementations, they may be faster or slower on some examples.
I am not saying there is nothing else that is slowing it down for you. I am suggesting that using the feature as currently implemented may not be an advantage except in your thought process. It may be that it could be improved, such as by moving more functionality out of R and into compiled languages, as has been done for many packages.
Avi
-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Laurent Rhelp
Sent: Sunday, June 16, 2024 11:28 AM
To: r-help at r-project.org
Subject: [R] slowness when I use a list comprehension
[...]
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
I would put this advice more strongly: learn to think in R, rather than thinking in Python, when programming in R. R has atomic vectors... Python does not (until you import a package that implements them). I find that while it is possible to import R thinking into Python, Python programmers seem to object for stylistic reasons even though such thinking speeds up Python also.
A key step in that direction is to stop using lists directly for numeric calculations... use them to manage numeric vectors. In some cases you can switch to matrices or arrays to remove even more list manipulations from the script.
library(microbenchmark)
ratio_sampling <- 500
## size of the first series
N1 <- 70000
## size of the second series
N2 <- 100
## mock data
set.seed(123)
vec1 <- rnorm(N1)
vec2 <- runif(N2)
## double loop: one scalar operation per (j, i) pair
dloop <- function( N1, N2, ratio_sampling, vec1, vec2 ) {
  S_diff2 <- numeric(
    N1-(N2-1)*ratio_sampling
  )
  for( j in 1:length(S_diff2) ) {
    sum_squares <- 0
    for( i in 1:length(vec2)){
      sum_squares <- (
        sum_squares
        + (
          vec1[ (i-1)*ratio_sampling+j ]
          - vec2[i]
        )**2
      )
    }
    S_diff2[j] <- sum_squares
  }
  S_diff2
}
## inner loop replaced by vector arithmetic on all of vec2 at once
vloop <- function( N1, N2, ratio_sampling, vec1, vec2 ) {
  S_diff3 <- numeric(
    N1-(N2-1)*ratio_sampling
  )
  i <- seq_along( vec2 )
  k <- (i-1)*ratio_sampling
  for( j in seq_along( S_diff3 ) ) {
    S_diff3[j] <- sum(
      (
        vec1[ j + k ]
        - vec2
      )^2
    )
  }
  S_diff3
}
microbenchmark(
  S_diff2 <- dloop( N1, N2, ratio_sampling, vec1, vec2 )
  , S_diff3 <- vloop( N1, N2, ratio_sampling, vec1, vec2 )
  , times = 20
)
all.equal( S_diff2, S_diff3 )
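As a sketch of the "switch to matrices" idea mentioned above, the remaining loop over j can also be removed by building every index at once. This is only an illustration, not code from the thread: the function name mloop and the variables n_out, idx and m are made up here, and whether it beats vloop on a given machine is something to benchmark rather than assume.

```r
## Hypothetical matrix-based variant (not part of the original code).
## Assumes length(vec1) > (length(vec2) - 1) * ratio_sampling.
mloop <- function( ratio_sampling, vec1, vec2 ) {
  ## number of output values, same formula as in dloop/vloop
  n_out <- length(vec1) - (length(vec2) - 1) * ratio_sampling
  ## offsets into vec1 for each element of vec2
  k <- (seq_along(vec2) - 1) * ratio_sampling
  ## idx[i, j] = k[i] + j, i.e. the vec1 index used for pair (i, j)
  idx <- outer(k, seq_len(n_out), "+")
  ## one column per j, one row per element of vec2
  m <- matrix(vec1[as.vector(idx)], nrow = length(vec2))
  ## vec2 is recycled down each column, so column j holds the
  ## squared differences whose sum is the j-th result
  colSums((m - vec2)^2)
}
```

The trade-off is memory: the intermediate matrix has length(vec2) rows and one column per output value (100 x 20,500 for the sizes used in this thread), whereas vloop only ever holds one column's worth at a time.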
On June 16, 2024 9:33:54 AM PDT, avi.e.gross at gmail.com wrote:
[...]
Sent from my phone. Please excuse my brevity.
I fully agree with Jeff that the best way to use ANY language is to evaluate it in terms of not just the capabilities it offers but also the philosophy behind what it was created for and how people do things, and then just grok it and use it mostly in the way intended. I do that with all the languages I learn, whether for computers or humans. Bringing in something you like from another language often gets in the way of actually using what you have.
But realistically, many languages that were designed for one purpose will then evolve to suit many other purposes and lose their direction and often their focus and even efficiency. S was designed for statistical computing of some sorts, and that meant a vectorized approach could take you far. Python had other design goals, and the original designers wanted elements of generality that a list provides more than a vector does. R has lists too, but note that if you want to use the kind of dictionary or set used in Python, which definitely can have advantages and disadvantages, you can find add-ons in R packages that give you something like that too.
And, note, many, myself included, really appreciate alternate ways to do things and heavily use tidyverse packages that mostly are not base R but sort of a grafted-on other language. So what? Purists don't necessarily do well in the real world.
On the topic at hand and speed, I went and looked at the comprehenr package, and it is no wonder it is slower. Here is the code Laurent used in calling to_vec:
to_vec
function (expr, recursive = TRUE, use.names = FALSE)
{
    res = eval.parent(substitute(comprehenr::to_list(expr)))
    unlist(res, recursive = recursive, use.names = use.names)
}
It does a few things and then calls to_list() to do the actual work. This
extra layer may slow it down a tad.
So what does to_list() do?
to_list
function (expr)
{
    expr = substitute(expr)
    is_loop(expr) || stop(paste("argument should be expression with 'for', 'while' or 'repeat' but we have: ",
        deparse(expr, width.cutoff = 500)[1]))
    expr = expand_loop_variables(expr)
    expr = add_assignment_to_final_loops(expr)
    expr = substitute(local({
        .___res <- list()
        .___counter <- 0
        expr
        .___res
    }))
    eval.parent(expr)
}
I won't follow the entire chain, but it seems to take the code supplied and
isolate various parts needed and, in effect, build up some other code and
evaluates it in the context of the parent.
Obviously, had you written similar (or different using loops or whatever)
code directly, it might execute faster.
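To make that overhead concrete, here is a hand-written sketch of the kind of code such a rewrite might boil down to for a very simple comprehension. The code comprehenr actually generates is not shown in this thread, so the loop body below is an assumption; only the list() accumulator, the counter, the local() wrapper and the final unlist() are taken from the two functions printed above. The names res, counter, squares_list and squares are illustrative.

```r
## Rough, hand-written equivalent of a comprehension that squares 1:5.
## The real generated code may differ; this only mirrors the pattern
## visible in to_list() and to_vec().
squares_list <- local({
  res <- list()        # results are accumulated in a list ...
  counter <- 0         # ... indexed by a running counter
  for (x in 1:5) {
    counter <- counter + 1
    res[[counter]] <- x^2
  }
  res
})
## to_vec() then flattens the list into an atomic vector
squares <- unlist(squares_list, recursive = TRUE, use.names = FALSE)
```

In Laurent's example the inner to_vec() sits inside the outer loop, so the capture, rewrite and evaluation of the inner expression is repeated for each of the 20,500 values of j (70000 - 99 * 500), on top of growing a list element by element each time.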
As I mentioned, this is largely syntactic sugar. A reasonable use of this is
if you are given python code and asked to translate it into R code that does
the same thing. You could spend time thinking and designing and come up with
the kind of R code an R expert might have done, or skip that and just make
slight changes needed for R and for the package being used and it should
work, but not necessarily the way a native polished version works. Later, if
time and finances permit, and you want it faster, rewrite it.
I note the package, with a vignette here:
https://cran.r-project.org/web//packages/comprehenr/vignettes/Introduction.html
does make some changes, so translating is not trivial. For example, the
Python syntax such as:
[ f(x) for x in iterable if condition]
cannot be used in quite that order. It loosely translates to:
to_vec(for (x in iterable) if (condition) f(x))
with the result at the end rather than beginning. And, since R has not
chosen to return multiple things from a function like python does and just
unpack them, they had to come up with interesting workarounds like `x, y`
and frankly, quite a few things I can do in python in this context are
simply not supported by this code, nor can be expected to.
I think if someone using python was used to using the extended version by
loading modules like numpy and pandas and using them heavily, they might
find it a tad easier to then port the code to R and use vectorized
functionality better.
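As a small illustration of that last point, a filtered comprehension often needs no comprehension package at all in R. The sketch below uses only base R; the data and the names x, even_squares and even_squares2 are made up for the example.

```r
x <- 1:10

## Python: [v * v for v in x if v % 2 == 0]
## Vectorized R: subset with a logical mask, then square the whole vector
even_squares <- x[x %% 2 == 0]^2

## A functional spelling of the same thing, closer to the comprehension
## in shape but still base R
even_squares2 <- vapply(
  Filter(function(v) v %% 2 == 0, x),
  function(v) v^2,
  numeric(1)
)
```

Both forms give the same values; the first is the one that "thinks in vectors" and is usually what an R reader expects to see.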
So, are packages like comprehenr a crutch or are they helpful or even evil?
My view is to not be a religious fanatic and assume any language was really
designed perfectly. Some ideas and implementations can be a useful way to
formulate a problem for a programmer who thinks in that way, at least until
they learn to also think in another. An example would be the R way to do
sets is probably not as useful as the python way. If I needed heavy duty
usage, I might load a package that lets me think about it the way I want,
and the same for a dictionary.
But, if I am writing code for others to maintain and change later, the
closer I stick to the main language or accepted packages, the better.
-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Jeff Newmiller via
R-help
Sent: Sunday, June 16, 2024 1:13 PM
To: r-help at r-project.org
Subject: Re: [R] slowness when I use a list comprehension
[...]
Avi and Jeff,
Thank you very much for your answers. I did not think I would get such interesting answers when I asked my question. In fact, I recently discovered list comprehensions while reading some Python code and I was seduced by the compact notation, so I decided to try it as an exercise on an example. Now I know why comprehenr is slow (cf. Avi's answer), and I was impressed by Jeff's function, which uses vectorization.
Unit: milliseconds
                                                 expr      min        lq      mean   median        uq      max neval cld
 S_diff2 <- dloop(N1, M2, ratio_sampling, vec1, vec2) 205.0905 212.86080 226.80683 221.3820 227.57695 297.9974    20   a
 S_diff3 <- vloop(N1, M2, ratio_sampling, vec1, vec2)  49.8971  57.05555  64.25502  58.9455  63.15645 113.4106    20   b
I did not think of transforming the second loop with a vectorized approach. Hence, the right direction is to think more in terms of vectorization. I will search for some exercises on the web.
On 16/06/2024 at 19:44, avi.e.gross at gmail.com wrote:
[...]
This can be vectorized. Try

ix <- seq_along(vec2)
S_diff2 <- sapply(seq_len(N1-(N2-1)*ratio_sampling), \(j)
  sum((vec1[(ix-1)*ratio_sampling+j] - vec2[ix])**2))
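The sapply() call above still iterates over j at the R level. One possible further step, sketched here under the assumption that a matrix of length(vec2) rows by one column per output value fits comfortably in memory, is to build every index at once with outer() and reduce with colSums(). The name sq_diff_matrix is hypothetical and does not come from the thread; whether it is actually faster on a given machine is worth timing rather than assuming.

```r
## Hypothetical helper (not from the thread): fully vectorized variant.
sq_diff_matrix <- function(vec1, vec2, ratio_sampling) {
  ## number of output values, same formula as in the original post
  n_out <- length(vec1) - (length(vec2) - 1) * ratio_sampling
  ## idx[i, j] = (i - 1) * ratio_sampling + j
  idx <- outer((seq_along(vec2) - 1) * ratio_sampling, seq_len(n_out), "+")
  ## one column per j; vec2 is recycled down each column
  d <- matrix(vec1[as.vector(idx)], nrow = length(vec2)) - vec2
  colSums(d^2)
}

## Small mock data (sizes chosen for illustration only)
set.seed(123)
small_vec1 <- rnorm(700)
small_vec2 <- runif(10)
res_matrix <- sq_diff_matrix(small_vec1, small_vec2, 50)
length(res_matrix)
```

The trade-off is memory: the intermediate matrix holds length(vec2) times n_out doubles, so for much larger inputs the sapply() form may be the safer choice.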
On Sun, Jun 16, 2024 at 11:27 AM Laurent Rhelp <laurentRHelp at free.fr> wrote:
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Thank you for this solution which is faster than the two for loops.
library(microbenchmark)
gloop <- function(N1,N2,ratio_sampling,vec1,vec2){
  ix <- seq_along(vec2)
  S_diff2 <- sapply(seq_len(N1-(N2-1)*ratio_sampling), \(j)
                    sum((vec1[(ix-1)*ratio_sampling+j] - vec2[ix])**2))
  return(S_diff2)
}
microbenchmark(
  S_diff2 <- dloop( N1, N2, ratio_sampling, vec1, vec2 )
  , S_diff3 <- vloop( N1, N2, ratio_sampling, vec1, vec2 )
  , S_diff4 <- gloop( N1, N2, ratio_sampling, vec1, vec2)
  , times = 20
)
Unit: milliseconds
                                                 expr      min        lq      mean   median       uq      max neval cld
 S_diff2 <- dloop(N1, N2, ratio_sampling, vec1, vec2) 200.1107 218.10100 230.36871 222.3080 228.7683 303.8214    20 a
 S_diff3 <- vloop(N1, N2, ratio_sampling, vec1, vec2)  42.6878  46.65425  73.83425  58.6626 105.4257 145.2059    20  b
 S_diff4 <- gloop(N1, N2, ratio_sampling, vec1, vec2)  80.4895  91.24735 133.74064 110.9142 166.4094 233.7112    20   c
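Timings like these are only comparable if the variants return the same values, so a quick equivalence check on small data can be useful before benchmarking. The sketch below is an illustration, not the code that was benchmarked: loop_version and sapply_version are hypothetical stand-ins written from the double loop and the sapply() suggestion shown in this thread, and the data sizes are chosen arbitrarily.

```r
## Hypothetical stand-ins, rewritten from the code shown in this thread.
loop_version <- function(vec1, vec2, ratio_sampling) {
  n_out <- length(vec1) - (length(vec2) - 1) * ratio_sampling
  out <- numeric(n_out)
  for (j in seq_len(n_out)) {
    s <- 0
    for (i in seq_along(vec2)) {
      s <- s + (vec1[(i - 1) * ratio_sampling + j] - vec2[i])^2
    }
    out[j] <- s
  }
  out
}

sapply_version <- function(vec1, vec2, ratio_sampling) {
  ix <- seq_along(vec2)
  n_out <- length(vec1) - (length(vec2) - 1) * ratio_sampling
  sapply(seq_len(n_out), function(j)
    sum((vec1[(ix - 1) * ratio_sampling + j] - vec2[ix])^2))
}

## Small mock data so the check runs quickly
set.seed(123)
v1 <- rnorm(700)
v2 <- runif(10)
res_loop <- loop_version(v1, v2, 50)
res_sapply <- sapply_version(v1, v2, 50)

## all.equal() compares with a numeric tolerance, which suits
## floating-point sums accumulated in different ways
same <- isTRUE(all.equal(res_loop, res_sapply))
print(same)
```

Using all.equal() rather than identical() is a deliberate choice here, since sum() and a manual accumulation loop may differ in the last few bits.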
On 16/06/2024 at 20:54, Gabor Grothendieck wrote: