Skip to content

Vectorizing integrate()

15 messages · Doran, Harold, David Winsemius, David L Carlson +4 more

#
On Dec 6, 2012, at 10:10 AM, Doran, Harold wrote:

            
The Vectorize function is just a wrapper to mapply. If yoou are able to get that code to execute properly for your un-posted test cases, then why not use mapply?
David Winsemius, MD
Alameda, CA, USA
#
You can't vectorize the loop because integrate() is not vectorized, but you
really need to pass only one index variable if you redefine your function so
that you are not re-creating it for each step in the loop. That would let
you use sapply(). Here's a toy example that should generalize to your
problem:
n=w[i])$value)
[1] 1.0000000 0.5000000 0.3333333 0.2500000 0.2000000

Creating the function outside the loop should speed things up. 

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
#
David et al

Thanks, I should have made the post more complete. I routinely use apply functions, but often avoid mapply() as I find it so non-intuitive. In this instance, I think the situation demands I change that position, though.

Reproducible code for the current implementation of the function is

B <- c(0,1)
sem1 = runif(10, 1, 2)
x <- rnorm(10)
X <- cbind(1, x)
eta <- numeric(10)

for(j in 1:nrow(X)){
	fun <- function(u) 1/ (1 + exp(- (B[1] + B[2] * (x[j] + u)))) * dnorm(u, 0, sem1[j])
	eta[j] <- integrate(fun, -Inf, Inf)$value
}

I can't get my head around how mapply() would work here. It accepts as its first argument a function. But, in my case I have two functions: the user defined integrand, fun(), an then of course calling the R function integrate().

I was thinking maybe along these lines, but this is obviously wrong.

mapply(integrate(function(u) 1/ (1 + exp(- (B[1] + B[2] * (x + u)))) * dnorm(u, 0, sem1), -Inf, Inf)$value, MoreArgs = list(B, x, sem1))
#
How about?
+      (m + u)))) * dnorm(u, 0, s)
+      -Inf, Inf, m=x[i], s=sem1[i])$value)
[1] 0.3018974 0.6780813 0.4804123 0.4845773 0.3886028 0.3336432 0.2541276
 [8] 0.6894951 0.6649838 0.2385417

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
#
Hi,
Using David's function:
fun <- function(u, m, s) 1/ (1 + exp(- (B[1] + B[2] *
????? (m + u)))) * dnorm(u, 0, s)

?res<-mapply(function(i) integrate(fun,-Inf,Inf,m=x[i],s=sem1[i])$value,1:nrow(X))
res
# [1] 0.5212356 0.6214989 0.5306124 0.5789414 0.3429795 0.6972879 0.5952949
?#[8] 0.7531899 0.4740685 0.7576587
identical(res,eta)
#[1] TRUE
A.K.

----- Original Message -----
From: "Doran, Harold" <HDoran at air.org>
To: David Winsemius <dwinsemius at comcast.net>
Cc: "r-help at r-project.org" <r-help at r-project.org>
Sent: Friday, December 7, 2012 10:14 AM
Subject: Re: [R] Vectorizing integrate()

David et al

Thanks, I should have made the post more complete. I routinely use apply functions, but often avoid mapply() as I find it so non-intuitive. In this instance, I think the situation demands I change that position, though.

Reproducible code for the current implementation of the function is

B <- c(0,1)
sem1 = runif(10, 1, 2)
x <- rnorm(10)
X <- cbind(1, x)
eta <- numeric(10)

for(j in 1:nrow(X)){
??? fun <- function(u) 1/ (1 + exp(- (B[1] + B[2] * (x[j] + u)))) * dnorm(u, 0, sem1[j])
??? eta[j] <- integrate(fun, -Inf, Inf)$value
}

I can't get my head around how mapply() would work here. It accepts as its first argument a function. But, in my case I have two functions: the user defined integrand, fun(), an then of course calling the R function integrate().

I was thinking maybe along these lines, but this is obviously wrong.

mapply(integrate(function(u) 1/ (1 + exp(- (B[1] + B[2] * (x + u)))) * dnorm(u, 0, sem1), -Inf, Inf)$value, MoreArgs = list(B, x, sem1))
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Must be a cut and paste issue. All three agree on the results but they are
different from those in arun's message:
+ fun <- function(u) 1/ (1 + exp(- (B[1] + B[2] * (x[j] + u)))) * dnorm(u,
0, sem1[j])
+ eta[j] <- integrate(fun, -Inf, Inf)$value
+ }
[1] 0.6586459 0.5565611 0.5226462 0.6824012 0.6908079 0.7578524 0.4715976
 [8] 0.2165210 0.2378657 0.3492133
+       (m + u)))) * dnorm(u, 0, s)
+       -Inf, Inf, m=x[i], s=sem1[i])$value)
[1] 0.6586459 0.5565611 0.5226462 0.6824012 0.6908079 0.7578524 0.4715976
 [8] 0.2165210 0.2378657 0.3492133
[1] TRUE
integrate(fun,-Inf,Inf,m=x[i],s=sem1[i])$value,1:nrow(X))
[1] 0.6586459 0.5565611 0.5226462 0.6824012 0.6908079 0.7578524 0.4715976
 [8] 0.2165210 0.2378657 0.3492133
[1] TRUE

-------
David C
#
HI,

I can see ?runif(), ?rnorm() without a set seed.
A.K.

----- Original Message -----
From: David L Carlson <dcarlson at tamu.edu>
To: 'arun' <smartpink111 at yahoo.com>; "'Doran, Harold'" <HDoran at air.org>
Cc: 'R help' <r-help at r-project.org>; 'David Winsemius' <dwinsemius at comcast.net>
Sent: Friday, December 7, 2012 11:54 AM
Subject: RE: [R] Vectorizing integrate()

Must be a cut and paste issue. All three agree on the results but they are
different from those in arun's message:
+ fun <- function(u) 1/ (1 + exp(- (B[1] + B[2] * (x[j] + u)))) * dnorm(u,
0, sem1[j])
+ eta[j] <- integrate(fun, -Inf, Inf)$value
+ }
[1] 0.6586459 0.5565611 0.5226462 0.6824012 0.6908079 0.7578524 0.4715976
[8] 0.2165210 0.2378657 0.3492133
+? ? ?  (m + u)))) * dnorm(u, 0, s)
+? ? ?  -Inf, Inf, m=x[i], s=sem1[i])$value)
[1] 0.6586459 0.5565611 0.5226462 0.6824012 0.6908079 0.7578524 0.4715976
[8] 0.2165210 0.2378657 0.3492133
[1] TRUE
integrate(fun,-Inf,Inf,m=x[i],s=sem1[i])$value,1:nrow(X))
[1] 0.6586459 0.5565611 0.5226462 0.6824012 0.6908079 0.7578524 0.4715976
[8] 0.2165210 0.2378657 0.3492133
[1] TRUE

-------
David C
#
On Dec 7, 2012, at 7:14 AM, Doran, Harold wrote:

            
I had been thinking (before you presented the data case and allowed  
the problem to be seen more fully)  that X[ n, ] was being passed to  
that expression. It's not. You are using the index of one object to  
pass positions of a vector, so there is really only one value. Using  
mapply would reduce to using sapply as Carlson illustrated.
#
Has anyone suggested using the byte code compiler "compiler" 
package?  An analysis by John Nash suggested to me that it may be 
roughly equivalent to vectorization;  see 
"http://rwiki.sciviews.org/doku.php?id=tips:rqcasestudy&s=compiler".


       If that has been suggested, please excuse me for entering this 
thread late without reading all the previous posts.


       Spencer
On 12/7/2012 8:54 AM, David L Carlson wrote:

  
    
#
On 07-12-2012, at 18:12, Spencer Graves wrote:

            
Not yet.
But here are some results for alternative ways of doing what  the OP wanted.


# Initial parameters

N <- 1000
B <- c(0,1)
sem1 <- runif(N, 1, 2)
x <- rnorm(N)
X <- cbind(1, x)

# load compiler package

library(compiler)

# functions

# Original loop solution with function fun defined outside loop

f1 <- function(X, B, x, sem1) {
    eta <- numeric(nrow(X))
    fun <- function(u, m, s) 1/ (1 + exp(- (B[1] + B[2] * (m + u)))) * dnorm(u, 0, s)
    for(j in 1:nrow(X)){
    	eta[j] <- integrate(fun, -Inf, Inf, m=x[j], s=sem1[j])$value
    }
    eta
}
f2 <- cmpfun(f1)

# sapply solution with fun defined outside function

fun <- function(u, m, s) 1/ (1 + exp(- (B[1] + B[2] * (m + u)))) * dnorm(u, 0, s)

f3 <- function(X, B, x, sem1) sapply(1:nrow(X), function(i) integrate(fun, -Inf, Inf, m=x[i], s=sem1[i])$value)
f4 <- cmpfun(f3)

# sapply solution with fun defined within function

f5 <- function(X, B, x, sem1) {
    fun <- function(u, m, s) 1/ (1 + exp(- (B[1] + B[2] * (m + u)))) * dnorm(u, 0, s)
    sapply(1:nrow(X), function(i) integrate(fun, -Inf, Inf, m=x[i], s=sem1[i])$value)
}
f6 <- cmpfun(f5)

# Testing

eta1 <- f1(X, B, x, sem1)
eta2 <- f2(X, B, x, sem1)
eta3 <- f3(X, B, x, sem1)
eta4 <- f4(X, B, x, sem1)
eta5 <- f5(X, B, x, sem1)
eta6 <- f6(X, B, x, sem1)

identical(eta1,eta2)
identical(eta1,eta3)
identical(eta1,eta4)
identical(eta1,eta5)

library(rbenchmark)

benchmark(eta1 <- f1(X, B, x, sem1), eta2 <- f2(X, B, x, sem1), eta3 <- f3(X, B, x, sem1),
          eta4 <- f4(X, B, x, sem1), eta5 <- f5(X, B, x, sem1), eta6 <- f6(X, B, x, sem1),
          replications=10, columns=c("test","elapsed","relative"))


# Results
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
+           eta4 <- f4(X, B, x, sem1), eta5 <- f5(X, B, x, sem1), eta6 <- f6(X, B, x, sem1),
+           replications=10, columns=c("test","elapsed","relative"))
                       test elapsed relative
1 eta1 <- f1(X, B, x, sem1)   1.873    1.207
2 eta2 <- f2(X, B, x, sem1)   1.552    1.000
3 eta3 <- f3(X, B, x, sem1)   1.807    1.164
4 eta4 <- f4(X, B, x, sem1)   1.841    1.186
5 eta5 <- f5(X, B, x, sem1)   1.852    1.193
6 eta6 <- f6(X, B, x, sem1)   1.601    1.032

As you can see using the compiler package is beneficial speedwise.
f2 and f6, both the the result of using the compiler package, are the quickest.
It's quite likely that more can be eked out of this.

Berend
#
On 12/7/2012 9:40 AM, Berend Hasselman wrote:
So the compiler (f2, f4, f6) provided a slight improvement over 
f1 and f3 but not f2, and in any event, the improvement was not great.


       Spencer
#
On 07-12-2012, at 19:37, Spencer Graves wrote:

            
I don't understand the "but not f2".
And I don't understand the conclusion for (f2,f4,f6). f4 is a compiled version of f3 and is slower than its non compiled version.
f2 and f6 are the quickest compiled versions.
Indeed the improvement is not earth shattering but it does demonstrate what you can achieve by using the compiler package.

Berend
#
I found mixed (and not always easy to predict) results from the 
byte-code compiler. It seems necessary to test whether it helps. On some 
calculations, it is definitely worthwhile.

JN
On 12-12-07 01:57 PM, Berend Hasselman wrote:
#
On 07-12-2012, at 18:01, arun wrote:

            
Good eyesight! Correct.
Oops.

Inserting set.seed(413) at the start I now get this

# > benchmark(eta1 <- f1(X, B, x, sem1), eta2 <- f2(X, B, x, sem1), eta3 <- f3(X, B, x, sem1),
# +           eta4 <- f4(X, B, x, sem1), eta5 <- f5(X, B, x, sem1), eta6 <- f6(X, B, x, sem1),
# +           replications=10, columns=c("test","elapsed","relative"))
#                        test elapsed relative
# 1 eta1 <- f1(X, B, x, sem1)   2.248    1.392
# 2 eta2 <- f2(X, B, x, sem1)   1.738    1.076
# 3 eta3 <- f3(X, B, x, sem1)   2.024    1.253
# 4 eta4 <- f4(X, B, x, sem1)   2.072    1.283
# 5 eta5 <- f5(X, B, x, sem1)   2.038    1.262
# 6 eta6 <- f6(X, B, x, sem1)   1.615    1.000

Relative conclusions stay the same. The compiled versions f2, f6 are the quickest.

Berend