Help with apply
Hi,
It seemed to me like the calculations were implemented in a bit of a
redundant way, so I tried to simplify it algebraically. doit1 is
Phil's version of your original calculations, and doit2 is my
simplified way.
doit1 <- function(i, j) {
exp(sum(log((exp(tmp[i,] * (qq$nodes[j]-b)))/
(factorial(tmp[i,])*exp(exp(qq$nodes[j]-b))))))*
((1/(s*sqrt(2*pi)))*exp(-((qq$nodes[j]-0)^2/(2*s^2))))/
dnorm(qq$nodes[j])*qq$weights[j]
}
doit2 <- function(i, j) {
(prod(exp((tmp[i,]*(qq$nodes[j]-b))-exp(qq$nodes[j]-b))/
factorial(tmp[i,]))*qq$weights[j])/
(s*sqrt(2*pi)*dnorm(qq$nodes[j])*exp((qq$nodes[j]-0)^2/(2*s^2)))
}
system.time(t(outer(1:300,1:2,Vectorize(doit1))))
system.time(t(outer(1:300,1:2,Vectorize(doit2))))
New vs. Old
Characters (for the calculations):
154 vs. 177
Function calls for calculations (i.e., excluding subseting/dnorm, etc.):
22 vs. 27
Timing difference:
Not appreciable on my system, but I spent so long wading through
exactly what was happening, I figured why not share it. Moving tmp up
to 300 rows, I got:
system.time(t(outer(1:300,1:2,Vectorize(doit1))))
user system elapsed 9.044 0.008 10.135
system.time(t(outer(1:300,1:2,Vectorize(doit2))))
user system elapsed 8.413 0.008 8.893 At this rate, the difference between a few million rows might save you enough time you can stretch your hands once, and those 20 characters should really free up space on your hard disk....sigh. Josh
On Mon, Oct 4, 2010 at 11:22 AM, Doran, Harold <HDoran at air.org> wrote:
Ahhhh, I see that you wrapped apply around mapply. I was toying with both; but didn't think to use mapply inside apply. As always, thank you, Phil -----Original Message----- From: Phil Spector [mailto:spector at stat.berkeley.edu] Sent: Monday, October 04, 2010 2:20 PM To: Doran, Harold Cc: Gabriela Cendoya; R-help Subject: Re: [R] Help with apply Harold - ? ?The first way that comes to mind is doit = function(i,j)exp(sum(log((exp(tmp[i,]*(qq$nodes[j]-b))) / ? ? ? ?(factorial(tmp[i,]) * exp(exp(qq$nodes[j]-b)))))) * ? ? ? ?((1/(s*sqrt(2*pi))) ?* exp(-((qq$nodes[j]-0)^2/(2*s^2))))/ ? ? ? ?dnorm(qq$nodes[j]) * qq$weights[j] t(outer(1:3,1:2,Vectorize(doit))) but you said you wanted to use apply. ?That leads to doit1 = function(tmpi,nod,wt) ? ? ? exp(sum(log((exp(tmpi*(nod-b))) / (factorial(tmpi) * ? ? ? exp(exp(nod-b)))))) * ((1/(s*sqrt(2*pi))) ?* ? ? ? exp(-((nod-0)^2/(2*s^2))))/dnorm(nod) * qq$weights[j] apply(tmp,1,function(x)mapply(function(n,w)doit1(x,n,w),qq$nodes,qq$weights)) Both seem to agree with your rr1. ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?- Phil Spector ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Statistical Computing Facility ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Department of Statistics ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? UC Berkeley ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? spector at stat.berkeley.edu On Mon, 4 Oct 2010, Doran, Harold wrote:
Sorry about that; s <- 1 -----Original Message----- From: Gabriela Cendoya [mailto:gabrielacendoya.rlist at gmail.com] Sent: Monday, October 04, 2010 1:44 PM To: Doran, Harold Cc: R-help Subject: Re: [R] Help with apply You are missing "s" in your definitions so I can't reproduce your code.
tmp <- data.frame(var1 = sample(c(0:10), 3, replace = TRUE), var2 = sample(c(0:10), 3, replace = TRUE), var3 = sample(c(0:10), 3, replace = TRUE)) str(tmp)
'data.frame': ? 3 obs. of ?3 variables: $ var1: int ?9 3 9 $ var2: int ?4 6 2 $ var3: int ?2 9 3
#I can run the following double loop and yield what I want in the end (rr1) as:
library(statmod)
Q <- 2
b <- runif(3)
qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
rr1 <- matrix(0, nrow = Q, ncol = nrow(tmp))
L <- nrow(tmp)
for(j in 1:Q){
+ ? ? for(i in 1:L){
+ ? ? ? ? rr1[j,i] <- exp(sum(log((exp(tmp[i,]*(qq$nodes[j]-b))) /
(factorial(tmp[i,]) *
+ ? ? ? ? exp(exp(qq$nodes[j]-b)))))) * ((1/(s*sqrt(2*pi))) ?*
exp(-((qq$nodes[j]-0)^2/(2*s^2))))/dnorm(qq$nodes[j]) * qq$weights[j]
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?}
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?}
Error: objeto 's' no encontrado
rr1
? ? [,1] [,2] [,3] [1,] ? ?0 ? ?0 ? ?0 [2,] ? ?0 ? ?0 ? ?0
Gabriela 2010/10/4, Doran, Harold <HDoran at air.org>:
Suppose I have the following data:
tmp <- data.frame(var1 = sample(c(0:10), 3, replace = TRUE), var2 =
sample(c(0:10), 3, replace = TRUE), var3 = sample(c(0:10), 3, replace =
TRUE))
I can run the following double loop and yield what I want in the end (rr1)
as:
library(statmod)
Q <- 2
b <- runif(3)
qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
rr1 <- matrix(0, nrow = Q, ncol = nrow(tmp))
L <- nrow(tmp)
? ? ? ? ? ? ? ? for(j in 1:Q){
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? for(i in 1:L){
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? rr1[j,i] <-
exp(sum(log((exp(tmp[i,]*(qq$nodes[j]-b))) / (factorial(tmp[i,]) *
exp(exp(qq$nodes[j]-b)))))) *
((1/(s*sqrt(2*pi))) ?* exp(-((qq$nodes[j]-0)^2/(2*s^2))))/dnorm(qq$nodes[j])
* qq$weights[j]
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? }
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? }
rr1
But, I think this is so inefficient for large Q and nrow(tmp). The function
I am looping over is:
fn <- function(x, nodes, weights, s, b) {
? ? ? ? ? ? ? ? exp(sum(log((exp(x*(nodes-b))) / (factorial(x) *
exp(exp(nodes-b)))))) *
? ? ? ? ? ? ? ? ((1/(s*sqrt(2*pi))) ?*
exp(-((nodes-0)^2/(2*s^2))))/dnorm(nodes) * weights
? ? ? ? ? ? ? ? }
I've tried using apply in a few ways, but I can't replicate rr1 from the
double loop. I can go through each value of Q one step at a time and get
matching values in the rr1 table, but this would still require a loop and
that I think can be avoided.
apply(tmp, 1, fn, nodes = qq$nodes[1], weights = qq$weights[1], s = 1, b =
b)
Does anyone see an efficient way to compute rr1 without a loop?
Harold
sessionInfo()
R version 2.10.1 (2009-12-14) i386-pc-mingw32 locale: [1] LC_COLLATE=English_United States.1252 ?LC_CTYPE=English_United States.1252 ? ?LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C ? ? ? ? ? ? ? ? ? ? ? ? ? LC_TIME=English_United States.1252 attached base packages: [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base other attached packages: [1] minqa_1.1.9 ? ? Rcpp_0.8.6 ? ? ?MiscPsycho_1.6 ?statmod_1.4.6 lattice_0.17-26 gdata_2.8.0 loaded via a namespace (and not attached): [1] grid_2.10.1 ?gtools_2.6.2 tools_2.10.1 ? ? ?[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
--
_________________________ Lic. Mar?a Gabriela Cendoya Mag?ster en Biometr?a Profesor Adjunto Facultad de Ciencias Agrarias UNMdP - Argentina ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/