-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
Sent: Thursday, July 20, 2006 8:56 AM
To: Doran, Harold
Cc: r-help at stat.math.ethz.ch; Phil Spector
Subject: Re: Timing benefits of mapply() vs. for loop was: [R] Wrap a loop inside a function
Note that if you use mapply in the way I suggested, which is not
the same as in your post, then it's just as fast. (Also, the version
of mapply in your post gives different numerical results than the
for loop, whereas mine gives the same.) like.mat is the for loop
version, like.mat2 is your mapply version, and like.mat3 is my
mapply version.
like.mat <- function(score, items, theta){
+ like.mat <- matrix(numeric(length(items) * length(theta)), ncol =
+ length(theta))
+ for(i in 1:length(items)) like.mat[i, ] <- pcm(theta, items[[i]], score[[i]])
+ like.mat
+ }
system.time(for(i in 1:100) like.mat(score,items,theta))
like.mat2 <- function(score, items, theta)
+ matrix(mapply(pcm, rep(theta,length(items)), items, score),
+ ncol = length(theta), byrow = TRUE)
system.time(for(i in 1:100) like.mat2(score,items,theta))
all.equal(like.mat(score, items, theta), like.mat2(score, items, theta))
[1] "Mean relative difference: 1.268095"
like.mat3 <- function(score, items, theta)
+ t(mapply(pcm, items, score, MoreArgs = list(theta = theta)))
system.time(for(i in 1:100) like.mat3(score,items,theta))
m3 <- like.mat3(score, items, theta)
dimnames(m3) <- NULL
all.equal(like.mat(score, items, theta), m3)
[1] TRUE
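The reason the numbers differ: in like.mat2, rep(theta, length(items))
is walked elementwise against the recycled items and score, so pcm
receives scalar theta values paired with items that fall out of step
after the first cycle. In like.mat3, MoreArgs hands the whole theta
vector to every call, one call per item, exactly as the for loop does.
A minimal sketch with a toy function f (used here only to make the
pairings visible, not part of the problem above):
f <- function(x, y) paste(x, y, sep = "-")
x <- 1:3          # stands in for theta
y <- c("a", "b")  # stands in for items
# elementwise recycling: x and y are walked in lockstep, so the
# pairings interleave rather than grouping by y
mapply(f, rep(x, length(y)), y)
[1] "1-a" "2-b" "3-a" "1-b" "2-a" "3-b"
# MoreArgs: all of x goes to every call, one call per element of y,
# which is what the explicit loop computes
mapply(f, y, MoreArgs = list(x = x))
     a     b
[1,] "1-a" "1-b"
[2,] "2-a" "2-b"
[3,] "3-a" "3-b"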
On 7/20/06, Doran, Harold <HDoran at air.org> wrote:
List:
Thank you for the replies to my post yesterday. Gabor and Phil also
gave useful replies on how to improve the function by relying on
mapply rather than the explicit for loop. In general, I try to use
the family of apply functions rather than looping constructs such
as for and while as a matter of practice.
However, it seems the mapply function in this case is slower (in
terms of CPU speed) than the for loop. Here is an example.
# data needed for example
items <- list(item1 = c(0,1,2), item2 = c(0,1), item3 = c(0,1,2,3),
item4 = c(0,1), item5 = c(0,1,2,3,4),
item6 = c(0,1,2,3))
score <- c(2,1,3,1,3,2)
theta <- c(-1,-.5,0,.5,1)
# My old function using the for loop
like.mat <- function(score, items, theta){
like.mat <- matrix(numeric(length(items) * length(theta)), ncol =
length(theta))
for(i in 1:length(items)) like.mat[i, ] <- pcm(theta, items[[i]],
score[[i]])
like.mat
}
system.time(like.mat(score,items,theta))
[1] 0 0 0 NA NA
# Revised using mapply
like.mat <- function(score, items, theta){
matrix(mapply(pcm,rep(theta,length(items)),items,score),ncol=length(theta),byrow=TRUE)
}
system.time(like.mat(score,items,theta))
[1] 0.03 0.00 0.03 NA NA
It is obviously slower to use mapply, but only nominally. So, now
look at this within the context of the full program I am working on.
For context, I am evaluating an integral using Gaussian quadrature.
This is a psychometric problem where the function 'pcm' is Masters'
partial credit model and 'score' is the student's score on item i.
When an item has two categories (0,1), pcm reduces to the Rasch
model for dichotomous data (a quick check of this reduction appears
after the program listing below).
The dnorm is set at N(0,1) by default, but the parameters of the
population distribution are estimated from a separate procedure and
are normally input into the function; the default is used for this
example.
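Before the listing, a minimal sketch of the quadrature idea (an
illustration, not part of the program): gauss.quad.prob from statmod
returns nodes and weights such that sum(weights * f(nodes))
approximates E[f(theta)] under the chosen distribution. With
f(theta) = theta^2 under N(0,1) the answer should be the variance, 1.
library(statmod)
# 49-point quadrature rule for the N(0,1) weight function
gq <- gauss.quad.prob(49, dist = 'normal', mu = 0, sigma = 1)
# the weighted sum of nodes^2 approximates E[theta^2] = 1
sum(gq$weights * gq$nodes^2)
[1] 1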
Here is the full program.
library(statmod)
# Masters' partial credit model: probability of the observed score
# category for one item, evaluated at each value of theta
pcm <- function(theta,d,score){
exp(rowSums(outer(theta,d[1:score],'-')))/
apply(exp(apply(outer(theta,d, '-'), 1, cumsum)), 2, sum)
}
like.mat <- function(score, items, theta){
like.mat <- matrix(numeric(length(items) * length(theta)), ncol =
length(theta))
for(i in 1:length(items)) like.mat[i, ] <- pcm(theta, items[[i]],
score[[i]])
like.mat
}
# turn this off for now
#like.mat <- function(score, items, theta){
#matrix(mapply(pcm,rep(theta,length(items)),items,score),ncol=length(theta),byrow=TRUE)
#}
class.numer <- function(score, items, prof_cut, mu=0, sigma=1, aboveQ=TRUE){
gauss_numer <- gauss.quad(49,kind="laguerre")
if(aboveQ==FALSE){
mat <- rbind(like.mat(score,items,
(prof_cut-gauss_numer$nodes)), dnorm(prof_cut-gauss_numer$nodes,
mean=mu, sd=sigma))
} else { mat <- rbind(like.mat(score,items,
(gauss_numer$nodes+prof_cut)), dnorm(gauss_numer$nodes+prof_cut,
mean=mu,
sd=sigma))
}
f_y <- rbind(apply(mat, 2, prod), exp(gauss_numer$nodes),
gauss_numer$weights)
sum(apply(f_y,2,prod))
}
class.denom <- function(score,items, mu=0, sigma=1){
gauss_denom <- gauss.quad.prob(49, dist='normal', mu=mu, sigma=sigma)
mat <- rbind(like.mat(score,items,gauss_denom$nodes), gauss_denom$weights)
sum(apply(mat, 2, prod))
}
class.acc <- function(score, items, prof_cut, mu=0, sigma=1, aboveQ=TRUE){
result <- class.numer(score,items,prof_cut, mu,sigma,
aboveQ)/class.denom(score,items, mu, sigma)
return(result)
}
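As a quick check of the Rasch reduction mentioned above (again an
illustration, not part of the program): for a dichotomous item with
thresholds c(0, b), the PCM probability of a correct response
(score = 2) works out to exp(theta - b)/(1 + exp(theta - b)), which
is plogis(theta - b). The value b = 0.3 is arbitrary.
b <- 0.3
theta <- c(-1, -.5, 0, .5, 1)
all.equal(pcm(theta, c(0, b), 2), plogis(theta - b))
[1] TRUE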
# Test the function "class.acc"
items <- list(item1 = c(0,1,2), item2 = c(0,1), item3 = c(0,1,2,3),
item4 = c(0,1), item5 = c(0,1,2,3,4),
item6 = c(0,1,2,3))
score <- c(2,1,3,1,3,2)
# This is the system time when using the for loop for the like.mat function
system.time(class.acc(score,items,1,aboveQ=T))
[1] 0.04 0.00 0.04 NA NA
# This is the system time when using mapply for the like.mat function
system.time(class.acc(score,items,1,aboveQ=T))
[1] 0.70 0.00 0.73 NA NA
There is a substantial improvement in CPU seconds when the for loop
is applied rather than the mapply function. I experimented with
adding more items and varying the quadrature points, and it always
turned out that the for loop was faster.
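A caveat on the timings: a single call at these speeds sits near the
resolution of the timer, so repeating the call (as Gabor did above)
gives steadier per-call numbers, e.g.:
# repeat the call to average out clock granularity; divide the
# elapsed time by reps for a per-call figure
reps <- 100
system.time(for (i in 1:reps) class.acc(score, items, 1, aboveQ = TRUE))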
Given this result, I wonder what advice might be offered. Is there
an inherent reason one might opt for the mapply function (such as
reliability, etc.) even when it compromises computational speed? Or
should the issue of computational speed be considered ahead of the
common advice to rely on the family of apply functions instead of
explicit loops?
Thanks for your consideration of my question, Harold
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          3.0
year           2006
month          04
day            24
svn rev        37909
language       R
version.string Version 2.3.0 (2006-04-24)