On Thu, Jul 19, 2012 at 5:02 PM, Jan van der Laan <rhelp at eoos.dds.nl> wrote:
Johan,
Your 'list' and 'array doubling' code can be written much more efficiently.
The following function is faster than your g and easier to read:
g2 <- function(dotot) {
  v <- list()
  for (i in seq_len(dotot)) {
    v[[i]] <- FALSE
  }
  return(v)
}
In the following line in your array doubling function

  prev$l <- rbind(prev$l, matrix(ncol=1, nrow=nextsize))

you first create a new array (the second half of the new array), and then
rbind creates yet another array and has to copy the contents of both into
it. The following routine is much faster and scales almost linearly
(see below):
h2 <- function(dotot) {
  n <- 1000L
  v <- array(NA, dim=c(1, n))
  for (i in seq_len(dotot)) {
    if (i > n) {
      # capacity exhausted: double it; array() places the old
      # elements at the front of the new, larger array
      n <- 2*n
      v <- array(v, dim=c(1, n))
    }
    v[, i] <- TRUE
  }
  return(v[, seq_len(i)])
}
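As a quick sanity check (an illustrative snippet, not part of the original
mail), h2 should return a vector of dotot TRUE values:

# the result should have the requested length and contain only TRUE
r <- h2(5000)
stopifnot(length(r) == 5000, all(r))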
that's blazingly fast! thanks! I've also learned some nice optimization
tricks, like the L suffix for integer literals, seq.int, and the resizing
with array...
while this works, as you can see it's rather messy to use on a day-to-day
basis. my next goal was to hide this in a couple of convenience functions
(emptyexpandlist and addexpandlist) so that it can be reused without
cluttering otherwise fine code. but the overhead of the function call, and
of the list used as a tuple, seems to kill off most of the advantage. that
said, I present these convenience functions below:
==========
emptyexpandlist2 <- list(ne=0, l=array(NA, dim=c(1, 1000L)), len=1000L)

addexpandlist2 <- function(x, prev) {
  if (prev$len == prev$ne) {
    # full: double the capacity, reusing the old data via array()
    n2 <- prev$len*2
    prev <- list(ne=prev$ne, l=array(prev$l, dim=c(1, n2)), len=n2)
  }
  prev$ne <- prev$ne + 1
  prev$l[prev$ne] <- x
  return(prev)
}
compressexpandlist2 <- function(prev) {
  return(prev$l[seq.int(prev$ne)])
}
h3 <- function(dotot) {
  v <- emptyexpandlist2
  for (i in 1:dotot) {
    v <- addexpandlist2(FALSE, v)
  }
  return(compressexpandlist2(v))
}
=======
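for illustration, a minimal usage sketch of these functions (the snippet
itself is not from the original post):

# append elements one at a time, then compress to a plain vector
v <- emptyexpandlist2
v <- addexpandlist2(TRUE, v)
v <- addexpandlist2(FALSE, v)
compressexpandlist2(v)   # c(TRUE, FALSE)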
I haven't checked the scaling, but I take it works as it should. the
constant factor is really bad though:
dotot = 50000

   user  system elapsed
  5.250   0.020   5.279
   user  system elapsed
  2.650   0.060   2.713
   user  system elapsed
  0.140   0.000   0.148
   user  system elapsed
  2.480   0.020   2.495
still better than without the optimization though, and pretty much as
readable.
moral of the story: it seems possible to write fast R, but the code won't
look pretty
thanks for the answers!
Storing the data column-wise makes it easier to increase the size of the
array: R stores arrays in column-major order, so adding columns appends at
the end of the existing data.
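A small illustration of this (my own addition, not in the original mail):
when a 1 x n array is regrown with array(), the old elements keep their
positions at the front of the new array.

a <- array(1:3, dim=c(1, 3))
b <- array(a, dim=c(1, 6))   # array() reuses a's data for the first columns
b[1, 1:3]                    # still 1 2 3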
As a reference for the timing I use the following routine in which I
assume I know the size of the end result.
ref <- function(dotot) {
  v <- array(NA, dim=c(1, dotot))
  for (i in seq_len(dotot)) {
    v[, i] <- FALSE
  }
  return(v)
}
Timing the different routines:
dotot <- c(10, 100, 200, 500, 1000, 2000, 5000, 10000,
           20000, 50000, 100000)
times <- array(NA, dim=c(length(dotot), 5))
i <- 1
for (n in dotot) {
  cat(n, "\n")
  times[i, 1] <- system.time(f(n))[3]
  #times[i, 2] <- system.time(g(n))[3]
  times[i, 2] <- system.time(g2(n))[3]
  times[i, 3] <- system.time(h(n))[3]
  times[i, 4] <- system.time(h2(n))[3]
  times[i, 5] <- system.time(ref(n))[3]
  i <- i + 1
}
(columns: f, g2, h, h2, ref; rows follow the dotot values above)

        [,1]   [,2]   [,3]  [,4]  [,5]
 [1,]  0.000  0.000  0.000 0.001 0.000
 [2,]  0.001  0.000  0.001 0.000 0.000
 [3,]  0.001  0.000  0.002 0.000 0.000
 [4,]  0.003  0.002  0.007 0.002 0.001
 [5,]  0.009  0.006  0.013 0.002 0.003
 [6,]  0.031  0.020  0.032 0.006 0.004
 [7,]  0.181  0.099  0.098 0.016 0.010
 [8,]  0.722  0.370  0.272 0.032 0.020
 [9,]  2.897  1.502  0.766 0.066 0.044
[10,] 18.681 11.770  4.465 0.162 0.103
[11,] 77.757 57.960 17.912 0.322 0.215
The speed of the array doubling function is comparable to that of the
function where we know the size of the end result, and it scales almost
linearly. (The code is a bit messy, however.)
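One quick way to eyeball the scaling (a plotting sketch of my own, not from
the original mail) is a log-log plot of the timing matrix; near-linear
scaling shows up as a slope of about 1:

# zero timings are dropped on the log scale with a warning
matplot(dotot, times, type="b", log="xy",
        xlab="dotot", ylab="elapsed (s)")
legend("topleft", legend=c("f", "g2", "h", "h2", "ref"),
       col=1:5, lty=1:5)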
Jan
On 07/17/2012 10:58 PM, Johan Henriksson wrote:
thanks for the link! I should read it through. that said, I didn't find any
good general solution to the problem, so here I post some attempts for
general input. maybe someone knows how to speed this up. both my solutions
are theoretically O(n) for creating a list of n elements. The function to
improve is O(n^2), which should suck tremendously - but the slow execution
of R probably blows up the constant factor of the smarter solutions.

Array doubling comes close in speed for large lists, but it would be great
if it could be comparable for smaller lists too. One hidden cost I see
directly is that allocating a list in R is O(n), not O(1) (or close), since
it always fills it with values. Is there a way around this? I guess by
using C, one could just malloc() and leave the content undefined - but is
there no better way?
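As a rough illustration of that allocation cost (a sketch of mine, not
benchmarked in the original thread), timing pre-sized list allocation shows
it growing with the requested length:

# vector() must initialize every element of the new list (to NULL),
# so allocation time grows with n
system.time(vector("list", 1e6))
system.time(vector("list", 1e7))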
thanks,
/Johan
################################
# the function we wish to improve
f <- function(dotot) {
  v <- matrix(ncol=1, nrow=0)
  for (i in 1:dotot) {
    # rbind copies the whole matrix on every append: O(n^2) overall
    v <- rbind(v, FALSE)
  }
  return(v)
}
##########################
# first attempt: linked lists
emptylist <- NA
addtolist <- function(x, prev) {
  return(list(x, prev))
}

g <- function(dotot) {
  v <- emptylist
  for (i in 1:dotot) {
    v <- addtolist(FALSE, v)
  }
  return(v)
}
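Note that g returns a nested (elem, rest) structure rather than a flat
vector; a possible unwinding helper (hypothetical, not part of the original
mail) could look like this:

# walk the (elem, rest) chain built by g and flatten it;
# elements come out newest-first, so reverse at the end
# (the repeated c() append is itself quadratic - fine for illustration)
compresslist <- function(v) {
  out <- c()
  while (is.list(v)) {
    out <- c(out, v[[1]])
    v <- v[[2]]
  }
  rev(out)
}
compresslist(g(3))   # FALSE FALSE FALSE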
####################################
# second attempt: array doubling
emptyexpandlist <- list(nelem=0, l=matrix(ncol=1, nrow=0))

addexpandlist <- function(x, prev) {
  if (nrow(prev$l) == prev$nelem) {
    nextsize <- max(nrow(prev$l), 1)
    prev$l <- rbind(prev$l, matrix(ncol=1, nrow=nextsize))
  }
  prev$nelem <- prev$nelem + 1
  prev$l[prev$nelem] <- x
  return(prev)
}

compressexpandlist <- function(prev) {
  return(as.vector(prev$l[1:prev$nelem]))
}

h <- function(dotot) {
  v <- emptyexpandlist
  for (i in 1:dotot) {
    v <- addexpandlist(FALSE, v)
  }
  return(compressexpandlist(v))
}
#########################################
dotot=100000
system.time(f(dotot))
#system.time(g(dotot))
system.time(h(dotot))
On Tue, Jul 17, 2012 at 8:42 PM, Patrick Burns <pburns at pburns.seanet.com> wrote:
Johan,
If you gave a simple example of how your code
looks now and what you want it to do, then you
might get some ideas of how to improve it.
Pat
On 17/07/2012 12:47, Johan Henriksson wrote:
Hello!
I am optimizing my code in R and for this I need to know a bit more about
the internals. It would help tremendously if someone could link me to a
page with O()-complexities of all the operations.

In this particular case, I need something like a linked list with O(1)
insertLast/First ability. I can't preallocate a vector since I do not know
the final size of the list ahead of time.

The classic array-doubling trick would give me O(1) amortized time, but
it's a bit messy. The other alternative I see would be to recursively store
lists (elem, (elem, (elem, (...)))), which I take would also work? But I'd
rather go for a standard R solution if there is one!
cheers,
/Johan
--