Dear R-ophiles,
I've found something very odd when I apply convolve
to ever larger vectors. Here is an example below
with vectors ranging from 2^11 to 2^17. There is
a funny bump up at 2^12. Then it gets very slow at 2^16.
> for( i in 11:20 )print( system.time(convolve(1:2^i,1:2^i,type="o")))
user system elapsed
0.002 0.000 0.002
user system elapsed
0.373 0.002 0.375
user system elapsed
0.014 0.001 0.016
user system elapsed
0.031 0.002 0.034
user system elapsed
0.126 0.004 0.130
user system elapsed
194.095 0.013 194.185
user system elapsed
0.345 0.011 0.356
This example is run on a fedora machine with 64 bits. I hit the same
wall at 2^16 on a Macbook (Intel processor I think). The fedora machine
is running R 2.5.0. That's a bit old (April 07) but I saw no mention of
this speed
problem in some web searching, and it's not mentioned in the 2.6
what's new notes.
I've rerun it and found the same bump at 12 and wall at 16.
The timing at 2^16 can change appreciably. In one
other case it was about 270 user, 271 elapsed.
The 2^18 case ran for hours without ever finishing.
At first I thought that this was a memory latency issue. Maybe it
is. But that makes it hard to explain why 2^17 works better than
2^16. I've seen that three times now, so I'm almost ready to call it
reproducible.
Also, one of the machines I'm using has lots of memory. Maybe it's
a cache issue ... but that still does not explain why 2^12 is slower
than 2^13 or 2^16 is slower than 2^17.
I've checked by running convolve without type="o" and I don't
see the wall. Similarly fft does not have that problem.
Here's an example without type="open"
> for( k in 11:20)print(system.time( convolve( 1:2^k,1:2^k)))
user system elapsed
0.001 0.000 0.000
user system elapsed
0.001 0.000 0.001
user system elapsed
0.002 0.000 0.002
user system elapsed
0.004 0.000 0.004
user system elapsed
0.009 0.001 0.010
user system elapsed
0.017 0.001 0.018
user system elapsed
0.138 0.005 0.143
user system elapsed
0.368 0.012 0.389
user system elapsed
1.010 0.032 1.051
user system elapsed
1.945 0.069 2.015
This is more what I expected. Something like N or N log(N) , with
the difference hard to discern in granularity and noise.
The convolve function is not very big (see below). When type is
not specified, it defaults to "circular". So my guess is that something
mysterious might be happening inside the first else clause below,
at least on some architectures.
-Art Owen
> convolve
function (x, y, conj = TRUE, type = c("circular", "open", "filter"))
{
type <- match.arg(type)
n <- length(x)
ny <- length(y)
Real <- is.numeric(x) && is.numeric(y)
if (type == "circular") {
if (ny != n)
stop("length mismatch in convolution")
}
else {
n1 <- ny - 1
x <- c(rep.int(0, n1), x)
n <- length(y <- c(y, rep.int(0, n - 1)))
}
x <- fft(fft(x) * (if (conj)
Conj(fft(y))
else fft(y)), inv = TRUE)
if (type == "filter")
(if (Real)
Re(x)
else x)[-c(1:n1, (n - n1 + 1):n)]/n
else (if (Real)
Re(x)
else x)/n
}
strange timings in convolve(x,y,type="open")
3 messages · Art Owen, Charles C. Berry, owen at stanford.edu
On Tue, 18 Dec 2007, Art Owen wrote:
Dear R-ophiles, I've found something very odd when I apply convolve to ever larger vectors. Here is an example below with vectors ranging from 2^11 to 2^17. There is a funny bump up at 2^12. Then it gets very slow at 2^16.
The time is consumed by fft, which is called with vectors of length 2*2^i-1 when type = 'o'
system.time( fft(seq_len(2^13-1)) )
user system elapsed
0.31 0.00 0.32
system.time( fft(seq_len(2^14-1)) )
user system elapsed
0 0 0
There are no factors of 2^13-1 or 2^17-1 or 2^18-1
for (i in 11:20 ) print( c(index=i, nfact=length(which( 0 ==
(2^(i+1)-1)%%(2:trunc(sqrt(2^(i+1)-1)) )))))
index nfact
11 11
index nfact
12 0
index nfact
13 3
index nfact
14 3
index nfact
15 7
index nfact
16 0
index nfact
17 15
index nfact
18 0
index nfact
19 23
index nfact
20 5
It looks like the code in fft.c tries to find factors of the series length and works from there. So, the size of the problem evidently depends on its factors. HTH, Chuck
for( i in 11:20 )print( system.time(convolve(1:2^i,1:2^i,type="o")))
user system elapsed 0.002 0.000 0.002 user system elapsed 0.373 0.002 0.375 user system elapsed 0.014 0.001 0.016 user system elapsed 0.031 0.002 0.034 user system elapsed 0.126 0.004 0.130 user system elapsed 194.095 0.013 194.185 user system elapsed 0.345 0.011 0.356 This example is run on a fedora machine with 64 bits. I hit the same wall at 2^16 on a Macbook (Intel processor I think). The fedora machine is running R 2.5.0. That's a bit old (April 07) but I saw no mention of this speed problem in some web searching, and it's not mentioned in the 2.6 what's new notes. I've rerun it and found the same bump at 12 and wall at 16. The timing at 2^16 can change appreciably. In one other case it was about 270 user, 271 elapsed. The 2^18 case ran for hours without ever finishing. At first I thought that this was a memory latency issue. Maybe it is. But that makes it hard to explain why 2^17 works better than 2^16. I've seen that three times now, so I'm almost ready to call it reproducible. Also, one of the machines I'm using has lots of memory. Maybe it's a cache issue ... but that still does not explain why 2^12 is slower than 2^13 or 2^16 is slower than 2^17. I've checked by running convolve without type="o" and I don't see the wall. Similarly fft does not have that problem. Here's an example without type="open"
for( k in 11:20)print(system.time( convolve( 1:2^k,1:2^k)))
user system elapsed 0.001 0.000 0.000 user system elapsed 0.001 0.000 0.001 user system elapsed 0.002 0.000 0.002 user system elapsed 0.004 0.000 0.004 user system elapsed 0.009 0.001 0.010 user system elapsed 0.017 0.001 0.018 user system elapsed 0.138 0.005 0.143 user system elapsed 0.368 0.012 0.389 user system elapsed 1.010 0.032 1.051 user system elapsed 1.945 0.069 2.015 This is more what I expected. Something like N or N log(N) , with the difference hard to discern in granularity and noise. The convolve function is not very big (see below). When type is not specified, it defaults to "circular". So my guess is that something mysterious might be happening inside the first else clause below, at least on some architectures. -Art Owen
convolve
function (x, y, conj = TRUE, type = c("circular", "open", "filter"))
{
type <- match.arg(type)
n <- length(x)
ny <- length(y)
Real <- is.numeric(x) && is.numeric(y)
if (type == "circular") {
if (ny != n)
stop("length mismatch in convolution")
}
else {
n1 <- ny - 1
x <- c(rep.int(0, n1), x)
n <- length(y <- c(y, rep.int(0, n - 1)))
}
x <- fft(fft(x) * (if (conj)
Conj(fft(y))
else fft(y)), inv = TRUE)
if (type == "filter")
(if (Real)
Re(x)
else x)[-c(1:n1, (n - n1 + 1):n)]/n
else (if (Real)
Re(x)
else x)/n
}
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
Thanks Charles. That must be it. (Berwin also noticed this.) When convolve hit the wall, I switched over to FFTW in C. That is actually pretty nice code which runs in n log(n) even for prime n and takes special account of factors of n up to about 19 or so. So if the R team ever wants to put in a new FFT, that looks like a good one. But I think easier fix for me, or others in this boat, would just be to write a new convolve(x,y) that pads x and y with zeros out to length 2*max( length(x), length(y) ). Then if x and y have very composite lengths, especially powers of 2, the fft should go quickly. -Art Quoting "Charles C. Berry" <cberry at tajo.ucsd.edu>:
On Tue, 18 Dec 2007, Art Owen wrote:
Dear R-ophiles, I've found something very odd when I apply convolve to ever larger vectors. Here is an example below with vectors ranging from 2^11 to 2^17. There is a funny bump up at 2^12. Then it gets very slow at 2^16.
The time is consumed by fft, which is called with vectors of length 2*2^i-1 when type = 'o'
system.time( fft(seq_len(2^13-1)) )
user system elapsed 0.31 0.00 0.32
system.time( fft(seq_len(2^14-1)) )
user system elapsed
0 0 0
There are no factors of 2^13-1 or 2^17-1 or 2^18-1
for (i in 11:20 ) print( c(index=i, nfact=length(which( 0 ==
(2^(i+1)-1)%%(2:trunc(sqrt(2^(i+1)-1)) ))))) index nfact 11 11 index nfact 12 0 index nfact 13 3 index nfact 14 3 index nfact 15 7 index nfact 16 0 index nfact 17 15 index nfact 18 0 index nfact 19 23 index nfact 20 5
It looks like the code in fft.c tries to find factors of the series length and works from there. So, the size of the problem evidently depends on its factors. HTH, Chuck
for( i in 11:20 )print( system.time(convolve(1:2^i,1:2^i,type="o")))
user system elapsed 0.002 0.000 0.002 user system elapsed 0.373 0.002 0.375 user system elapsed 0.014 0.001 0.016 user system elapsed 0.031 0.002 0.034 user system elapsed 0.126 0.004 0.130 user system elapsed 194.095 0.013 194.185 user system elapsed 0.345 0.011 0.356 This example is run on a fedora machine with 64 bits. I hit the same wall at 2^16 on a Macbook (Intel processor I think). The fedora machine is running R 2.5.0. That's a bit old (April 07) but I saw no mention of this speed problem in some web searching, and it's not mentioned in the 2.6 what's new notes. I've rerun it and found the same bump at 12 and wall at 16. The timing at 2^16 can change appreciably. In one other case it was about 270 user, 271 elapsed. The 2^18 case ran for hours without ever finishing. At first I thought that this was a memory latency issue. Maybe it is. But that makes it hard to explain why 2^17 works better than 2^16. I've seen that three times now, so I'm almost ready to call it reproducible. Also, one of the machines I'm using has lots of memory. Maybe it's a cache issue ... but that still does not explain why 2^12 is slower than 2^13 or 2^16 is slower than 2^17. I've checked by running convolve without type="o" and I don't see the wall. Similarly fft does not have that problem. Here's an example without type="open"
for( k in 11:20)print(system.time( convolve( 1:2^k,1:2^k)))
user system elapsed 0.001 0.000 0.000 user system elapsed 0.001 0.000 0.001 user system elapsed 0.002 0.000 0.002 user system elapsed 0.004 0.000 0.004 user system elapsed 0.009 0.001 0.010 user system elapsed 0.017 0.001 0.018 user system elapsed 0.138 0.005 0.143 user system elapsed 0.368 0.012 0.389 user system elapsed 1.010 0.032 1.051 user system elapsed 1.945 0.069 2.015 This is more what I expected. Something like N or N log(N) , with the difference hard to discern in granularity and noise. The convolve function is not very big (see below). When type is not specified, it defaults to "circular". So my guess is that something mysterious might be happening inside the first else clause below, at least on some architectures. -Art Owen
convolve
function (x, y, conj = TRUE, type = c("circular", "open", "filter"))
{
type <- match.arg(type)
n <- length(x)
ny <- length(y)
Real <- is.numeric(x) && is.numeric(y)
if (type == "circular") {
if (ny != n)
stop("length mismatch in convolution")
}
else {
n1 <- ny - 1
x <- c(rep.int(0, n1), x)
n <- length(y <- c(y, rep.int(0, n - 1)))
}
x <- fft(fft(x) * (if (conj)
Conj(fft(y))
else fft(y)), inv = TRUE)
if (type == "filter")
(if (Real)
Re(x)
else x)[-c(1:n1, (n - n1 + 1):n)]/n
else (if (Real)
Re(x)
else x)/n
}
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Charles C. Berry (858) 534-2098
Dept of
Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901