Hi all,
I wrote a function that does what I want, but it is very slow for large amounts of data. On my computer it takes 5.37 seconds for 16,000 data points and 21.95 seconds for 32,000 data points. As my real data consist of 18,000,000 data points, it would take ages to use the function as it is now.
Could someone help me to speed up the calculation?
Thank you, Tonja
system.time({
  x <- runif(32000)
  y <- runif(32000)
  xy <- cbind(x, y)
  outer <- function(z) {
    !any(x > z[1] & y > z[2])
  }
  j <- apply(xy, 1, outer)
  plot(x, y)
  points(x[j], y[j], col = "green")
})
own function: computing time
7 messages · tonja.krueger at web.de, Rui Barradas, Jan van der Laan +2 more
Hello,
'outer' is a bad name for a function: R already has a function with
that name. See ?outer.
As for your algorithm, each of the n points is compared against all n
points, so the run time grows quadratically with the length of x and y.
What are you trying to do? Your code gets the points with max(x) and
max(y) and some other points near those. Can you rethink what goes on
before the algorithm?
Also, you're timing everything, including the data generation and the
plotting. It would be better to time just the computation:
system.time({j <- apply(xy, 1, outer)})
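A rough way to see the quadratic growth, and what it implies for 18,000,000 points, is to time only the apply() step at a few sizes and extrapolate from the reported 21.95 seconds. This is a sketch, not part of the original reply: time_kernel and outside are hypothetical names, the sizes are kept small so it runs quickly, and the extrapolation assumes the run time is exactly proportional to n^2.

```r
# Hypothetical helper (not from the thread): time only the apply() step
# for n uniformly distributed points.
time_kernel <- function(n) {
  x <- runif(n)
  y <- runif(n)
  xy <- cbind(x, y)
  outside <- function(z) !any(x > z[1] & y > z[2])
  system.time(apply(xy, 1, outside))[["elapsed"]]
}

# Small sizes so the sketch runs quickly; doubling n should roughly
# quadruple the elapsed time once n is large enough.
sizes <- c(1000, 2000, 4000)
elapsed <- sapply(sizes, time_kernel)
print(data.frame(n = sizes, elapsed = elapsed))

# Extrapolate the reported 21.95 s at n = 32000 to n = 18e6,
# assuming time proportional to n^2.
est_seconds <- 21.95 * (18e6 / 32000)^2
est_days <- est_seconds / 86400
print(round(est_days, 1))
```

Under that assumption the estimate comes to about 80 days, so a better algorithm matters far more here than a constant-factor speedup.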
Hope this helps,
Rui Barradas
(Reply to the original post by tonja.krueger at web.de, 10-10-2012 11:15.)
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Are the points you are looking for (those data points with no other data
points above or to the right of them) a subset of the convex hull of the
data points? If so, chull(x,y) can quickly give you the points on the
convex hull (typically a fairly small number) and you can look through
them for the ones you want.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
No, the desired points are not a subset of the convex hull.
E.g., x=c(0,1:5), y=c(0,1/(1:5)).
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
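The counterexample can be checked directly. The sketch below is an illustration added to the thread, not Bill Dunlap's own code: wanted and hull are hypothetical names, and the test for the wanted points is the same strict comparison used in the original post.

```r
# The counterexample from the message above.
x <- c(0, 1:5)
y <- c(0, 1 / (1:5))

# Points with no other point strictly above and to the right of them
# (the same test as in the original post).
wanted <- which(apply(cbind(x, y), 1,
                      function(z) !any(x > z[1] & y > z[2])))

# Indices of the convex hull vertices, sorted for easier comparison.
hull <- sort(grDevices::chull(x, y))

print(wanted)
print(hull)
# Wanted points that are not hull vertices.
print(setdiff(wanted, hull))
```

Points 3, 4 and 5 lie on the curve y = 1/x between (1, 1) and (5, 0.2), below the hull edge that joins those two points, so they are wanted but are not hull vertices.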
I did not see a simple way to make it faster. However, this is a piece
of code that can be made to run much faster in C. See below.
I don't know if you are familiar with running C code from R. If not, the
official documentation is in the Writing R Extensions manual. However,
this is not the easiest documentation for a first read. If you want to
use the C code and have problems getting it running, let me/us know your
operating system and I/we will try to walk you through it.
HTH,
Jan
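=== c-code ===
/* For each of the n points, set r[i] to 1 if no other point has both a
   larger x and a larger y, and to 0 otherwise. m is an n-by-2 matrix in
   column-major order: all x values first, then all y values. */
void foo(double* m, int* pn, int* r) {
    int n = *pn;
    double* pm1 = m;        /* x of point i */
    double* pm2 = m + n;    /* y of point i */
    int* pr = r;            /* result for point i */
    for (int i = 0; i < n; ++i, ++pm1, ++pm2, ++pr) {
        *pr = 1;
        double* qm1 = m;        /* x of point j */
        double* qm2 = m + n;    /* y of point j */
        for (int j = 0; j < n; ++j, ++qm1, ++qm2) {
            if ((*qm1 > *pm1) && (*qm2 > *pm2)) {
                *pr = 0;    /* point i is dominated; stop scanning */
                break;
            }
        }
    }
}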
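=== r-code ===
# Compile the C code first, e.g. with `R CMD SHLIB rtest.c` (assuming the
# C code was saved as rtest.c); on Windows the library is rtest.dll.
dyn.load("rtest.so")
# Wrapper around the C function; m is an n-by-2 matrix of (x, y) points.
foo <- function(m) {
  n <- dim(m)[1]
  .C("foo",
     as.double(m),
     as.integer(n),
     r = logical(n))$r
}
x <- runif(32000)
y <- runif(32000)
xy <- cbind(x, y)
# Original R version
t1 <- system.time({
  outer <- function(z) {
    !any(x > z[1] & y > z[2])
  }
  j <- apply(xy, 1, outer)
})
# C version
t2 <- system.time({
  j2 <- foo(xy)
})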
=== results ===
> all(j == j2)
[1] TRUE
> t1
   user  system elapsed
 35.462   0.028  35.549
> t2
   user  system elapsed
  0.008   0.000   0.008
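For readers who want to check the logic of the C loop without compiling anything, the same double loop can be written in plain R. This is only a sketch for small inputs: foo_r is a hypothetical name, the code is a transcription rather than something posted in the thread, and it will be far slower than the C version. Like the C code, the inner loop stops at the first point found above and to the right; that early exit is likely part of why the C version is so fast on random data.

```r
# Hypothetical plain-R transcription of the C loop, for checking the
# logic on small inputs only.
foo_r <- function(m) {
  n <- nrow(m)
  keep <- logical(n)
  for (i in seq_len(n)) {
    keep[i] <- TRUE
    for (j in seq_len(n)) {
      # Point j lies strictly above and to the right of point i.
      if (m[j, 1] > m[i, 1] && m[j, 2] > m[i, 2]) {
        keep[i] <- FALSE
        break  # same early exit as in the C code
      }
    }
  }
  keep
}

set.seed(1)
x <- runif(500)
y <- runif(500)
xy <- cbind(x, y)

# Reference result from the original apply() approach.
j <- apply(xy, 1, function(z) !any(x > z[1] & y > z[2]))
j2 <- foo_r(xy)
print(all(j == j2))
```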
Your original method would be the following function
f <- function (x, y)
{
    xy <- cbind(x, y)
    outside <- function(z) {
        !any(x > z[1] & y > z[2])
    }
    j <- apply(xy, 1, outside)
    which(j)
}
and the following one quickly computes the same thing as the above
as long as no two points share the same x value (with tied x values
it may keep only one of the tied points).
f1 <- function (x, y)
{
    o <- order(x, decreasing = TRUE)
    yo <- y[o]
    j <- logical(length(y))
    j[o] <- yo == cummax(yo)
    which(j)
}
Think of the problem as finding the "ladder points" (Feller's term)
of a sequence of points, the places where the sequence reaches
a new high point.
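To make the "ladder point" idea concrete, here is a small worked example followed by a check against the original method on random data. It is a sketch added for illustration: the five points, the names running_max, ladder, keep, xr and yr, and the sample size are assumptions, while f and f1 are copied from the message above.

```r
# Tiny example: five points, all x values distinct.
x <- c(1, 2, 3, 4, 5)
y <- c(5, 3, 4, 1, 2)

o <- order(x, decreasing = TRUE)  # visit the points from right to left
yo <- y[o]                        # y values in that order
running_max <- cummax(yo)         # highest y seen so far
ladder <- yo == running_max       # TRUE where y reaches a new high
print(data.frame(x = x[o], y = yo, running_max, ladder))

# Indices of the kept points, in the original numbering.
keep <- sort(o[ladder])
print(keep)

# The two functions from the message above.
f <- function(x, y) {
  xy <- cbind(x, y)
  outside <- function(z) !any(x > z[1] & y > z[2])
  which(apply(xy, 1, outside))
}
f1 <- function(x, y) {
  o <- order(x, decreasing = TRUE)
  yo <- y[o]
  j <- logical(length(y))
  j[o] <- yo == cummax(yo)
  which(j)
}

# Check on random data, where ties are not expected.
set.seed(42)
xr <- runif(2000)
yr <- runif(2000)
print(identical(f(xr, yr), f1(xr, yr)))
```

Sorting dominates the cost of f1, so it should scale to millions of points far better than the all-pairs comparison in f.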
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
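If the real data can contain tied x values, f1 and the original method can disagree, because the original only drops a point when another point is strictly larger in both coordinates. The sketch below shows one such case and one possible adjustment. f1_ties is a hypothetical variant, not something posted in the thread: it assumes that breaking x ties by increasing y is acceptable, so that a tied point is never compared against a higher point with the same x.

```r
# The two functions from the message above.
f <- function(x, y) {
  xy <- cbind(x, y)
  outside <- function(z) !any(x > z[1] & y > z[2])
  which(apply(xy, 1, outside))
}
f1 <- function(x, y) {
  o <- order(x, decreasing = TRUE)
  yo <- y[o]
  j <- logical(length(y))
  j[o] <- yo == cummax(yo)
  which(j)
}

# Hypothetical variant (not from the thread): decreasing x, and among
# tied x values increasing y, so ties cannot hide each other.
f1_ties <- function(x, y) {
  o <- order(-x, y)
  yo <- y[o]
  j <- logical(length(y))
  j[o] <- yo == cummax(yo)
  which(j)
}

# Points 1 and 2 share x = 1. Neither has a point strictly above and to
# the right of it, so the original method keeps both.
x <- c(1, 1, 0)
y <- c(2, 1, 3)
print(f(x, y))
print(f1(x, y))
print(f1_ties(x, y))
```

Here f and f1_ties keep all three points, while f1 drops point 2 because point 1, which has the same x and a larger y, comes first in the ordering.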
That's perfect, thanks a lot!
Tonja