Dear bioc-devel,
the dist2 function in genefilter defined as:
dist2 <- function (x, fun = function(a, b) mean(abs(a - b), na.rm =
TRUE), diagonal = 0) {
if (!(is.numeric(diagonal) && (length(diagonal) == 1L)))
stop("'diagonal' must be a numeric scalar.")
res = matrix(diagonal, ncol = ncol(x), nrow = ncol(x))
colnames(res) = rownames(res) = colnames(x)
if (ncol(x) >= 2) {
for (j in 2:ncol(x)) for (i in 1:(j - 1)) res[i, j] = res[j,
i] = fun(x[, i], x[, j])
}
return(res)
}
could have it's default function vectorized as:
res <- apply(x, 2, function(i) colMeans(abs(x - i), na.rm=TRUE))
to improve performance for example in the ArrayQualityMetrics package.
Best.
James.
[Bioc-devel] vectorize default dist2 function in genefilter
5 messages · James F. Reid, Wolfgang Huber
Dear James Thank you. What would the saved time be (e.g. compared to the overall runtime of arrayQualityMetrics)? I would be surprised if the saving was worth the added complexity, but am always happy to be surprised. A patch of the .R and .Rd file would be most welcome and expedite the change. Btw, colSums apparently also works with 3-dim arrays, so both loops (over i and j) could be vectorised, however afaIcs at the cost of constructing an object of size nrow(x)^3 in memory, which might again break performance. Best wishes Wolfgang Il giorno Mar 12, 2013, alle ore 4:43 PM, James F. Reid <reidjf at gmail.com> ha scritto:
Dear bioc-devel,
the dist2 function in genefilter defined as:
dist2 <- function (x, fun = function(a, b) mean(abs(a - b), na.rm = TRUE), diagonal = 0) {
if (!(is.numeric(diagonal) && (length(diagonal) == 1L)))
stop("'diagonal' must be a numeric scalar.")
res = matrix(diagonal, ncol = ncol(x), nrow = ncol(x))
colnames(res) = rownames(res) = colnames(x)
if (ncol(x) >= 2) {
for (j in 2:ncol(x)) for (i in 1:(j - 1)) res[i, j] = res[j,
i] = fun(x[, i], x[, j])
}
return(res)
}
could have it's default function vectorized as:
res <- apply(x, 2, function(i) colMeans(abs(x - i), na.rm=TRUE))
to improve performance for example in the ArrayQualityMetrics package.
Best.
James.
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Hi Wolfgang,
On 12/03/13 21:11, Wolfgang Huber wrote:
Dear James Thank you. What would the saved time be (e.g. compared to the overall runtime of arrayQualityMetrics)? I would be surprised if the saving was worth the added complexity, but am always happy to be surprised.
I believe so but maybe I'm missing something.
Here are the system times on a 20000*100 on my laptop using just dist2,
which would also have a consequence on the overall runtime of
arrayQualityMetrics in aqm.heamap.
dist2 <- function (x,
fun = function(a, b) mean(abs(a - b), na.rm = TRUE),
diagonal = 0) {
defaultFun <- ifelse(missing(fun), TRUE, FALSE)
if (!(is.numeric(diagonal) && (length(diagonal) == 1L)))
stop("'diagonal' must be a numeric scalar.")
res = matrix(diagonal, ncol = ncol(x), nrow = ncol(x))
colnames(res) = rownames(res) = colnames(x)
if (ncol(x) >= 2) {
if (defaultFun) {
res <- apply(x, 2, function(i) colMeans(abs(x - i),
na.rm=TRUE))
} else {
for (j in 2:ncol(x))
for (i in 1:(j - 1))
res[i, j] = res[j, i] = fun(x[, i], x[, j])
}
}
return(res)
}
y <- matrix(rnorm(10000 * 100), 20000, 100)
system.time(dist2(x = y, fun = function(a, b) mean(abs(a - b), na.rm=TRUE)))
## user system elapsed
## 12.272 0.136 12.518
system.time(dist2(x = y))
## user system elapsed
## 5.045 0.728 5.839
Not sure what you'd want to change to the Rd.
Best,
James.
A patch of the .R and .Rd file would be most welcome and expedite the change. Btw, colSums apparently also works with 3-dim arrays, so both loops (over i and j) could be vectorised, however afaIcs at the cost of constructing an object of size nrow(x)^3 in memory, which might again break performance. Best wishes Wolfgang Il giorno Mar 12, 2013, alle ore 4:43 PM, James F. Reid <reidjf at gmail.com> ha scritto:
Dear bioc-devel,
the dist2 function in genefilter defined as:
dist2 <- function (x, fun = function(a, b) mean(abs(a - b), na.rm = TRUE), diagonal = 0) {
if (!(is.numeric(diagonal) && (length(diagonal) == 1L)))
stop("'diagonal' must be a numeric scalar.")
res = matrix(diagonal, ncol = ncol(x), nrow = ncol(x))
colnames(res) = rownames(res) = colnames(x)
if (ncol(x) >= 2) {
for (j in 2:ncol(x)) for (i in 1:(j - 1)) res[i, j] = res[j,
i] = fun(x[, i], x[, j])
}
return(res)
}
could have it's default function vectorized as:
res <- apply(x, 2, function(i) colMeans(abs(x - i), na.rm=TRUE))
to improve performance for example in the ArrayQualityMetrics package.
Best.
James.
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Hi Wolfgang,
On 12/03/13 21:11, Wolfgang Huber wrote:
Dear James Thank you. What would the saved time be (e.g. compared to the overall runtime of arrayQualityMetrics)? I would be surprised if the saving was worth the added complexity, but am always happy to be surprised.
I believe so but maybe I'm missing something.
Here are the system times on a 20000*100 on my laptop using just dist2,
which would also have a consequence on the overall runtime of
arrayQualityMetrics in aqm.heamap.
dist2 <- function (x,
fun = function(a, b) mean(abs(a - b), na.rm = TRUE),
diagonal = 0) {
defaultFun <- ifelse(missing(fun), TRUE, FALSE)
if (!(is.numeric(diagonal) && (length(diagonal) == 1L)))
stop("'diagonal' must be a numeric scalar.")
res = matrix(diagonal, ncol = ncol(x), nrow = ncol(x))
colnames(res) = rownames(res) = colnames(x)
if (ncol(x) >= 2) {
if (defaultFun) {
res <- apply(x, 2, function(i) colMeans(abs(x - i),
na.rm=TRUE))
} else {
for (j in 2:ncol(x))
for (i in 1:(j - 1))
res[i, j] = res[j, i] = fun(x[, i], x[, j])
}
}
return(res)
}
y <- matrix(rnorm(20000 * 100), 20000, 100)
system.time(dist2(x = y, fun = function(a, b) mean(abs(a - b), na.rm=TRUE)))
## user system elapsed
## 11.664 0.060 11.800
system.time(dist2(x = y))
## user system elapsed
## 5.201 0.348 5.600
Not sure what you'd want to change to the Rd.
Best,
James.
A patch of the .R and .Rd file would be most welcome and expedite the change. Btw, colSums apparently also works with 3-dim arrays, so both loops (over i and j) could be vectorised, however afaIcs at the cost of constructing an object of size nrow(x)^3 in memory, which might again break performance. Best wishes Wolfgang Il giorno Mar 12, 2013, alle ore 4:43 PM, James F. Reid <reidjf at gmail.com> ha scritto:
Dear bioc-devel,
the dist2 function in genefilter defined as:
dist2 <- function (x, fun = function(a, b) mean(abs(a - b), na.rm = TRUE), diagonal = 0) {
if (!(is.numeric(diagonal) && (length(diagonal) == 1L)))
stop("'diagonal' must be a numeric scalar.")
res = matrix(diagonal, ncol = ncol(x), nrow = ncol(x))
colnames(res) = rownames(res) = colnames(x)
if (ncol(x) >= 2) {
for (j in 2:ncol(x)) for (i in 1:(j - 1)) res[i, j] = res[j,
i] = fun(x[, i], x[, j])
}
return(res)
}
could have it's default function vectorized as:
res <- apply(x, 2, function(i) colMeans(abs(x - i), na.rm=TRUE))
to improve performance for example in the ArrayQualityMetrics package.
Best.
James.
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
1 day later
Hi James I added your suggestion to the package, it is in svn revision 74349 and should be on the server in the devel section soon. This an old package - I noted the first commit to it had revision number 24 and was by Robert Gentleman on 2001-08-01. Best wishes Wolfgang El Mar 13, 2013, a las 10:41 am, James F. Reid <reidjf at gmail.com> escribi?:
Hi Wolfgang, On 12/03/13 21:11, Wolfgang Huber wrote:
Dear James Thank you. What would the saved time be (e.g. compared to the overall runtime of arrayQualityMetrics)? I would be surprised if the saving was worth the added complexity, but am always happy to be surprised.
I believe so but maybe I'm missing something.
Here are the system times on a 20000*100 on my laptop using just dist2, which would also have a consequence on the overall runtime of arrayQualityMetrics in aqm.heamap.
dist2 <- function (x,
fun = function(a, b) mean(abs(a - b), na.rm = TRUE),
diagonal = 0) {
defaultFun <- ifelse(missing(fun), TRUE, FALSE)
if (!(is.numeric(diagonal) && (length(diagonal) == 1L)))
stop("'diagonal' must be a numeric scalar.")
res = matrix(diagonal, ncol = ncol(x), nrow = ncol(x))
colnames(res) = rownames(res) = colnames(x)
if (ncol(x) >= 2) {
if (defaultFun) {
res <- apply(x, 2, function(i) colMeans(abs(x - i), na.rm=TRUE))
} else {
for (j in 2:ncol(x))
for (i in 1:(j - 1))
res[i, j] = res[j, i] = fun(x[, i], x[, j])
}
}
return(res)
}
y <- matrix(rnorm(20000 * 100), 20000, 100)
system.time(dist2(x = y, fun = function(a, b) mean(abs(a - b), na.rm=TRUE)))
## user system elapsed
## 11.664 0.060 11.800
system.time(dist2(x = y))
## user system elapsed
## 5.201 0.348 5.600
Not sure what you'd want to change to the Rd.
Best,
James.
A patch of the .R and .Rd file would be most welcome and expedite the change. Btw, colSums apparently also works with 3-dim arrays, so both loops (over i and j) could be vectorised, however afaIcs at the cost of constructing an object of size nrow(x)^3 in memory, which might again break performance. Best wishes Wolfgang Il giorno Mar 12, 2013, alle ore 4:43 PM, James F. Reid <reidjf at gmail.com> ha scritto:
Dear bioc-devel,
the dist2 function in genefilter defined as:
dist2 <- function (x, fun = function(a, b) mean(abs(a - b), na.rm = TRUE), diagonal = 0) {
if (!(is.numeric(diagonal) && (length(diagonal) == 1L)))
stop("'diagonal' must be a numeric scalar.")
res = matrix(diagonal, ncol = ncol(x), nrow = ncol(x))
colnames(res) = rownames(res) = colnames(x)
if (ncol(x) >= 2) {
for (j in 2:ncol(x)) for (i in 1:(j - 1)) res[i, j] = res[j,
i] = fun(x[, i], x[, j])
}
return(res)
}
could have it's default function vectorized as:
res <- apply(x, 2, function(i) colMeans(abs(x - i), na.rm=TRUE))
to improve performance for example in the ArrayQualityMetrics package.
Best.
James.
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel