Hi,
There appears to be a problem either with the function "combine",
or with my understanding of what it is doing:
## ----
library(Biobase)
dfA <- data.frame(label=rep("x", 2), row.names=1:2)
dfB <- data.frame(label=rep("x", 3), row.names=3:5)
dfC <- data.frame(label=rep("x", 4), row.names=6:9)
dfAB <- combine(dfA, dfB) # ok
dfABC <- combine(dfAB, dfC) # ?! error
Error in combine(dfAB, dfC) : data.frames contain conflicting data:
non-conforming colname(s): label
## ---
Laurent
[Bioc-devel] bug-report for combine
6 messages · Laurent Gautier, Seth Falcon
It seems to be a problem with 'identical'.
The following patched "combine" seems to fix the problem
(workaround for zero-rows data.frames:
length(sharedRows) == 0))
)
## ---
setMethod("combine",
signature=list(x="data.frame", y="data.frame"),
function(x, y, ...) {
if (all(dim(x) == 0) && all(dim(y) == 0))
return(x)
else if (all(dim(x) == 0))
return(y)
else if (all(dim(y) == 0))
return(x)
uniqueRows <- unique(c(row.names(x), row.names(y)))
uniqueCols <- unique(c(names(x), names(y)))
sharedCols <- intersect(names(x), names(y))
sharedRows <- intersect(row.names(x), row.names(y))
ok <- sapply(sharedCols, function(nm) {
if (class(x[[nm]]) != class(y[[nm]]))
return(FALSE)
switch(class(x[[nm]])[[1]], factor = {
if (identical(levels(x[[nm]]), levels(y[[nm]])) &&
(identical(x[sharedRows, nm, drop = FALSE],
y[sharedRows, nm, drop = FALSE]) ||
length(sharedRows) == 0))
TRUE
else FALSE
}, ordered = , identical(x[sharedRows, nm, drop = FALSE],
y[sharedRows, nm, drop = FALSE]))
})
if (!all(ok))
stop("data.frames contain conflicting data:",
"\n\tnon-conforming colname(s): ", paste(sharedCols[!ok],
collapse = ", "))
if (length(uniqueRows) == 0) {
x <- x["tmp", , drop = FALSE]
y <- y["tmp", , drop = FALSE]
}
else if (nrow(x) == 0) {
x <- x[row.names(y), , drop = FALSE]
row.names(x) <- row.names(y)
}
else if (nrow(y) == 0) {
y <- y[row.names(x), , drop = FALSE]
row.names(y) <- row.names(x)
}
if (length(uniqueCols) > 0)
extLength <- max(nchar(sub(".*\\.", "", uniqueCols))) +
1
else extLength <- 1
extX <- paste(c(".", rep("x", extLength)), collapse = "")
extY <- paste(c(".", rep("y", extLength)), collapse = "")
z <- merge(x, y, by = "row.names", all = TRUE, suffixes = c(extX,
extY))
for (nm in sharedCols) {
nmx <- paste(nm, extX, sep = "")
nmy <- paste(nm, extY, sep = "")
z[[nm]] <- switch(class(z[[nmx]]), AsIs =
I(ifelse(is.na(z[[nmx]]),
z[[nmy]], z[[nmx]])), factor = {
col <- ifelse(is.na(z[[nmx]]), as.character(z[[nmy]]),
as.character(z[[nmx]]))
if (!identical(levels(z[[nmx]]), levels(z[[nmy]])))
factor(col)
else factor(col, levels = levels(z[[nmx]]))
}, ifelse(is.na(z[[nmx]]), z[[nmy]], z[[nmx]]))
}
row.names(z) <- z$Row.names
z$Row.names <- NULL
z[uniqueRows, uniqueCols, drop = FALSE]
}
)
2007/7/18, Laurent Gautier <lgautier at gmail.com>:
Hi,
There appears to be a problem either with the function "combine",
or with my understanding of what it is doing:
## ----
library(Biobase)
dfA <- data.frame(label=rep("x", 2), row.names=1:2)
dfB <- data.frame(label=rep("x", 3), row.names=3:5)
dfC <- data.frame(label=rep("x", 4), row.names=6:9)
dfAB <- combine(dfA, dfB) # ok
dfABC <- combine(dfAB, dfC) # ?! error
Error in combine(dfAB, dfC) : data.frames contain conflicting data:
non-conforming colname(s): label
## ---
Laurent
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iEYEARECAAYFAkYgwJ4ACgkQB/w/MLoyRDeQlgCeMp8v69/Wy24Q4IaBVhoG1M5R 2h4AoIOTvKbrFpTklRDjV7u8tEOeSQqt =JPph -----END PGP SIGNATURE-----
Hi Laurent, "Laurent Gautier" <lgautier at gmail.com> writes:
It seems to be a problem with 'identical'. The following patched "combine" seems to fix the problem (workaround for zero-rows data.frames: length(sharedRows) == 0)) )
Thanks for the report and the fix. Would you be willing to resend as a real patch? By that I mean: - make the change in an up to date svn working copy of Biobase - send output of 'svn diff' We should also add your test case to the unit tests in inst/UnitTests + seth
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org
2007/7/18, Seth Falcon <sfalcon at fhcrc.org>:
Hi Laurent, "Laurent Gautier" <lgautier at gmail.com> writes:
It seems to be a problem with 'identical'. The following patched "combine" seems to fix the problem (workaround for zero-rows data.frames: length(sharedRows) == 0)) )
Thanks for the report and the fix. Would you be willing to resend as a real patch? By that I mean: - make the change in an up to date svn working copy of Biobase - send output of 'svn diff'
I would be willing (...'just have to find the time to do it ;-) ) (and I also fixed a probable other bug since my last post)
We should also add your test case to the unit tests in inst/UnitTests
I was about to suggest it.
+ seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org
"Laurent Gautier" <lgautier at gmail.com> writes:
2007/7/18, Seth Falcon <sfalcon at fhcrc.org>:
Hi Laurent, "Laurent Gautier" <lgautier at gmail.com> writes:
It seems to be a problem with 'identical'. The following patched "combine" seems to fix the problem (workaround for zero-rows data.frames: length(sharedRows) == 0)) )
Thanks for the report and the fix. Would you be willing to resend as a real patch? By that I mean: - make the change in an up to date svn working copy of Biobase - send output of 'svn diff'
I would be willing (...'just have to find the time to do it ;-) ) (and I also fixed a probable other bug since my last post)
You can run svn diff on just a single file if that helps. You can, with care, edit hunks out of a patch file to isolate a change. I know how these things go, but perhaps you can appreciate that it is a bit funny to have time to send an email along with a fix, but not have time to send a patch and expect others to generate it by hand... or maybe you aren't asking and are just saying that you need a spot of time which is quite understandable... Moving off topic, I've been using git [1] as my local version control system and it really helps with this sort of problem because it allows you to commit changes locally and then revise those commits prior to sending them on to svn using git-svn, of course. This goes a long way towards preventing patches with random unrelated fixes sprinkled throughout. And it makes generating patches for picky picky people on mailing lists much easier ;-)
We should also add your test case to the unit tests in inst/UnitTests
I was about to suggest it.
Should be an easy patch :-) + seth [1] http://git.or.cz/
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org
1 day later
Hi, Laurent, thanks for the patch you sent to me offlist. I've applied it and added a unit test based on the example you sent that triggered the error. So this issue should be resolved in Biobase 1.15.21. + seth
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org