Hi there,
I am about to develop a Bioconductor package that implements a custom S4
object, and I am currently thinking about a few issues, including the
following:
Say we have an S4 object that stores a lot of information in different
slots. Assume that it does make sense to extract information out of this
object in four different "dimensions" (conceptually similar to a
four-dimensional object), so one would like to use the subset "["
operator for this, but extending beyond the "typical" one or two
dimensions to 4:
setClass("A",
         representation=representation(a="numeric", b="numeric", c="numeric", d="numeric"))
a = new("A", a=1:5, b=1:5, c=1:5, d=1:5)
Now it would be nice to do stuff like a[1,2,3:4,5], which should simply
return the selected elements in slots a, b, c, and d, respectively. So
a[1,2,3:4,5] would return:
An object of class "A"
Slot "a":
[1] 1
Slot "b":
[1] 2
Slot "c":
[1] 3 4
Slot "d":
[1] 5
This is how far I've come:
setMethod("[", c("A", "ANY", "ANY", "ANY"),
    function(x, i, j, ..., drop=TRUE)
    {
        dots <- list(...)
        if (length(dots) > 2) {
            stop("Too many arguments, must be four dimensional")
        }
        # Parse the extra two dimensions that we need from the ... argument
        k = ifelse(length(dots) > 0 , dots[[1]], c(1:5))
        l = ifelse(length(dots) == 2, dots[[2]], c(1:5))
        initialize(x, a=x@a[i], b=x@b[j], c=x@c[k], d=x@d[l])
    })
This works for stuff like a[1,2,3, 4], but fails with a general error if
one of the indices is a vector such as a[1:2,2,3, 4] or a[1,2,3,4:5].
So, in summary, my questions are:
1. Is there a reasonable way of achieving the 4-dimensional subsetting
that works as a user would expect it to work?
2. Does it make more sense to write a custom function instead to achieve
this, such as subsetObject() without overloading "[" explicitly? What
are the Bioconductor recommendations here?
I'd appreciate any help, suggestions, etc!
Thanks,
Christian
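A note on the two ifelse() calls in the method above: I have not tried to reproduce the exact error, but independent of it, ifelse() returns a result with the shape of its test. With a length-one test, only the first element of the chosen branch survives, so neither a vector index such as 4:5 nor the default 1:5 can come through intact. The sketch below shows this and a plain if/else alternative; the helper name pickIndex() is made up for illustration and is not part of the original code.

```r
# ifelse() returns a result shaped like its test, so a length-one test
# keeps only the first element of the chosen branch
ifelse(TRUE, 4:5, 1:5)    # 4
ifelse(FALSE, 4:5, 1:5)   # 1

# Hypothetical helper: plain if/else returns the whole index vector
pickIndex <- function(dots, pos, default) {
    if (length(dots) >= pos) dots[[pos]] else default
}

pickIndex(list(3, 4:5), 2, 1:5)   # 4 5
pickIndex(list(3), 2, 1:5)        # 1 2 3 4 5
```

Inside the method, k and l could then be taken as pickIndex(dots, 1, seq_along(x@c)) and pickIndex(dots, 2, seq_along(x@d)), assuming the rest of the method stays as posted.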
[Bioc-devel] Overloading subset operator for an S4 object with more than two dimensions
5 messages · Wolfgang Huber, Michael Lawrence, Christian Arnold +1 more
Dear Christian,
I am not sure this is a wise idea; it breaks the semantics of "[". The number of elements stored in an array is the product of the extents of its dimensions. In your example, it is the sum. To put it less abstractly, a[1:2, 2, 3:4, 1] for a regular array is a 2 x 2 matrix, whereas in your construct it is something with 2 + 1 + 2 + 1 = 6 numbers in it. As you say, it looks like you want something like the semantics of "subset" (base package) or `filter` (dplyr), and then using such method names would be more intuitive.
Wolfgang
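The product-versus-sum point can be checked in a few lines of base R. This is only an illustration: the 5 x 5 x 5 x 5 array and the list standing in for the four slots are made-up toy data, not anything from the planned package.

```r
# A regular 4-dimensional array with extent 5 along each dimension
arr <- array(seq_len(5^4), dim = c(5, 5, 5, 5))

sub <- arr[1:2, 2, 3:4, 1]
dim(sub)      # 2 2 (dimensions of extent 1 are dropped)
length(sub)   # 4, the product 2 * 1 * 2 * 1

# Slot-wise selection as proposed for class "A": the lengths add up instead
slotwise <- list(a = (1:5)[1:2], b = (1:5)[2], c = (1:5)[3:4], d = (1:5)[1])
sum(lengths(slotwise))   # 6, the sum 2 + 1 + 2 + 1
```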
On May 14, 2015, at 12:35 GMT+2, Christian Arnold <christian.arnold at embl.de> wrote:
[...]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
I agree with Wolfgang that the semantics of [ are being violated here. It would though help if you could be a little less vague about your intent. What is this data structure going to store, how should it behave?
3 days later
Thanks for your input, highly appreciated! I can see that the semantics of "[" are violated, so I agree that overloading the "subset" method is probably a better way to go.
Essentially, the object stores several individual-specific count matrices from RNA-Seq experiments in a potentially allele (read group)-specific manner. So the dimensions to subset on are the read groups, the rows and columns of the matrices, and the individuals themselves. So I guess overloading the subset method with four arguments, each corresponding to one of the dimensions along which a subset makes sense for this kind of object, is the way to go.
Thanks,
Christian
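For what it is worth, a named method with one argument per dimension could look roughly like the sketch below. It reuses the toy class "A" from the original post rather than the real count-matrix object, and the generic name subsetObject() (borrowed from the question) and the argument names i, j, k, l are placeholders, not an existing Bioconductor API.

```r
library(methods)

# Toy class from the original post
setClass("A", representation(a = "numeric", b = "numeric",
                             c = "numeric", d = "numeric"))

# Hypothetical generic; the name subsetObject() is taken from the question
setGeneric("subsetObject", function(x, ...) standardGeneric("subsetObject"))

# i, j, k, l are illustrative argument names, one per slot.
# An omitted argument keeps the corresponding slot unchanged.
setMethod("subsetObject", "A", function(x, i, j, k, l, ...) {
    if (missing(i)) i <- seq_along(x@a)
    if (missing(j)) j <- seq_along(x@b)
    if (missing(k)) k <- seq_along(x@c)
    if (missing(l)) l <- seq_along(x@d)
    initialize(x, a = x@a[i], b = x@b[j], c = x@c[k], d = x@d[l])
})

obj <- new("A", a = 1:5, b = 1:5, c = 1:5, d = 1:5)
res <- subsetObject(obj, i = 1:2, k = 3:4)
res@a   # 1 2
res@b   # 1 2 3 4 5 (untouched)
res@c   # 3 4
```

Named arguments make it obvious which dimension is being restricted, which is harder to see in a positional call such as a[1,2,3:4,5].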
On 14.05.2015 15:57, Michael Lawrence wrote:
I agree with Wolfgang that the semantics of [ are being violated here.
It would though help if you could be a little less vague about your
intent. What is this data structure going to store, how should it behave?
Christian Arnold, PhD
Staff Bioinformatician
SCB Unit - Computational Biology
Joint appointment Genome Biology
Joint appointment European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory (EMBL)
Meyerhofstrasse 1; 69117, Heidelberg, Germany
Email: christian.arnold at embl.de
Phone: +49(0)6221-387-8472
Web: http://www.embl.de/research/units/scb/zaugg/
On 05/18/2015 06:06 AM, Christian Arnold wrote:
Essentially, the object stores several individual-specific count matrices from RNA-Seq experiments in a potentially allele (read group)-specific manner. So the dimensions to subset on are the read groups, the rows and columns of the matrices, and the individuals themselves.
Maybe this is a SummarizedExperiment with different assays()? This would be
appropriate if each assay had the same regions-of-interest (GRanges or
GRangesList) x Sample dimensions, so may not be relevant to you.
In Bioc 'devel':
library(SummarizedExperiment)
## allele-specific counts, two alleles
m1 = matrix(rbinom(1000, 100, .1), 100, dimnames=list(NULL, LETTERS[1:10]))
m2 = matrix(rbinom(1000, 100, .1), 100, dimnames=list(NULL, LETTERS[1:10]))
se = SummarizedExperiment(assays=list(a1=m1, a2=m2))
se[1:5,] # regions 1-5, across assays
assays(se[,c("A", "B")])[["a2"]] # assay a2 for samples "A", "B"
Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793