[Bioc-devel] assay dimnames in SingleCellExperiment / SummarizedExperiment

Thu, Sep 14, 2017 5:57 AM

Dear all,

I have cc-ed the individual package maintainers on this email to notify them
of this thread directly and hear their respective opinions, but I thought the
widespread use of SummarizedExperiment made it worth involving the community
as well.

Background: I was updating one of my workflows from SCESet to the
SingleCellExperiment class recently introduced on the development branch.

1)
One thing leading to another, I ended up noticing that there is no validity
check on the dimnames of the various assays in SummarizedExperiment. In other
words, the different assays can have different `dimnames` (or some assays
can have NULL dimnames). Using example code adapted from SummarizedExperiment:

library(SummarizedExperiment)  # also attaches S4Vectors (DataFrame, SimpleList)

nrows <- 200; ncols <- 6
counts3 <- counts2 <- counts <-
  matrix(runif(nrows * ncols, 1, 1e4), nrows)

rnames <- paste0("F_", sprintf("%03.f", seq_len(nrows)))
cnames <- LETTERS[1:6]

# Same values, three different conventions for names(dimnames)
dimnames(counts) <- list(rnames, cnames)
dimnames(counts2) <- list(Tags = rnames, Samples = cnames)
dimnames(counts3) <- list(Features = rnames, Cells = cnames)

colData <- DataFrame(row.names=cnames)

rse <- SummarizedExperiment(assays=SimpleList(c1=counts, c2=counts2,
                                              c3=counts3), colData=colData)

assayNames(rse)
names(dimnames(assay(rse, "c1"))) # NULL
names(dimnames(assay(rse, "c2"))) # [1] "Tags"    "Samples"
names(dimnames(assay(rse, "c3"))) # [1] "Features" "Cells"

Although not critical, it would probably be best practice to have a validity
check for identical dimnames across all assays, so that one does not have to
worry later about `melt` calls returning different column names depending on
whether each assay has proper dimnames or not.
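A minimal base-R sketch of such a check, independent of SummarizedExperiment. `check_dimnames_names()` is a hypothetical helper written for illustration (it is not part of any package): it compares names(dimnames()) of every assay against those of the first assay and, in the style of an S4 validity function, returns TRUE or a message naming the assays that differ.

```r
# Hypothetical helper, not part of SummarizedExperiment: returns TRUE if every
# assay has the same names(dimnames()) as the first one (NULL counts as a value),
# otherwise a character message, as an S4 validity function would.
check_dimnames_names <- function(assay_list) {
    nms <- lapply(assay_list, function(a) names(dimnames(a)))
    same <- vapply(nms, identical, logical(1L), nms[[1L]])
    if (all(same)) {
        return(TRUE)
    }
    paste0("names(dimnames()) differ from the first assay in: ",
           paste(names(assay_list)[!same], collapse = ", "))
}

m <- matrix(1:6, nrow = 2, dimnames = list(c("F_1", "F_2"), c("A", "B", "C")))
m2 <- m
names(dimnames(m2)) <- c("Tags", "Samples")

check_dimnames_names(list(c1 = m, c2 = m))   # TRUE
check_dimnames_names(list(c1 = m, c2 = m2))  # message naming "c2"
```

Treating NULL as a value of its own means an assay with unnamed dimnames is flagged when its siblings carry names, which is exactly the c1/c2 mismatch in the example above.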


2)
The initial glitch that prompted this email came from the
`scater::plotHighestExprs` function, which calls the `reshape2::melt`
method; that method extracts the dimnames (and their names), if available.
Anyway, Davis has already prepared a fix to deal with the scenario whereby
the assay does have named dimnames (e.g. counts in the edgeR::DGEList class
that I generally use to import counts). Somehow that wasn't an issue with
the SCESet that I was using previously (probably a side-effect of
ExpressionSet).

The point is, the glitch prompted me to consider whether standardising
names(dimnames) could be beneficial, perhaps more specifically in the new
`SingleCellExperiment` class (as SummarizedExperiment has a much more
general purpose). Given the fairly specific purpose of the former, I was
wondering whether it would be worth:

   - enforcing names(dimnames(x)) to be "Features" and "Cells" (bearing in
   mind that features could still be genes, transcripts, ...)
   - or maybe dropping dimnames altogether and storing them only once
   elsewhere (although a slot for that seems overkill)
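The first option above (enforcing fixed names) could look roughly like the following sketch, assuming the names are stamped onto each assay before it is stored. `set_dimnames_names()` is a hypothetical helper written for illustration, not an existing SingleCellExperiment function; for simplicity it leaves assays without any dimnames untouched.

```r
# Hypothetical helper: give every assay in a list the same names(dimnames()).
# Assays whose dimnames are NULL are returned unchanged.
set_dimnames_names <- function(assay_list, dim_names = c("Features", "Cells")) {
    lapply(assay_list, function(a) {
        dn <- dimnames(a)
        if (is.null(dn)) {
            return(a)
        }
        names(dn) <- dim_names
        dimnames(a) <- dn
        a
    })
}

rn <- c("F_1", "F_2"); cn <- c("A", "B", "C")
c1 <- matrix(1:6, nrow = 2, dimnames = list(rn, cn))
c2 <- matrix(1:6, nrow = 2, dimnames = list(Tags = rn, Samples = cn))

std <- set_dimnames_names(list(c1 = c1, c2 = c2))
lapply(std, function(a) names(dimnames(a)))  # both "Features" "Cells"
```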

There may be other possibilities that I haven't thought of yet, but I
thought I'd get the ball rolling.
Having well-defined dimnames sounds like good practice, with the added
benefit of aesthetically pleasing column names in melted data frames as a
by-product.
However, I can't tell whether the handling of dimnames is something that
needs to be handled by individual downstream package developers, or whether
standards should be set in parent classes.


Thanks for your time!

Best,
Kevin

Aaron Lun

Fri, Sep 15, 2017 10:43 PM

I'll leave the first point to the SummarizedExperiment maintainers, though I note that your code seems to be about the names of the dimnames rather than the dimnames themselves. (I'm under the impression that consistency in the actual dimnames is enforced somehow by the SE constructor.)


As for the second point, I suppose we could set the second name for the dimnames as "Cells" in SingleCellExperiment, though the choice for the first name is more ambiguous. This request has come up before, and I've never been entirely convinced by its necessity. It seems mostly aesthetic to me, and honestly, if a user doesn't already know that rows are genes and columns are cells, I can't see them flailing away at the keyboard until they call dim() to tell them what the dimensions correspond to.


But I guess other people like aesthetics, so if you want, you can put in a PR to override dim() and dimnames() for SingleCellExperiment to put some names on the returned vectors or lists. If I had to choose, I would go with "Features" and "Cells" for the rows and columns, respectively. (We already extend an RSE, i.e. a RangedSummarizedExperiment, so we're already implicitly assuming genomic features.)
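A rough sketch of what the dim()/dimnames() override suggested above could look like, using a hypothetical toy class `ToyCellExp` that simply wraps one matrix (the real SingleCellExperiment stores its assays differently, so this is illustrative only). `dim` and `dimnames` are internal generics, so S4 methods can be defined on them directly.

```r
library(methods)

# Hypothetical toy class wrapping a single matrix; not SingleCellExperiment.
setClass("ToyCellExp", slots = c(counts = "matrix"))

# dim() and dimnames() are internal generics, so S4 methods can be set on them.
setMethod("dim", "ToyCellExp", function(x) {
    d <- dim(x@counts)
    names(d) <- c("Features", "Cells")
    d
})

setMethod("dimnames", "ToyCellExp", function(x) {
    dn <- dimnames(x@counts)
    if (!is.null(dn)) {
        names(dn) <- c("Features", "Cells")
    }
    dn
})

x <- new("ToyCellExp",
         counts = matrix(1:6, nrow = 2,
                         dimnames = list(c("F_1", "F_2"), c("A", "B", "C"))))
dim(x)              # Features 2, Cells 3
names(dimnames(x))  # "Features" "Cells"
```

One trade-off of this aesthetic choice: a named dim() is no longer identical() to an unnamed integer vector, so code comparing dimensions that way would need unname().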


-Aaron

From: Kevin RUE <kevinrue67 at gmail.com>
Sent: Thursday, 14 September 2017 10:57:39 PM
To: bioc-devel
Cc: davis at ebi.ac.uk; risso.davide at gmail.com; Aaron Lun; Maintainer
Subject: assay dimnames in SingleCellExperiment / SummarizedExperiment

Kevin RUE

Sat, Sep 16, 2017 3:49 AM

Hi Aaron,

Yes - sorry, I meant the names of dimnames. Dimnames are indeed checked,
but my code was meant to demonstrate that names of dimnames aren't.
Obviously, it's not the end of the world, but just something I noticed
while I was investigating the glitch.

My second point is not so much about calling dim or dimnames, but rather
about the side-effects of having names(dimnames(x)) not NULL, such as in the
case of `reshape2::melt`.
I think it'd be one less worry for downstream methods to 'know' the
colnames of a melted assay(x, 1), instead of getting "Var1, Var2, value" if
names(dimnames) is NULL, and "something else" if not.
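The "Var1, Var2" fallback described above can be reproduced without Bioconductor. This sketch uses base R's `as.data.frame(as.table())` as a stand-in for `reshape2::melt` (an assumption made to avoid the dependency; the value column is called "Freq" here rather than "value"), and shows that the identifier columns take their names from names(dimnames) when present.

```r
# Base-R stand-in for reshape2::melt on a matrix: long format, one row per entry.
m <- matrix(1:4, nrow = 2, dimnames = list(c("F_1", "F_2"), c("A", "B")))
long_unnamed <- as.data.frame(as.table(m))
colnames(long_unnamed)  # "Var1" "Var2" "Freq"

# Same matrix, but with names on its dimnames.
m_named <- m
names(dimnames(m_named)) <- c("Features", "Cells")
long_named <- as.data.frame(as.table(m_named))
colnames(long_named)  # "Features" "Cells" "Freq"
```

Downstream code that hard-codes either set of column names therefore breaks on the other kind of input.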

Beyond aesthetics, it's really just semantics, but I do think small stuff
like that, if handled at a higher class level, can encourage downstream
developers to work off a more consistent mental and computational model (my
take from Michael Lawrence's BOF at Bioc2017). In other words, it costs
little to implement once in the parent class, compared with if-else
statements in each child class.

It could be something as simple as:

   - c("Feature", "Sample") at the `SummarizedExperiment` level
   - overridden by c("Feature", "Cell") in `SingleCellExperiment`
   - overridden by the developer's choice in other dependent packages.
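The layered defaults listed above map naturally onto S4 method dispatch, as opposed to validity methods (whose limits are discussed later in the thread). A minimal sketch with hypothetical toy classes `ToySE`, `ToySCE` and `ToyOther`, and a hypothetical generic `preferredDimnamesNames()`; none of these exist in SummarizedExperiment or SingleCellExperiment.

```r
library(methods)

# Hypothetical toy classes; only the inheritance relationships matter here.
setClass("ToySE", slots = c(counts = "matrix"))
setClass("ToySCE", contains = "ToySE")
setClass("ToyOther", contains = "ToySE")  # a subclass with no override

# Hypothetical generic returning the preferred names(dimnames()) for a class.
setGeneric("preferredDimnamesNames",
           function(x) standardGeneric("preferredDimnamesNames"))

# Default at the parent level...
setMethod("preferredDimnamesNames", "ToySE",
          function(x) c("Feature", "Sample"))
# ...overridden by one subclass; other packages could override it again.
setMethod("preferredDimnamesNames", "ToySCE",
          function(x) c("Feature", "Cell"))

m <- matrix(1:4, nrow = 2)
preferredDimnamesNames(new("ToySE", counts = m))     # "Feature" "Sample"
preferredDimnamesNames(new("ToySCE", counts = m))    # "Feature" "Cell"
preferredDimnamesNames(new("ToyOther", counts = m))  # inherits "Feature" "Sample"
```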


All the best,
Kevin

Hervé Pagès

Fri, Sep 22, 2017 3:47 PM

Hi guys,

On 09/16/2017 03:49 AM, Kevin RUE wrote:

Sounds like a good idea to me to check for consistent names of
dimnames across assays. I'll add this to the validity method of 
SummarizedExperiment objects.

I'm not too keen on enforcing this at the SummarizedExperiment level.
The rows of a SummarizedExperiment object sometimes correspond to
bins or to a running window. You could even imagine use cases where
they correspond to reads or groups of reads or protein IDs. As general
as "Feature" might sound, it would feel a little bit like a misnomer
for these use cases. This could slightly hurt the re-usability appeal
of SummarizedExperiment objects.

I could see the same argument being made about enforcing
names(dimnames(x))[1] to "Feature" for a SingleCellExperiment
object. However, enforcing names(dimnames(x))[2] to "Cell" is
probably fine, and "Cell" seems like the natural choice given
that it is hardcoded in the name of the class.

Note that technically you cannot have SummarizedExperiment
enforce c("Feature", "Sample") and SingleCellExperiment enforce
something else. That's because S4 doesn't let you override validity
criteria defined by an ancestor class. And that in turn is because
in S4 validation is *incremental*. This means that the validity
method for a subclass only needs to worry about validating what's
not already covered by the validity methods of all the ancestor
classes. When one calls validObject(x), first the validity methods
for all the ancestor classes of 'x' are called (from the most distant
ancestor to the direct parent), and the validity method for the
class of 'x' is called last. This means that you cannot write
a validity method for 'x' that contradicts what the validity
methods for the ancestor classes expect. In other words, if B
extends A, an object of class B must be a valid A object (remember
that is(x, "A") is TRUE) before it can be considered to be a
valid B object. In the (almost) real world this is just saying
that before a cat can be considered to be a valid red cat it must
first be considered to be a valid cat. (Don't ask me what a valid
cat is.)
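A small illustration of incremental validation, using hypothetical `Cat` and `RedCat` classes that follow the cat analogy above. `validObject()` (called here by `new()` because slots are supplied) runs the `Cat` validity method first and stops if it fails, so the `RedCat` method only ever sees objects that are already valid cats. `try_new()` is a hypothetical helper that returns the error message instead of stopping.

```r
library(methods)

# Hypothetical classes mirroring the cat analogy: every RedCat is a Cat.
setClass("Cat", slots = c(legs = "numeric", colour = "character"),
         validity = function(object) {
             if (!identical(object@legs, 4)) "a cat must have 4 legs" else TRUE
         })

setClass("RedCat", contains = "Cat",
         validity = function(object) {
             # Only checks what RedCat adds; the Cat checks have already passed.
             if (!identical(object@colour, "red")) "a red cat must be red" else TRUE
         })

# Hypothetical helper: capture the validation error message instead of stopping.
try_new <- function(...) {
    tryCatch(new(...), error = function(e) conditionMessage(e))
}

ok <- try_new("RedCat", legs = 4, colour = "red")
is(ok, "Cat")  # TRUE: a valid RedCat is also a Cat

# Fails the ancestor check; RedCat's own validity method is never reached.
res_ancestor <- try_new("RedCat", legs = 3, colour = "blue")
# Passes the Cat check, then fails the RedCat check.
res_own <- try_new("RedCat", legs = 4, colour = "blue")
res_ancestor
res_own
```

Applied to the thread: if the parent class's validity method required c("Feature", "Sample"), no subclass validity method could make c("Feature", "Cell") acceptable.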

Cheers,
H.


Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

Aaron Lun

Fri, Sep 22, 2017 11:38 PM

Well, I guess it probably wouldn't be too bad to have "Feature" for the SingleCellExperiment "names(dimnames(...))[1]". The SCE inherits from an RSE anyway so we do imply that the rows represent some kind of genomic feature. Maybe this is too presumptive, but it doesn't seem to clash with any of the current applications of SCE. Of course, we could always just leave the first name empty, but I'm not sure whether this would cause Kevin's use cases to do funny things.


On another note, what is the easiest way to enforce dimnames names at the SCE level? Do I need to implement new methods for assay<- or assays<-, to overwrite the names(dimnames()) of incoming matrices? That seems like a pain. If the base SE is going to check/enforce consistent names of dimnames, perhaps this could be made into a method that SCE can specialize to coerce the names to something else.
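One way to realise the suggestion above of a method that SCE can specialise, sketched with hypothetical toy classes (`HookSE`, `HookSCE`) and hypothetical generics (`normalizeAssayDimnames()`, `hookAssay<-`) rather than the real SummarizedExperiment API. The setter is written once for the parent and always passes incoming matrices through a hook; the subclass only specialises the hook, so no separate assay<- or assays<- methods are needed for it.

```r
library(methods)

# Hypothetical toy classes; not the real SummarizedExperiment/SCE classes.
setClass("HookSE", slots = c(assays = "list"))
setClass("HookSCE", contains = "HookSE")

# Hook applied to every incoming assay. The parent leaves it untouched...
setGeneric("normalizeAssayDimnames",
           function(x, value) standardGeneric("normalizeAssayDimnames"))
setMethod("normalizeAssayDimnames", "HookSE", function(x, value) value)

# ...while the subclass specialises it to stamp its own names(dimnames()).
setMethod("normalizeAssayDimnames", "HookSCE", function(x, value) {
    if (!is.null(dimnames(value))) {
        names(dimnames(value)) <- c("Feature", "Cell")
    }
    value
})

# The setter is defined once, for the parent class only.
setGeneric("hookAssay<-",
           function(x, i, value) standardGeneric("hookAssay<-"))
setReplaceMethod("hookAssay", "HookSE", function(x, i, value) {
    x@assays[[i]] <- normalizeAssayDimnames(x, value)
    x
})

m <- matrix(1:4, nrow = 2,
            dimnames = list(Tags = c("F_1", "F_2"), Samples = c("A", "B")))
se <- new("HookSE")
hookAssay(se, "counts") <- m
sce <- new("HookSCE")
hookAssay(sce, "counts") <- m

names(dimnames(se@assays$counts))   # "Tags" "Samples"
names(dimnames(sce@assays$counts))  # "Feature" "Cell"
```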


-Aaron

From: Hervé Pagès <hpages at fredhutch.org>
Sent: Saturday, 23 September 2017 8:47:14 AM
To: Kevin RUE; Aaron Lun
Cc: bioc-devel; davis at ebi.ac.uk; risso.davide at gmail.com; Maintainer
Subject: Re: assay dimnames in SingleCellExperiment / SummarizedExperiment
