[Bioc-devel] read_bed()

The generic documentation does not mention it, but see ?import.bed.
It's similar to colClasses on read.table().

On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
Thankyou Michael,

How do I use the extraCols argument? The documentation does not mention an `extraCols` argument explicitly, so it must be one of the ellipsis arguments, but `?rtracklayer::import` does not mention it. Should I say extraCols = 10 (ten extra columns) or so?

Aditya

________________________________________
From: Michael Lawrence [lawrence.michael at gene.com]
Sent: Tuesday, September 17, 2019 2:05 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; Shepherd, Lori; bioc-devel at r-project.org
Subject: Re: [Bioc-devel] read_bed()

It breaks it because it's not standard BED; however, using the
extraCols= argument should work in this case. Requiring an explicit
format specification is intentional, because it provides validation
and type safety, and it communicates the format to a future reader.
This also looks a bit like a bedPE file, so you might consider using
the Pairs data structure.

Michael

On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
<Aditya.Bhagwat at mpi-bn.mpg.de> wrote:

Hi Michael,

Yeah, I also noticed that the attachment was eaten when it entered the bio-devel list.

The file is also accessible in the extdata of the multicrispr:
https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed

A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 3 (chromEnd), and column 6 (strand). All of these are present in SRF.bed.

I am curious as to why you feel that having additional columns in a bedfile would break it?

Cheers,

Aditya

________________________________________
From: Michael Lawrence [lawrence.michael at gene.com]
Sent: Tuesday, September 17, 2019 1:41 PM
To: Bhagwat, Aditya
Cc: Shepherd, Lori; bioc-devel at r-project.org
Subject: Re: [Bioc-devel] read_bed()

I don't see an attachment, nor can I find the multicrispr package
anywhere. The "addressed soon" was referring to the BEDX+Y formats,
which was addressed many years ago, so I've updated the documentation.
Broken BED files will never be supported.

Michael

On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
<Aditya.Bhagwat at mpi-bn.mpg.de> wrote:

Hi Lori,

I remember now - I tried this function earlier, but it does not work for my bedfiles, like the one in attach.

bedfile      <- system.file('extdata/SRF.bed', package = 'multicrispr')

targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 'mm10')
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : scan() expected 'an integer', got 'chr2'

Perhaps this sentence in `?rtracklayer::import` points to the source of the error?

many tools and organizations have extended BED with additional columns. These are not officially valid BED files, and as such rtracklayer does not yet support them (this will be addressed soon).

Which brings the question: how soon is soon :-D ?

Aditya

________________________________
From: Shepherd, Lori [Lori.Shepherd at RoswellPark.org]
Sent: Tuesday, September 17, 2019 1:02 PM
To: Bhagwat, Aditya; bioc-devel at r-project.org
Subject: Re: read_bed()

Please look at rtracklayer::import()  function that we recommend for reading of BAM files along with other common formats.

Cheers,

Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263

________________________________
From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of Bhagwat, Aditya <Aditya.Bhagwat at mpi-bn.mpg.de>
Sent: Tuesday, September 17, 2019 6:58 AM
To: bioc-devel at r-project.org <bioc-devel at r-project.org>
Subject: [Bioc-devel] read_bed()

Dear bioc-devel,

I had two feedback requests regarding the function read_bed().

1) Did I overlook, and therefore, re-invent existing functionality?
2) If not, would `read_bed` be suited for existence in a more foundational package, e.g. `GenomicRanges`, given the rather basal nature of this functionality?

It reads a bedfile into a GRanges, converts the coordinates from 0-based (bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds BSgenome info (to allow for implicit range validity checking<https://support.bioconductor.org/p/124250>) and plots the karyogram<https://support.bioconductor.org/p/124328>.

Thank you for your feedback.

Cheers,

Aditya

#' Read bedfile into GRanges
#'
#' @param bedfile        file path
#' @param bsgenome       BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' @param zero_based     logical(1): whether bedfile GRanges are 0-based
#' @param rm_duplicates  logical(1)
#' @param plot           logical(1)
#' @param verbose        logical(1)
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @note By convention BED files are 0-based. GRanges are always 1-based.
#'       A good discussion on these two alternative codings is given
#'       by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
#' @examples
#' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' (gr <- read_bed(bedfile, bsgenome))
#' @importFrom  data.table  :=
#' @export
read_bed <- function(
    bedfile,
    bsgenome,
    zero_based    = TRUE,
    rm_duplicates = TRUE,
    plot          = TRUE,
    verbose       = TRUE
){
    # Assert
    assert_all_are_existing_files(bedfile)
    assert_is_a_bool(verbose)
    assert_is_a_bool(rm_duplicates)
    assert_is_a_bool(zero_based)

    # Comply
    seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL

    # Read
    if (verbose) cmessage('\tRead %s', bedfile)
    dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
            col.names = c('seqnames', 'start', 'end', 'strand'))
    data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))

    # Transform coordinates: 0-based -> 1-based
    if (zero_based){
        if (verbose)    cmessage('\t\tConvert 0 -> 1-based')
        dt[, start := start + 1]
    }

    if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
                            nrow(dt), length(unique(dt$seqnames)))

    # Drop duplicates
    if (rm_duplicates){
        is_duplicated <- cduplicated(dt)
        if (any(is_duplicated)){
            if (verbose) cmessage('\t\t        %d after removing duplicates')
            dt %<>% extract(!duplicated)
        }
    }

    # Turn into GRanges
    gr <-  add_seqinfo(as(dt, 'GRanges'), bsgenome)

    # Plot and return
    title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile))
    if (plot) plot_karyogram(gr, title)
    gr
}

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
michafla at gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
michafla at gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
michafla at gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

[Bioc-devel] read_bed()

Thread (22 messages)