[Bioc-devel] Question about external algorithms to Bioconductor package

14 messages · Ryan, A.E.S., Ioannis Vardaxis +4 more

#
Hi,
I have developed a package that is currently under review by
Bioconductor. In the future I am considering making some changes to
the package, basically adding more functions etc.
My package is currently a peak-calling algorithm that takes either a BAM
or a SAM file as input. In general, a user who runs such an analysis
first needs to, for example, map the DNA sequences to the reference
genome to obtain the BAM/SAM file, and only then turn to my algorithm for
the rest. I was wondering if I am allowed to add those steps to my
package as preliminary stages, so that it becomes easier for the user
to have everything in one place.
To do so, my package would need to make use of SRAtoolkit, bowtie and
SAMtools, which I could run in the terminal (using system() in R). To
run those stages, the user would of course need to have those tools
installed.
I was wondering if I am allowed to make use of those tools in my
Bioconductor package, with the appropriate references of course.
Best,
--
Ioannis Vardaxis
Stipendiat IMF
NTNU
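
As a minimal sketch of the approach described above (calling an external
tool from R), a package could first check that the tool is installed
before running it. The helper names find_tool() and run_tool() are
hypothetical and not part of any package; Sys.which() and system2() are
base R. Tools used this way would normally be listed in the
SystemRequirements: field of DESCRIPTION.

```r
# Hypothetical helper (not from any package): locate an external
# command-line tool on the PATH, or stop with an informative message.
find_tool <- function(tool) {
    path <- unname(Sys.which(tool))
    if (!nzchar(path)) {
        stop("External tool '", tool, "' was not found on the PATH; ",
             "please install it before running this step.", call. = FALSE)
    }
    path
}

# Hypothetical wrapper: run a located tool with the given arguments and
# fail loudly if it returns a non-zero exit status.
run_tool <- function(tool, args = character()) {
    status <- system2(find_tool(tool), args = args)
    if (status != 0) {
        stop("'", tool, "' exited with status ", status, call. = FALSE)
    }
    invisible(status)
}
```

For example, run_tool("samtools", "--version") would print the samtools
version if samtools is installed, and stop with a clear message otherwise.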
#
Hi,

I don't know the Bioconductor policy for packages that rely on external
tools, but for the specific features you mention, there are Bioconductor
packages to accomplish most or all of them. You can use samtools via
Rsamtools, you can use the Rsubread package in place of bowtie for
alignment, and you can use the SRAdb package for SRA access. (I believe
there are also several other alignment methods available in Bioconductor,
if Rsubread doesn't do what you need.) Using these packages should ensure
that biocLite() can fully satisfy all the requirements for your package
without the need for separate installation of other command-line tools.
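
To illustrate the samtools part, here is a minimal sketch that converts
a SAM file to a sorted, indexed BAM file with Rsamtools::asBam() instead
of calling samtools through system(). The helper name sam_to_bam() and
its arguments are hypothetical; it assumes Rsamtools is installed.

```r
# Hypothetical helper: SAM -> sorted, indexed BAM using Rsamtools.
# 'out_prefix' is the destination path WITHOUT the ".bam" extension,
# which asBam() appends itself.
sam_to_bam <- function(sam_file, out_prefix) {
    if (!requireNamespace("Rsamtools", quietly = TRUE)) {
        stop("The Rsamtools package is required for this step.",
             call. = FALSE)
    }
    # With indexDestination = TRUE, asBam() sorts the BAM file and
    # writes a ".bai" index next to it; it returns the BAM file name.
    Rsamtools::asBam(sam_file, destination = out_prefix,
                     overwrite = TRUE, indexDestination = TRUE)
}
```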

Regards,

Ryan Thompson

#
On Sun, 12 Nov 2017 22:22:56 +0000
Ryan Thompson <rct at thompsonclan.org> wrote:

            
I will quote the "dependencies" part of the package guidelines. I
recommend that you read all of it, including the whole developer
section, which has plenty of information...

http://bioconductor.org/developers/package-guidelines/#dependencies

Package Dependencies

Packages you depend on must be available via Bioconductor or CRAN;
users and the automated build system have no way to install packages
from other sources. Reuse, rather than re-implement or duplicate,
well-tested functionality from other packages. Specify package
dependencies in the DESCRIPTION file, listed as follows:

  Imports: is for packages that provide functions, methods, or classes
  that are used inside your package name space. Most packages are
  listed here.

  Depends: is for packages that provide essential functionality for
  users of your package, e.g., the GenomicRanges package is listed in
  the Depends: field of GenomicAlignments. It is unusual for more than
  three packages to be listed as 'Depends:'.

  Suggests: is for packages used in vignettes or examples, or in
  conditional code.

  Enhances: is for packages such as Rmpi or parallel that enhance the
  performance of your package, but are not strictly needed for its
  functionality.

  SystemRequirements: is for listing any external software which is
  required, but not automatically installed by the normal package
  installation process. If the installation process is non-trivial, a
  top-level README file should be included to document the process.

A package may rarely offer optional functionality, e.g., visualization
with rgl when that package is available. Authors then list the package
in the Suggests field, and use requireNamespace() (or loadNamespace())
to condition code execution. Functions from the loaded namespace should
be accessed using :: notation, e.g.,

    x <- sort(rnorm(1000))
    y <- rnorm(1000)
    z <- rnorm(1000) + atan2(x,y)
    if (requireNamespace("rgl", quietly=TRUE)) {
        rgl::plot3d(x, y, z, col=rainbow(1000))
    } else {
        ## code when "rgl" is not available
    }

This approach does not alter the user search() path, and ensures that
the necessary function (plot3d(), from the rgl package) is used. Such
conditional code increases complexity of the package and frustrates
users who do not understand why behavior differs between installations,
so is often best avoided.
11 days later
#
Hi,

I tried the Rsubread package you suggested and the mapping is running.
However, it takes forever to finish. Even in parallel it needs some days
to run, while bowtie, for example, needs only a couple of hours on 4 cores.
Is there any way of speeding up Rsubread? Otherwise I don't see any reason
to use it, and it is a big problem if I cannot use bowtie inside a
Bioconductor package.

Thanks
#
On 11/24/2017 09:57 AM, Ioannis Vardaxis wrote:
I'm not following this thread closely but there are two Bowtie 
implementations in Bioconductor

http://bioconductor.org/packages/release/bioc/html/Rbowtie.html
http://bioconductor.org/packages/release/bioc/html/Rbowtie2.html

The fast solution for many problems (mapping to known transcripts) is 
kallisto / salmon, which are not available in Bioconductor -- integrating 
either of these as _libraries_ would be a nice package.

Martin
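
A minimal sketch of what using Rbowtie from within R could look like,
assuming Rbowtie is installed. The helper name align_with_rbowtie() and
its arguments are hypothetical. bowtie_build() and bowtie() are the
Rbowtie entry points; extra named arguments such as sam, best and p are
assumed to be passed through to the bowtie binary as command-line
options (p as bowtie's -p thread count), so check ?Rbowtie::bowtie
before relying on them.

```r
# Hypothetical helper: build a bowtie index from a FASTA file and align
# single-end reads with Rbowtie, writing SAM output into 'work_dir'
# (which must already exist).
align_with_rbowtie <- function(reference_fasta, reads_fastq, work_dir,
                               threads = 4L) {
    if (!requireNamespace("Rbowtie", quietly = TRUE)) {
        stop("The Rbowtie package is required for this step.",
             call. = FALSE)
    }
    index_dir <- file.path(work_dir, "bowtie_index")
    # Build the index; its files are named "index.*" inside index_dir.
    Rbowtie::bowtie_build(references = reference_fasta,
                          outdir = index_dir,
                          prefix = "index", force = TRUE)
    sam_file <- file.path(work_dir, "alignments.sam")
    # Align the reads; 'index' is the index path including the prefix.
    Rbowtie::bowtie(sequences = reads_fastq,
                    index = file.path(index_dir, "index"),
                    outfile = sam_file,
                    sam = TRUE, best = TRUE, p = threads,
                    force = TRUE)
    sam_file
}
```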
This email message may contain legally privileged and/or...{{dropped:2}}
#
Hi,

Both kallisto and salmon are for RNA data; I have DNA data. Is there any
other solution besides Rsubread, which is extremely slow?
I am making an algorithm where one of the steps is to map the DNA
reads to the reference genome, so for the user's convenience I would like
to do it in my algorithm. But if I cannot use anything other than Rsubread,
then I might write that the user at this point has to run bowtie with the
given command and then return to the package. However, I would like to
avoid that if possible.

Ioannis
#
On 11/24/2017 01:25 PM, Ioannis Vardaxis wrote:
As I said there are two implementations of bowtie in Bioconductor; if 
that's your preferred aligner then why not use them, from within R?

Martin
#
Maybe gmapR?

1 day later
#
I think that generally Rsubread is 'fast' so you might make sure that 
there are not obvious problems, e.g., aligning reads to the wrong 
reference; maybe Wei Shi will chime in.

Martin
#
Good day,
How much of it do you have? If it's a large size, such as whole genome sequencing, then it would take longer than an RNA-seq experiment regardless of the algorithm you use. I have used Rsubread in the past and I think it performs as well as the other popular alignment programs.
Although it's possible to do short read mapping with packages like Rbowtie, I don't think this preprocessing stage should happen in R which is a statistical programming language. Modularity is good.

--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
#
Thanks Martin.

Ioannis: could you please provide your command and screen output from the mapping so I can try to see what might cause the long running time?

Thanks,

Wei
#
I used Rbowtie and the mapping was done in 7 minutes; the results were fine too. Rsubread had been running for 2 days, so I had to stop it.
But anyway I can use Rbowtie which is nice :)


Ioannis Vardaxis
Stipendiat NTNU
#
You should be aware that Rbowtie/bowtie does not detect indels, and quite often this is needed for the analysis of DNA sequencing data. Bowtie is probably the only aligner that does not detect indels.

As I mentioned in my last email, I would be happy to take a look at why Rsubread is slow if you can provide your command and screen output. But there are certainly other Bioconductor packages that can do proper alignment for you.

Wei


--------------------
Wei Shi, PhD
Laboratory Head
The Walter and Eliza Hall Institute of Medical Research
Melbourne, Australia


#
First I create the index using:

buildindex(basename="Rsubread_index", reference="/path/to/hg19.fa", colorspace=FALSE)

Where hg19.fa is the fasta file with the chromosomes, chr1 etc.

Then I run:

align(index="/path/to/Rsubread_index", readfile1="/path/to/reads.fastq", type="dna", output_format="BAM",
      output_file="/path/to/output.bam", maxMismatches=1, unique=TRUE, indels=0)

I have short reads from hg19, and I want to map them with at most 1 mismatch and keep only the uniquely mapped reads. Therefore I set indels=0. From that I get zero occurrences. Rbowtie, on the other hand, maps 73%.
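
For reference, a minimal sketch of the same align() call wrapped in a
helper that also sets nthreads explicitly (the command as posted does
not set it, and align() runs on one thread by default) and then reports
the mapped fraction with Rsubread::propmapped(). The helper name
align_and_summarise() and its arguments are hypothetical. This is not a
diagnosis of the slow run or of the zero result, only a way to make the
thread count and the mapping rate explicit.

```r
# Hypothetical helper: run Rsubread::align() with the settings from the
# command above plus an explicit thread count, then summarise mapping.
align_and_summarise <- function(index, reads_fastq, out_bam,
                                threads = 4L) {
    if (!requireNamespace("Rsubread", quietly = TRUE)) {
        stop("The Rsubread package is required for this step.",
             call. = FALSE)
    }
    Rsubread::align(index = index, readfile1 = reads_fastq,
                    type = "dna", output_format = "BAM",
                    output_file = out_bam,
                    maxMismatches = 1, unique = TRUE, indels = 0,
                    nthreads = threads)
    # Returns a data frame with the total and mapped read counts and
    # the proportion mapped for the BAM file.
    Rsubread::propmapped(out_bam)
}
```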

Thanks for the help!


--
Ioannis Vardaxis
Stipendiat IMF
NTNU
