
[Bioc-devel] Samtools dependency

7 messages · Hervé Pagès, Jonathon Hill, Martin Morgan

#
Hi,

I am working through the process of submitting a new package (MMAPPR2). Our build is failing because the package requires Samtools to be installed. We cannot use Rsamtools, as we depend on features not implemented in that package. How do we resolve the issue? What is the policy for system dependencies? We have samtools listed in the DESCRIPTION and installation instructions in our README, but I am sure that is not enough to get it installed on the Build and Check servers.

Thanks,

Jonathon Hill
#
Hi Jonathon,

Have you considered depending on Rhtslib? See 
https://bioconductor.org/packages/Rhtslib

Rsamtools itself is implemented on top of Rhtslib. Note that other 
Bioconductor packages (e.g. DiffBind, deepSNV, BitSeq, qrqc, QuasR, 
seqbias, TransView, etc...) use Rhtslib internally to implement features 
not implemented in Rsamtools.

H.
#
I had not until today. I spent the afternoon looking at the possibility, and it looks like it would be beyond my lab's skills. We do not have anyone comfortable in C, as we do everything in R. The problem is that we need the results of the mpileup command with BAQ scores. Although Rsamtools has a pileup function, its implementation does not include the ability to retrieve the BAQ score as far as we can tell, so we had to fall back on making a system call to samtools and reading in the results. Using Rhtslib is intriguing, but it looks like we would need several header files from Samtools, as opposed to htslib, and would then have to implement our own C function. Again, we do not have anyone who could do this. We are scientists, not programmers. Am I correct about what it would require? Do you know of any other alternatives?

Jonathon
#
can you provide an example of the samtools command line that you evaluate?

#
Yes, gladly. Thank you for taking time to help me. Here is the exact line of R code where we build the samtools command (the file to be tested is added later):

args <- paste("mpileup -ERI",   # redo BAQ, ignore read groups, and skip indels
              "-f", refFasta(param),
              "-C 50",
              "--min-MQ", minMapQuality(param),
              "--min-BQ", minBaseQuality(param),
              "--region", as.character(chrRange, ignore.strand=TRUE))

As you can see, we use the BAQ score to filter. We have tried to implement it without BAQ (using Rsamtools) and found it negatively affected our results.
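
For readers following along, here is a minimal sketch of how an argument string like the one above might be handed to samtools from R. The helper name `runMpileup`, its `bamFile` argument, and the choice to return NULL when samtools is missing are assumptions made for illustration; they are not taken from MMAPPR2.

```r
# Hypothetical helper (not from MMAPPR2): run `samtools <args> <bamFile>`
# and return its standard output as a character vector, one element per line.
# Returns NULL with a warning when no samtools executable is found on the PATH.
runMpileup <- function(args, bamFile, samtools = "samtools") {
    exe <- Sys.which(samtools)
    if (!nzchar(exe)) {
        warning("samtools not found on PATH; returning NULL")
        return(NULL)
    }
    # system2() joins its arguments with spaces, so a single pre-pasted
    # string such as "mpileup -ERI -f ref.fa ..." is passed through as-is.
    system2(exe, c(args, shQuote(bamFile)), stdout = TRUE)
}
```

Checking for the executable first lets the calling code decide what to do on a machine without samtools, rather than failing inside `system2()`.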
3 days later
#
Hi,

I just wanted to check in, as I know we got interrupted by the weekend. Any thoughts on the best way forward? 

Thanks,

Jonathon
#
I think you should proceed supposing that the build system will not install samtools, and that an R-based implementation is 'in the future'. As such, you should arrange in your examples / vignette to provide 'mock' output from samtools that you then process in your package -- kind of like the old cooking shows where the roast went into the oven before the commercial and came out ready to eat after...

Martin
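
One way the 'mock output' idea might look in R is sketched below. It assumes single-sample `samtools mpileup` text output with six tab-separated columns; the function names `readPileup` and `pileupOrMock` and the column names are hypothetical and are not part of MMAPPR2 or Rsamtools.

```r
# Column names assumed for single-sample `samtools mpileup` text output.
pileupCols <- c("chr", "pos", "ref", "depth", "bases", "quals")

# Hypothetical helper: parse pileup lines into a data.frame.
# quote = "" and comment.char = "" matter because quality strings may
# contain quote and '#' characters; colClasses stops a reference column
# made up only of "T" from being converted to logical TRUE.
readPileup <- function(lines) {
    read.table(text = lines, sep = "\t", quote = "", comment.char = "",
               col.names = pileupCols,
               colClasses = c("character", "integer", "character",
                              "integer", "character", "character"))
}

# Use real samtools output when it is available (a character vector of
# lines); otherwise fall back to a pre-computed file stored with the package.
pileupOrMock <- function(realOutput, mockFile) {
    lines <- if (is.null(realOutput)) readLines(mockFile) else realOutput
    readPileup(lines)
}
```

In a vignette or example, `mockFile` could point at a small file under the package's `inst/extdata` directory (located with `system.file()`), so the downstream analysis still runs on a build machine without samtools. The file name and location are left to the package author.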