[Bioc-devel] scanVCF: _DNAencode(): invalid DNAString input character: '1'

7 messages · Obenchain, Valerie, Kevin RUE, Erik Fasterius

#
I recently started getting a strange error when building the vignette for my seqCAT package, related to a VCF file I use as example data. The error itself looks like this:

scanVcf: _DNAencode(): invalid DNAString input character: '1' (byte value 49) path: (...)/seqCAT/extdata/example.vcf.gz

I can also reproduce the error with a simple `VariantAnnotation::readVcf()` call. It worked fine until the latest devel updates of other Bioconductor packages, so I assume some recent change caused the error, but I cannot find anything in the NEWS that seems related. I also tried to troubleshoot by manually inspecting my file, and the ANN field seems to be the culprit: I can read the VCF if I remove the entire INFO column. I cannot, however, locate the erroneous data itself.

Does anybody have any idea what causes this?
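
One way to look for a stray character such as the '1' in the error message is to scan the REF and ALT columns of the file directly, independently of VariantAnnotation. The following is a minimal Python sketch, not part of seqCAT or VariantAnnotation. The function names are hypothetical, and it assumes the offending character would sit in an allele column, which may not be the case here, since removing the INFO column already makes the file readable.

```python
import gzip

# Bases allowed in plain REF/ALT sequences (case-insensitive).
VALID_BASES = set("ACGTNacgtn")


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def is_plain_sequence(allele):
    """Return False for missing, overlapping-deletion, symbolic or breakend alleles."""
    if allele in {".", "*"}:
        return False
    if allele.startswith("<") or "[" in allele or "]" in allele:
        return False
    return True


def find_non_dna_alleles(path):
    """Return (line_number, column, value) for every allele with a non-base character."""
    hits = []
    with open_vcf(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            # Skip meta-information lines, the header line and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                # Too few tab-separated columns to contain REF and ALT.
                hits.append((line_no, "LINE", line.rstrip("\n")))
                continue
            ref, alt = fields[3], fields[4]
            if not set(ref) <= VALID_BASES:
                hits.append((line_no, "REF", ref))
            for allele in alt.split(","):
                if is_plain_sequence(allele) and not set(allele) <= VALID_BASES:
                    hits.append((line_no, "ALT", allele))
    return hits
```

Calling `find_non_dna_alleles()` on the path of the example VCF returns a list of suspicious records. An empty list would suggest that the allele columns are clean and that the problem lies in how the file is parsed rather than in the file itself.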
#
Hi Erik,

There have been a few changes to VariantAnnotation lately. I'll take a 
look at seqCAT and get back to you.

Valerie
On 1/8/19 6:07 AM, Erik Fasterius wrote:
This email message may contain legally privileged and/or confidential information.  If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited.  If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
#
Hi all,

Same kind of error for my TVTB package (https://master.bioconductor.org/checkResults/3.8/bioc-LATEST/TVTB/malbec1-checksrc.html).
I'll run R CMD check locally ASAP to see whether I need to update TVTB or
whether it's something upstream.

Best,
Kevin

On Tue, Jan 8, 2019 at 5:05 PM Obenchain, Valerie <Valerie.Obenchain at roswellpark.org> wrote:
#
The problem is related to a change I made to handle buffer overflow:

https://github.com/Bioconductor/VariantAnnotation/issues/19

This clearly doesn't work for all cases. Thanks for reporting the
problems with seqCAT and TVTB. I've reverted the change so your packages
will build, and I will rethink the fix.

Valerie
On 1/8/19 10:45 AM, Kevin RUE wrote:
#
Thanks for the update Valerie.
Needless to say, I ran R CMD check locally yesterday, and it crashed with
the same issue.

Naive question, without looking into the original issue: is it purely a
programming issue, or is there a chance that our (seqCAT and TVTB) VCF
files need to be updated to match any kind of new standard?

Best,
Kevin

On Wed, Jan 9, 2019 at 3:49 PM Obenchain, Valerie <Valerie.Obenchain at roswellpark.org> wrote:
#
On 1/9/19 8:36 AM, Kevin RUE wrote:
I don't think it's related to the VCF standard; it has more to do with
handling buffer overflow gracefully under different circumstances.

Valerie
#
Thanks for the update, Valerie! And thanks for asking the question that I also had in mind, Kevin. I'll follow the GitHub issue and retry once the change has been reverted.

Erik
On 10 Jan 2019, at 03:13, Obenchain, Valerie <Valerie.Obenchain at RoswellPark.org> wrote: