Hi all,
I recently found that rtracklayer's GFF3 reader was unable to read
GFF3 files produced by Cufflinks. I tracked the problem down to the
occurrence of equals signs in tag values. For example, the following
line was problematic:
C123300344 Cufflinks transcript 1 132 . - . ID=TCONS_00000337;geneID=XLOC_000337;oId=ENSMMUP00000032229;nearest_ref=ENSMMUP00000032229;class_code==;tss_id=TSS337;p_id=P1
due to the "class_code==" part (the value of the class code is actually
an equals sign). Obviously the bug occurs because "strsplit" doesn't
stop after the first split, but keeps splitting at subsequent
occurrences of the separator. I have modified the reader to be able to
handle this case, which as far as I know is perfectly valid. Instead of
strsplit, I use regexpr to find only the *first* occurrence of an equals
sign, and then I use substr to extract the parts of the tag before and
after that equals sign. The attached file is a patch against "R/gff.R" in
the rtracklayer distribution. I developed the patch against version 1.16.1.
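
For anyone who wants the idea without opening the patch, here is a rough
sketch of the approach in plain R. It is only an illustration and not the
code from the patch or from "R/gff.R": the helper name split_first_equals
is made up for this example, and it assumes the attribute column has
already been split into individual fields on ";".

```r
# Attribute column taken from the problematic line quoted above.
attrs <- paste0("ID=TCONS_00000337;geneID=XLOC_000337;",
                "oId=ENSMMUP00000032229;nearest_ref=ENSMMUP00000032229;",
                "class_code==;tss_id=TSS337;p_id=P1")

# Split the column into individual "tag=value" fields.
fields <- strsplit(attrs, ";", fixed = TRUE)[[1]]

# Naive approach: strsplit breaks at *every* "=", so a value that is
# itself an equals sign can never come back as "=".
naive <- strsplit(fields, "=", fixed = TRUE)

# Hypothetical helper (name invented for this sketch): split each field
# at the first "=" only, using regexpr and substr.
split_first_equals <- function(fields) {
  # Position of the first "=" in each field, or -1 if there is none.
  pos <- as.integer(regexpr("=", fields, fixed = TRUE))
  has_eq <- pos > 0
  # Everything before the first "=" is the tag ...
  tags <- ifelse(has_eq, substr(fields, 1, pos - 1), fields)
  # ... and everything after it is the value, later "=" included.
  vals <- ifelse(has_eq, substr(fields, pos + 1, nchar(fields)),
                 NA_character_)
  setNames(vals, tags)
}

parsed <- split_first_equals(fields)
parsed[["class_code"]]  # "="
```

Fields without any equals sign are returned here with an NA value. That
choice is an assumption of this sketch, not something taken from the
patch.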
Regards,
-Ryan Thompson