Ah, ok, sorry. I didn't get that this was also fixed at the same time. Thanks!!!! robert. -------- Original message -------- From: Valerie Obenchain <vobencha at fredhutch.org> Date:17/03/2015 19:59 (GMT+01:00) To: Robert Castelo <robert.castelo at upf.edu>,bioc-devel at r-project.org Subject: Re: [Bioc-devel] little tiny bug in CDSID annotations from predictCoding() The second mapping in .localCoordinates() in AllUtils.R was using a listed 'to' instead of unlisted 'to'. This means the mapping was at the level of the outer list elements vs the inner.? We want to map/overlap with an unlisted object because each range can have a different CDSID. This second mapping provided the index for retrieving the CDSIDs which is why you saw a difference between the two. This was a bug for predictCoding only. Valerie
On 03/17/2015 11:39 AM, Robert Castelo wrote:
Thanks! Do you have an explanation for the apparent disagreement in
CDSID annotations that i described below the bug, between
predictCoding() and locateVariants()?
robert.
-------- Original message --------
From: Valerie Obenchain
Date:17/03/2015 19:18 (GMT+01:00)
To: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] little tiny bug in CDSID annotations from
predictCoding()
Hi Robert,
Thanks for reporting the typo and bug. Now fixed in 1.13.41.
Valerie
On 03/17/2015 10:58 AM, Robert Castelo wrote:
? > in my message below, the line that it says:
? >
? > head(loc_all$CDSID)
? >
? > it should say
? >
? > head(coding2$CDSID)
? >
? > cheers,
? >
? > robert.
? >
? > ==========================================
? > hi,
? >
? > there is a little tiny bug in the current devel version of
? > VariantAnnotation::predictCoding(), and more concretely within
? > VariantAnnotation:::.localCoordinates(), that precludes the correct
? > annotation of the CDSID column:
? >
? > library(VariantAnnotation)
? > library(TxDb.Hsapiens.UCSC.hg19.knownGene)
? > library(BSgenome.Hsapiens.UCSC.hg19)
? >
? > vcf <- readVcf(system.file("extdata", "CEUtrio.vcf.bgz",
? > package="VariantFiltering"), genome="hg19")
? > seqlevelsStyle(vcf) <- seqlevelsStyle(txdb)
? > vcf <- dropSeqlevels(vcf, "chrM")
? > coding1 <- predictCoding(vcf, txdb, Hsapiens)
? > head(coding1$CDSID)
? > IntegerList of length 6
? > [[1]] integer(0)
? > [[2]] integer(0)
? > [[3]] integer(0)
? > [[4]] integer(0)
? > [[5]] integer(0)
? > [[6]] integer(0)
? > table(elementLengths(coding1$CDSID))
? >
? >???? 0
? > 6038
? >
? > my sessionInfo() is at the end of the message.
? >
? > here is the patch, just replacing 'map2$trancriptHits' by
? > 'map2$transcriptsHits':
? >
? > --- R/AllUtilities.R??? (revision 100756)
? > +++ R/AllUtilities.R??? (working copy)
? > @@ -284,7 +284,7 @@
? >?????? cdsid <- IntegerList(integer(0))
? >?????? map2 <- mapToTranscripts(unname(from)[xHits], to,
? >??????????????????????????????? ignore.strand=ignore.strand)
? > -??? cds <- mcols(unlist(to,
use.names=FALSE))$cds_id[map2$trancriptsHits]
? > +??? cds <- mcols(unlist(to,
use.names=FALSE))$cds_id[map2$transcriptsHits]
? >?????? if (length(cds)) {
? >?????????? cdslst <- unique(splitAsList(cds, map2$xHits))
? >?????????? cdsid <- cdslst
? >
? > with this fix then things seem to work again:
? >
? > coding1 <- predictCoding(vcf, txdb, Hsapiens)
? >> head(coding1$CDSID)
? > IntegerList of length 6
? > [["1"]] 21771
? > [["2"]] 21771
? > [["3"]] 21771
? > [["4"]] 21771
? > [["5"]] 21428
? > [["6"]] 21428
? > table(elementLengths(coding1$CDSID))
? >
? >???? 1??? 2??? 3??? 4??? 5??? 6??? 7??? 8??? 9?? 10?? 12?? 13?? 14
16?? 19
? >?? 873 1229 1024? 993? 615? 524? 324? 168?? 82?? 21?? 12?? 15?? 42
76?? 40
? >
? > while investigating this bug i used VariantAnnotation::locateVariants()
? > which also annotates the CDSID column, and it seemed to be working.
? > however, i noticed that both, predictCoding() and locateVariants(), do
? > not give an identical annotation for the CDSID column in coding variants:
? >
? > coding2 <- locateVariants(vcf, txdb, CodingVariants())
? > head(loc_all$CDSID)
? > IntegerList of length 6
? > [["1"]] 210777
? > [["2"]] 210777
? > [["3"]] 210777
? > [["4"]] 210778
? > [["5"]] 208140
? > [["6"]] 208141
? > table(elementLengths(coding2$CDSID))
? >
? >???? 1??? 2??? 3??? 4
? > 4987? 901? 138?? 12
? >
? > in principle, it seems that both are annotating valid CDSID keys:
? >
? > allcdsinfo <- select(txdb, keys=keys(txdb, keytype="CDSID"),
? > columns="CDSID", keytype="CDSID")
? > sum(!as.character(unlist(coding1$CDSID, use.names=FALSE)) %in%
? > allcdsinfo$CDSID)
? > [1] 0
? > sum(!as.character(unlist(coding2$CDSID, use.names=FALSE)) %in%
? > allcdsinfo$CDSID)
? > [1] 0
? >
? > but predictCoding() annotates CDSID values that are not present in
? > locateVariants() annotations and viceversa:
? >
? > sum(!as.character(unlist(coding1$CDSID, use.names=FALSE)) %in%
? > as.character(unlist(coding2$CDSID, use.names=FALSE)))
? > [1] 24057
? > sum(!as.character(unlist(coding2$CDSID, use.names=FALSE)) %in%
? > as.character(unlist(coding1$CDSID, use.names=FALSE)))
? > [1] 7251
? >
? > length(unique(intersect(as.character(unlist(coding2$CDSID,
? > use.names=FALSE)), as.character(unlist(coding1$CDSID,
use.names=FALSE)))))
? > [1] 0
? >
? > should not both annotate the same CDSID values on coding variants?
? >
? >
? > thanks!
? > robert.
? > ps: sessionInfo()
? > R Under development (unstable) (2014-10-14 r66765)
? > Platform: x86_64-unknown-linux-gnu (64-bit)
? >
? > locale:
? >?? [1] LC_CTYPE=en_US.UTF8?????? LC_NUMERIC=C
? >?? [3] LC_TIME=en_US.UTF8??????? LC_COLLATE=en_US.UTF8
? >?? [5] LC_MONETARY=en_US.UTF8??? LC_MESSAGES=en_US.UTF8
? >?? [7] LC_PAPER=en_US.UTF8?????? LC_NAME=C
? >?? [9] LC_ADDRESS=C????????????? LC_TELEPHONE=C
? > [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
? >
? > attached base packages:
? > [1] stats4??? parallel? stats???? graphics? grDevices
? > [6] utils???? datasets? methods?? base
? >
? > other attached packages:
? >?? [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.1.1
? >?? [2] GenomicFeatures_1.19.31
? >?? [3] AnnotationDbi_1.29.17
? >?? [4] Biobase_2.27.2
? >?? [5] BSgenome.Hsapiens.UCSC.hg19_1.4.0
? >?? [6] BSgenome_1.35.17
? >?? [7] rtracklayer_1.27.8
? >?? [8] VariantAnnotation_1.13.40
? >?? [9] Rsamtools_1.19.44
? > [10] Biostrings_2.35.11
? > [11] XVector_0.7.4
? > [12] GenomicRanges_1.19.46
? > [13] GenomeInfoDb_1.3.13
? > [14] IRanges_2.1.43
? > [15] S4Vectors_0.5.22
? > [16] BiocGenerics_0.13.6
? > [17] vimcom_1.0-0
? > [18] setwidth_1.0-3
? > [19] colorout_1.0-3
? >
? > loaded via a namespace (and not attached):
? >?? [1] BiocParallel_1.1.15????? biomaRt_2.23.5
? >?? [3] bitops_1.0-6???????????? DBI_0.3.1
? >?? [5] GenomicAlignments_1.3.31 RCurl_1.95-4.5
? >?? [7] RSQLite_1.0.0??????????? tools_3.2.0
? >?? [9] XML_3.98-1.1???????????? zlibbioc_1.13.2
? >
? > _______________________________________________
? > Bioc-devel at r-project.org mailing list
? > https://stat.ethz.ch/mailman/listinfo/bioc-devel
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, Seattle, WA 98109 Email: vobencha at fredhutch.org Phone: (206) 667-3158 [[alternative HTML version deleted]]