[Bioc-devel] Bioc needs better support for variants
On 04/15/2011 01:00 PM, Michael Lawrence wrote:
On Fri, Apr 15, 2011 at 7:19 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
On 04/15/2011 06:05 AM, Vincent Carey wrote:
I will comment on my limited view and progress. I need to work from
an exemplar. I committed cheung2010 in the experimental data archive
(devel only). This relates to PMID 20856902, genetics of expression
in immortalized B cells.
There are 147 individuals with HapMap phase 3 genotypes and hgfocus
arrays :-( but about 45 have RNA-seq data in GEO. FASTQ is available
via the SRAtools fastq-dump, and you can get the SRA data reasonably
quickly using ascp. I will eventually make a sample from their
RNA-seq data available in this package to look at SNP-driven
allele-specific expression and other aspects of SNP-dependent
expression regulation.
Probably there is DNA-seq data out there on these Coriell cell lines,
but for the moment I will be looking at the chip-based SNPs and
imputation on those. Better representations for 8 million SNPs per
sample would probably come in handy, but breaking them up by
chromosome in SnpMatrix instances is OK so far. I think we have to
recognize that in any of these paradigms discrete calls are often not
going to cut it, and uncertainty representations will be important.
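To make the point about uncertainty concrete, here is a minimal sketch in Python (not Bioconductor code). It assumes each genotype is stored as three posterior probabilities, P(AA), P(AB), P(BB), and contrasts the expected B-allele dosage with a thresholded discrete call. The function names and the 0.9 threshold are illustrative assumptions, not part of SnpMatrix or any existing package.

```python
def expected_dosage(probs):
    """Expected number of B alleles given (P(AA), P(AB), P(BB)).

    Hypothetical helper: the three-probability layout is an assumption
    made for this sketch.
    """
    p_aa, p_ab, p_bb = probs
    if abs((p_aa + p_ab + p_bb) - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    # AA contributes 0 copies of B, AB contributes 1, BB contributes 2.
    return p_ab + 2.0 * p_bb


def hard_call(probs, threshold=0.9):
    """Return 0, 1 or 2 (copies of B) for a confident call, else None.

    The threshold value is an arbitrary choice for illustration.
    """
    best = max(range(3), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None
```

With probabilities such as (0.1, 0.6, 0.3), the discrete call is dropped as uncertain while the dosage still carries usable information for downstream modelling.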
VCF representations of indels in 1000 Genomes are available, but I
don't know that we have good tools for importing and modeling those.
That is another exemplar that should be considered.
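As a rough illustration of what importing indels from VCF involves, the following Python sketch reads the first five fixed columns of a VCF data line (CHROM, POS, ID, REF, ALT) and classifies each alternate allele by comparing its length with REF. It is not a full VCF parser: header lines, INFO and genotype fields, and symbolic alleles such as <DEL> are ignored, and the function names are hypothetical.

```python
def parse_vcf_line(line):
    """Parse the first five tab-separated columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt = fields[:5]
    return {
        "chrom": chrom,
        "pos": int(pos),        # VCF positions are 1-based
        "id": variant_id,
        "ref": ref,
        "alts": alt.split(","), # ALT may list several alleles
    }


def classify(ref, alt):
    """Classify one REF/ALT pair by allele length (sequence alleles only)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "MNV"  # same length, more than one base
```

A record with REF GTC and ALT G,GTCT would yield one deletion and one insertion at the same position, which is the kind of multi-allelic case any container for variants has to represent.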
On Fri, Apr 15, 2011 at 7:16 AM, Michael Lawrence <lawrence.michael at gene.com> wrote:
Hi guys,
Congrats on the release. For this next one, one focus, in my opinion,
should be on analyzing variants in the context of sequencing data.
This includes infrastructure for things like calling variants (in DNA
and RNA), as well as determining their effects (e.g., coding and
splicing changes). It would be good if we could come up with a plan.
If we had one, we could commit some resources here to the problem.
Is anyone willing to help out on this? What do you guys think?
We could certainly play a role in annotation of variants and support
for interfacing with established 3rd party formats. Obviously also in
the representation of variants that overlap with IRanges /
Biostrings infrastructure.
Martin
Great, this is in line with what I was thinking. We need a way to
formally represent sets of variants, as well as transcripts and proteins
(i.e., something based on a GRanges[List]). Then we can map between
coordinate systems and request the consequences of mutations. I was
looking at the Ensembl variations Perl API; it might be good for
inspiration.
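The idea of mapping between coordinate systems and asking for the consequence of a mutation can be sketched as follows. This is a Python illustration only, under simplifying assumptions: the transcript is on the plus strand, every exon is entirely coding, exons are given as 1-based inclusive (start, end) pairs, and the spliced sequence length is a multiple of three. All function names are hypothetical, and none of this reflects the Ensembl API or any Bioconductor class.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def genomic_to_cds(pos, exons):
    """Map a 1-based genomic position to a 0-based offset in the spliced
    coding sequence; return None if the position is outside all exons."""
    offset = 0
    for start, end in exons:
        if start <= pos <= end:
            return offset + (pos - start)
        offset += end - start + 1
    return None


def snv_consequence(cds, offset, alt):
    """Return (kind, ref_aa, alt_aa) for a single-base substitution."""
    phase = offset % 3
    codon_start = offset - phase
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:phase] + alt + ref_codon[phase + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return kind, ref_aa, alt_aa
```

A real implementation would also need to handle minus-strand transcripts, UTRs, splice-site changes and indels, which is where a range-based representation of transcripts would do most of the work.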
Is there somewhere like a wiki where we could start hashing this out?
Started a page here: http://wiki.fhcrc.org/bioc/Variant_Calls
Individuals should be able to create their own accounts from links at
the very top of the page.
Martin
Michael
Thanks,
Michael
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793