[Bioc-devel] SRAdb missing runs
Hi Sean, Hmm. I thought I _had_ already updated the database, but, Lo, trying again, and, guess what, it does now return 1 row. Hooray, excellent, bravo, and thanks for your sleuthing. ~Malcolm
-----Original Message-----
From: seandavi at gmail.com [mailto:seandavi at gmail.com] On Behalf Of Sean Davis
Sent: Saturday, October 08, 2011 6:09 PM
To: Cook, Malcolm
Cc: Jack Zhu; bioc-devel at r-project.org
Subject: Re: [Bioc-devel] SRAdb missing runs

On Fri, Oct 7, 2011 at 11:28 PM, Cook, Malcolm <MEC at stowers.org> wrote:
Jack & Sean, I just checked and found that the latest released version of SRAdb is SRAdb_1.6.0, for R version 2.13.1.

Hi, Malcolm. You probably will not need to update the SRAdb package immediately. To address the questions you have below, it should suffice to update the database using getSRAdbFile().
Is there anything I can do to avail myself of your changes short of running with development R/BioC (or putting rewrite rules in my proxy ;)?
Has NCBI acknowledged the issue you reported as being on their side?
They looked a few days later and did not find the problem. Upon updating our database locally, the problem appeared to be fixed.
I am faced with this problem again, on a different SRA study (this being the 2nd time I've wanted to use SRAdb).
Would you be able to confirm for me that using the XML from EBI fixes the issue for the following study? (Of course, I understand if not.)
I find that no rows are returned by

    sqliteQuickSQL(sra_con, 'select * from study where study_accession = "SRP004442"')

Using the SRAdb database file downloaded today and built on 2011-10-04, this query returns 1 row.

Thanks for your patience, Malcolm.

Sean
-----Original Message-----
From: yuelin at gmail.com [mailto:yuelin at gmail.com] On Behalf Of Jack Zhu
Sent: Tuesday, October 04, 2011 4:10 PM
To: Sean Davis
Cc: Cook, Malcolm; bioc-devel at r-project.org
Subject: Re: [Bioc-devel] SRAdb missing runs

Hi Malcolm,

Recently one other user also found missing SRA records in the SRAdb database. I looked into the problem, and it looks like the problem was with the XML files on the NCBI SRA FTP site. So I modified the package and switched the main download source of the SRA XML files to EBI. It seems to be working now. Please let me know if you still see any problems. Thanks.

Jack

On 19 September 2011 08:41, Sean Davis <sdavis2 at mail.nih.gov> wrote:
Hi, Malcolm. I submitted a ticket to SRA. They have assigned the ticket already. We'll keep you updated on the outcome, as it definitely impacts the utilization of SRA by us (SRAdb) and others.

Sean

On Mon, Sep 19, 2011 at 8:25 AM, Cook, Malcolm <MEC at stowers.org> wrote:
Jack,

Thanks for the reply. I'm actually not that savvy about the internals of SRA and GEO at NCBI. I've cobbled together my first RNA-seq submission to GEO, which in turn submits to SRA. The reads in question are from the modENCODE project, which submits to GEO, which submits to SRA. I've not tried to work out why some of these files have gone missing from the XML. Do you think this is something to report to modENCODE, GEO, or NCBI?

Cheers,
Malcolm
________________________________________
From: yuelin at gmail.com [yuelin at gmail.com] On Behalf Of Jack Zhu [zhujack at mail.nih.gov]
Sent: Friday, September 16, 2011 10:21 PM
To: Cook, Malcolm
Cc: bioc-devel at r-project.org; Sean Davis
Subject: Re: [Bioc-devel] SRAdb missing runs

Hi Malcolm,

I am really sorry that I missed your post, but thank you very much for the report. I have reproduced the problem you found. After looking into it a little, it appears that the missing runs in SRAdb are caused by NCBI failing to update its XML files. As you know, all the data in SRAdb come from the NCBI SRA XML files, which are downloaded from the NCBI FTP site (ftp://ftp.ncbi.nih.gov/sra/Submissions/). As shown on this page, http://www.ncbi.nlm.nih.gov/sra/SRX032508, SRR07443 was submitted through SRA010243. Unfortunately, the SRA010243 XML file on the NCBI FTP site (ftp://ftp.ncbi.nih.gov/sra/Submissions/SRA010/SRA010243/) does not include SRR07443 or SRX032508, apparently because the XML files were not updated when new runs/samples were added.
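Jack's diagnosis comes down to checking whether a run accession appears in a submission's XML. A minimal Python sketch of that check follows, assuming the SRA run XML lists each run as a `RUN` element carrying an `accession` attribute. The helper name `run_accessions` and the trimmed sample XML are hypothetical illustrations, not files taken from the NCBI FTP site.

```python
import xml.etree.ElementTree as ET


def run_accessions(xml_text):
    """Collect the accession attribute of every RUN element in an SRA run XML document.

    Assumes (not confirmed here) that runs appear as <RUN accession="..."> elements.
    """
    root = ET.fromstring(xml_text.strip())
    return {run.get("accession") for run in root.iter("RUN") if run.get("accession")}


if __name__ == "__main__":
    # Hypothetical, heavily trimmed run XML; real SRA files carry many more fields.
    sample_xml = """
    <RUN_SET>
      <RUN accession="SRR031766"/>
      <RUN accession="SRR031767"/>
    </RUN_SET>
    """
    present = run_accessions(sample_xml)
    print("SRR074430" in present)  # False: the run is absent from this toy XML
```

Because SRAdb is built from these files, a run missing at this stage cannot appear in the downloaded SQLite database, whatever query is used.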
Malcolm, we are currently looking into new mechanisms for updating SRAdb, and hopefully the problem will be fixed soon. Thanks again.

Jack

On 16 September 2011 07:06, Sean Davis <sdavis2 at mail.nih.gov> wrote:
Sorry, Malcolm. We'll look into it. Thanks for the report.

Sean

On Wed, Sep 14, 2011 at 5:09 PM, Cook, Malcolm <MEC at stowers.org> wrote:
Hi Sean, Jack, and fellow SRAdb users,

Sean, I failed to cc: you the first time around. Perhaps you have a suggestion for me?

I remain perplexed as to why selected SRA runs fail to appear in SRAdb. Does anyone else have experience with, or advice on, this?

Thanks much,
~Malcolm

-----Original Message-----
From: Cook, Malcolm
Sent: Friday, September 09, 2011 4:15 PM
To: 'bioc-devel at r-project.org'; 'zhujack at mail.nih.gov'
Subject: SRAdb missing runs

Hi Jack and other SRAdb users,

I find at least one SRA run missing from the sqlite database obtained from a fresh `getSRAdbFile()`.
SRR074430 is present in the SRA (…ewer&run=SRR074430),
but directly querying the sqlite3 database fails to find it:

    sqlite3 -list SRAmetadb.sqlite "select study_accession, submission_accession, sample_accession, experiment_accession, run_accession, sample_alias from sra where run_accession in ('SRR031766','SRR031767','SRR074430')"

    SRP001537|SRA010243|SRS008471|SRX014483|SRR031766|S2_DRSC_CG10128_RNAi-1
    SRP001537|SRA010243|SRS008471|SRX014483|SRR031767|S2_DRSC_CG10128_RNAi-1
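The same check can be automated so that absent accessions are reported explicitly, rather than inferred from which output lines are missing. This is a minimal sketch using Python's built-in `sqlite3` module; `missing_runs` is a hypothetical helper, and the in-memory `sra` table is a toy stand-in that models only the columns used in the query above. Against a real download you would connect to `SRAmetadb.sqlite` instead.

```python
import sqlite3


def missing_runs(conn, run_accessions):
    """Return the requested run accessions that have no row in the `sra` table."""
    run_accessions = list(run_accessions)
    if not run_accessions:
        return []
    # One "?" placeholder per accession keeps the query parameterized.
    placeholders = ",".join("?" * len(run_accessions))
    query = f"SELECT run_accession FROM sra WHERE run_accession IN ({placeholders})"
    found = {row[0] for row in conn.execute(query, run_accessions)}
    # Preserve the caller's order in the report.
    return [acc for acc in run_accessions if acc not in found]


if __name__ == "__main__":
    # Toy table mirroring the two rows returned in this thread (hypothetical schema subset).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sra (study_accession TEXT, submission_accession TEXT, "
        "sample_accession TEXT, experiment_accession TEXT, "
        "run_accession TEXT, sample_alias TEXT)"
    )
    conn.executemany(
        "INSERT INTO sra VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("SRP001537", "SRA010243", "SRS008471", "SRX014483", "SRR031766", "S2_DRSC_CG10128_RNAi-1"),
            ("SRP001537", "SRA010243", "SRS008471", "SRX014483", "SRR031767", "S2_DRSC_CG10128_RNAi-1"),
        ],
    )
    print(missing_runs(conn, ["SRR031766", "SRR031767", "SRR074430"]))  # prints ['SRR074430']
```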
Can anyone advise me as to the origin of this discrepancy, or perhaps fix a misunderstanding I may have in using this resource?
I just downloaded a fresh SRAdbFile; here is the "Metadata associate with downloaded file:"

    c("schema version", "creation timestamp")c("1.0", "2011-09-03 10:38:16")
Below is a full transcript with sessionInfo(), if it helps.

Thanks!

Malcolm Cook
Computational Biology - Stowers Institute for Medical Research
library('SRAdb')
sqlfile <- getSRAdbFile()
trying URL
Content type 'text/plain; charset=ISO-8859-1' length 38391904 bytes (36.6 Mb)
opened URL
==================================================
downloaded 36.6 Mb

Unzipping...
Metadata associate with downloaded file:
c("schema version", "creation timestamp")c("1.0", "2011-09-03 10:38:16")
sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] SRAdb_1.6.0    RCurl_1.5-0    bitops_1.0-4.1 graph_1.30.0   RSQLite_0.9-4
[6] DBI_0.2-5

loaded via a namespace (and not attached):
[1] Biobase_2.12.2  GEOquery_2.19.2 XML_3.4-0       tools_2.13.1

q('no')
bash-3.2$ sqlite3 -list SRAmetadb.sqlite "select study_accession, submission_accession, sample_accession, experiment_accession, run_accession, sample_alias from sra where run_accession in ('SRR031766','SRR031767','SRR074430')"
SRP001537|SRA010243|SRS008471|SRX014483|SRR031766|S2_DRSC_CG10128_RNAi-1
SRP001537|SRA010243|SRS008471|SRX014483|SRR031767|S2_DRSC_CG10128_RNAi-1
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel