Skip to content
Back to formatted view

Raw Message

Message-ID: <YV3Nc2Be5rl6EzwD@posteo.no>
Date: 2021-10-06T16:23:15Z
From: Rasmus Liland
Subject: RSQLite slowness

Dear r-help readers,

why is it so much slower to query an 
sqlite database using RSQlite ?from the 
outside? using param like 

	statement <-
	 "SELECT * FROM gene2refseq
	  LEFT JOIN gene_info ON
	    gene_info.GeneID = gene2refseq.GeneID
	  WHERE gene2refseq.`RNA_nucleotide_accession.version` 
	    LIKE ?"
	db <- "gene_info.sqlite"
	conn <- DBI::dbConnect(RSQLite::SQLite(), db)
	x1 <- DBI::dbGetQuery(conn=conn, 
	  statement=statement, 
	  param=list(H?kan20210914$RNANucleotideAccession))

compared to querying ?from the inside? 
of sqlite, by writing your search terms 
as a table first, and then calling it 

	statement <- 
	 "SELECT * FROM H
	  LEFT JOIN gene2refseq R ON
	    R.`RNA_nucleotide_accession.version` 
	    LIKE '%' || H.RNANucleotideAccession || '%'
	  LEFT JOIN gene_info I ON I.GeneID = R.GeneID"
	DBI::dbWriteTable(conn, "H", H?kan20210914)
	x2 <- DBI::dbGetQuery(conn=conn, statement=statement)
	DBI::dbDisconnect(conn)

On my system (E5-2603 v4), the first 
query took more than an hour, while the 
second took only a few minutes ...  

Do you guys know of any faster (but also 
nice) way to dig around in very large 
tsv files like 
https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz 
and 
https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz 
?

Best,
Rasmus

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20211006/671429f1/attachment.sig>