Skip to content

[Bioc-devel] size limit of a string or a BLOB

4 messages · Simon Lin, Seth Falcon, Sean Davis

#
I have got the following error message when using RSQLite. So, what is 
the size limit of a string? and what is the limit of the BLOB (by 
defination, I assume it shoud be in the TB range at least)? Thanks! -Simon

# trying to insert a DNA sequence of about 500MB into a table

 > query01<-dbSendQuery(connect, statement = paste ("
+     insert into dna values ('", seq0, "')", sep="")
+ )

Error in sqliteExecStatement(conn, statement, ...) :
  RS-DBI driver: (error in statement: String or BLOB exceeded size limit)
+
#
Hi Simon,

This might be better directed to the R-sig-db list.  But anyway...

Simon Lin <simonlin at duke.edu> writes:
This is a Sqlite error message, that is, not something being generated
in RSQLite directly.  The hard coded limit from the SQLite sources
is just undr 1GB.  Here's the relevant SQLite code:

    /*
    ** The maximum length of a TEXT or BLOB in bytes.   This also
    ** limits the size of a row in a table or index.
    **
    ** The hard limit is the ability of a 32-bit signed integer
    ** to count the size: 2^31-1 or 2147483647.
    */
    #ifndef SQLITE_MAX_LENGTH
    # define SQLITE_MAX_LENGTH 1000000000
    #endif

How many columns does the dna table have?  Perhaps there is a
different way to organize the data?

+ seth
#
Simon Lin wrote:
Hi, Simon.  I cannot comment directly on the RSQLite limits, but here is 
what SQLite says about limits:

http://www.sqlite.org/limits.html

Sean
#
Seth Falcon wrote:
You can look at how EnsEMBL does this in their DNA tables.  Basically, 
they break the data into several chunks of, say 10Kb and store that 
along with a sequence id and a start and end.  A middle layer is then 
necessary for doing inserts and selects to either break up the sequence 
or piece it back together.  The advantage is that fast random access to 
regions of large sequences is very fast and memory efficient.

Since you are in R, you could also look as using something like 
BioStrings in combination with RSQLite, though I'm not sure of the 
details of such a strategy.

Sean