I have got the following error message when using RSQLite. So, what is
the size limit of a string? and what is the limit of the BLOB (by
defination, I assume it shoud be in the TB range at least)? Thanks! -Simon
# trying to insert a DNA sequence of about 500MB into a table
> query01<-dbSendQuery(connect, statement = paste ("
+ insert into dna values ('", seq0, "')", sep="")
+ )
Error in sqliteExecStatement(conn, statement, ...) :
RS-DBI driver: (error in statement: String or BLOB exceeded size limit)
+
[Bioc-devel] size limit of a string or a BLOB
4 messages · Simon Lin, Seth Falcon, Sean Davis
Hi Simon, This might be better directed to the R-sig-db list. But anyway... Simon Lin <simonlin at duke.edu> writes:
I have got the following error message when using RSQLite. So, what is the size limit of a string? and what is the limit of the BLOB (by defination, I assume it shoud be in the TB range at least)? Thanks! -Simon # trying to insert a DNA sequence of about 500MB into a table
> query01<-dbSendQuery(connect, statement = paste ("
+ insert into dna values ('", seq0, "')", sep="")
+ )
Error in sqliteExecStatement(conn, statement, ...) :
RS-DBI driver: (error in statement: String or BLOB exceeded size limit)
+
This is a Sqlite error message, that is, not something being generated
in RSQLite directly. The hard coded limit from the SQLite sources
is just undr 1GB. Here's the relevant SQLite code:
/*
** The maximum length of a TEXT or BLOB in bytes. This also
** limits the size of a row in a table or index.
**
** The hard limit is the ability of a 32-bit signed integer
** to count the size: 2^31-1 or 2147483647.
*/
#ifndef SQLITE_MAX_LENGTH
# define SQLITE_MAX_LENGTH 1000000000
#endif
How many columns does the dna table have? Perhaps there is a
different way to organize the data?
+ seth
Seth Falcon | seth at userprimary.net | blog: http://userprimary.net/user/
Simon Lin wrote:
I have got the following error message when using RSQLite. So, what is the size limit of a string? and what is the limit of the BLOB (by defination, I assume it shoud be in the TB range at least)? Thanks! -Simon # trying to insert a DNA sequence of about 500MB into a table
> query01<-dbSendQuery(connect, statement = paste ("
+ insert into dna values ('", seq0, "')", sep="")
+ )
Error in sqliteExecStatement(conn, statement, ...) :
RS-DBI driver: (error in statement: String or BLOB exceeded size limit)
+
Hi, Simon. I cannot comment directly on the RSQLite limits, but here is what SQLite says about limits: http://www.sqlite.org/limits.html Sean
Seth Falcon wrote:
Hi Simon, This might be better directed to the R-sig-db list. But anyway... Simon Lin <simonlin at duke.edu> writes:
I have got the following error message when using RSQLite. So, what is the size limit of a string? and what is the limit of the BLOB (by defination, I assume it shoud be in the TB range at least)? Thanks! -Simon # trying to insert a DNA sequence of about 500MB into a table
> query01<-dbSendQuery(connect, statement = paste ("
+ insert into dna values ('", seq0, "')", sep="")
+ )
Error in sqliteExecStatement(conn, statement, ...) :
RS-DBI driver: (error in statement: String or BLOB exceeded size limit)
+
This is a Sqlite error message, that is, not something being generated
in RSQLite directly. The hard coded limit from the SQLite sources
is just undr 1GB. Here's the relevant SQLite code:
/*
** The maximum length of a TEXT or BLOB in bytes. This also
** limits the size of a row in a table or index.
**
** The hard limit is the ability of a 32-bit signed integer
** to count the size: 2^31-1 or 2147483647.
*/
#ifndef SQLITE_MAX_LENGTH
# define SQLITE_MAX_LENGTH 1000000000
#endif
How many columns does the dna table have? Perhaps there is a
different way to organize the data?
You can look at how EnsEMBL does this in their DNA tables. Basically, they break the data into several chunks of, say 10Kb and store that along with a sequence id and a start and end. A middle layer is then necessary for doing inserts and selects to either break up the sequence or piece it back together. The advantage is that fast random access to regions of large sequences is very fast and memory efficient. Since you are in R, you could also look as using something like BioStrings in combination with RSQLite, though I'm not sure of the details of such a strategy. Sean