I'm relatively new to accessing databases from within R, and have
been having trouble with segmentation faults when using RSQLite.
My application requires a single table of doubles with 225 columns
and about 6M rows. The following session transcript mimics this
table, and consistently produces errors: sometimes an immediate seg
fault, and sometimes one of several odd but non-critical error
messages, an example of which appears below. (In the latter case, if
I try to rerun the bit that generated the error, then I usually get
the seg fault.)
Small numbers of retrievals always work fine (e.g., the single
"get_data()" call below, or looping only 50 or 100 times), while
large numbers lead to the errors. If I reduce the number of columns
in the table, I also seem to avoid the errors.
Some version info is given below, but please let me know if anything
else would be helpful. I didn't use the fancy options when installing
the RSQLite package, so as best I can tell, it's using the compiled
library that gets built in the sqlite/lib directory, inside the
package's own directory.
Any advice would be greatly appreciated.
- Richard
**** R session transcript
> version
_
platform x86_64-unknown-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 2
minor 4.0
year 2006
month 10
day 03
svn rev 39566
language R
version.string R version 2.4.0 (2006-10-03)
> installed.packages()[installed.packages()[,"Package"] %in% c("RSQLite","DBI"), c(1,3,10)]
Package Version Built
DBI "DBI" "0.1-11" "2.4.0"
RSQLite "RSQLite" "0.4-9" "2.4.0"
> library( RSQLite )
Loading required package: DBI
> drv <- SQLite()
> con <- dbConnect( drv, "testing.db" )
> table.name <- "testing"
> data <- as.data.frame( matrix( rnorm( 225000 ), ncol = 225 ) )
> dbWriteTable( con, "testing", data, row.names = F, overwrite = T )
[1] TRUE
> get_data <- function( indices, con, table_name ) {
+ query <- paste(
+ "select * from",
+ table_name,
+ "where _ROWID_ in (",
+ paste( indices, collapse = "," ),
+ ")"
+ )
+ dbGetQuery( con, query )
+ }
> get_data( 1:3, con, table.name )[,1:5]
V1 V2 V3 V4 V5
1 -1.8803772 1.00073121 -0.3496548 1.027102 -0.7882639
2 -0.9140920 -0.02242254 -1.4682685 0.375063 -0.6137175
3 0.4243143 -0.15502231 1.6846590 1.210313 -0.4726484
> # No error with just one iteration
> temp <- lapply( 1:50, function(i) get_data( 1:3, con, table.name ) )
> # No error with 50 iterations
> temp <- lapply( 1:2000, function(i) get_data( 1:3, con, table.name ) )
Error in sqliteResultInfo(dbObj, ...) : SET_STRING_ELT() can only be
applied to a 'character vector', not a 'NULL'
> # Rerunning now typically generates a segmentation fault
> temp <- lapply( 1:2000, function(i) get_data( 1:3, con, table.name ) )
*** caught segfault ***
address 0x2, cause 'memory not mapped'
Traceback:
1: sqliteResultInfo(dbObj, ...)
2: dbGetInfo(res, "completed")
3: dbGetInfo(res, "completed")
4: .class1(object)
5: .class1(object)
6: is(object, Cl)
7: .valueClassTest(standardGeneric("dbHasCompleted"), "logical",
"dbHasCompleted")
8: dbHasCompleted(rs)
9: sqliteQuickSQL(conn, statement, ...)
10: dbGetQuery(con, query)
11: dbGetQuery(con, query)
12: get_data(1:3, con, table.name)
13: FUN(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, <truncated>, 1998, 1999,
2000)[[152]], ...)
14: lapply(1:2000, function(i) get_data(1:3, con, table.name))
Possible actions:
1: abort (with core dump)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 3
Process R exited abnormally with code 70 at Mon Nov 6 12:22:14 2006
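For anyone reproducing this, it may help to separate query construction from the database call, so the SQL string can be checked on its own. The following is only a sketch: `build_query` is a hypothetical name that does not appear in the original session, and the `as.integer()` coercion is an added assumption that the row IDs are whole numbers.

```r
# Hypothetical helper (not in the original session): builds the same SQL
# string that get_data() passes to dbGetQuery().
build_query <- function(indices, table_name) {
  # Assumption: row IDs are whole numbers; coerce so that only integers
  # end up in the SQL text.
  indices <- as.integer(indices)
  paste(
    "select * from",
    table_name,
    "where _ROWID_ in (",
    paste(indices, collapse = ","),
    ")"
  )
}

query <- build_query(1:3, "testing")
print(query)
```

With a connection open as in the transcript, `dbGetQuery( con, build_query( 1:3, table.name ) )` sends the same statement as `get_data( 1:3, con, table.name )`.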
Segmentation faults using RSQLite
4 messages · Richard Bourgon, Seth Falcon
Hi Richard,
Richard Bourgon <bourgon at ebi.ac.uk> writes:
My application requires a single table of doubles with 225 columns and about 6M rows. The following session transcript mimics this table, and consistently produces errors: sometimes an immediate seg fault, and sometimes one of several odd but non-critical error messages, an example of which appears below. (In the latter case, if I try to rerun the bit that generated the error, then I usually get the seg fault.) Small numbers of retrievals always work fine (e.g., the single "get_data()" call below, or looping only 50 or 100 times), while large numbers lead to the errors. If I reduce the number of columns in the table, I also seem to avoid the errors. Some version info is given below, but please let me know if anything else would be helpful. I didn't use the fancy options when installing the RSQLite package, so as best I can tell, it's using the compiled library that gets built in the sqlite/lib directory, inside the package's own directory. Any advice would be greatly appreciated.
You've identified a bug. Thanks for the sample code. We'll take a closer look and try to have a fix available in a few days. I've also noticed that in the patched version of RSQLite (see recent message to this list), dbWriteTable is broken.
+ seth
Richard Bourgon <bourgon at ebi.ac.uk> writes:
My application requires a single table of doubles with 225 columns and about 6M rows. The following session transcript mimics this table, and consistently produces errors: sometimes an immediate seg fault, and sometimes one of several odd but non-critical error messages, an example of which appears below. (In the latter case, if I try to rerun the bit that generated the error, then I usually get the seg fault.)
I believe that I've found and fixed the bug causing the segfault you reported. I've posted an updated version, RSQLite 0.4-13, here:
http://www.bioconductor.org/packages/misc/
Could you give it a try and confirm that it works for you? Thanks again for the bug report.
Best,
+ seth
Seth,
The updated version runs without errors. Thanks for the quick repair!
- Richard
On 7 Nov 2006, at 07:12, Seth Falcon wrote:
Richard Bourgon <bourgon at ebi.ac.uk> writes:
My application requires a single table of doubles with 225 columns and about 6M rows. The following session transcript mimics this table, and consistently produces errors: sometimes an immediate seg fault, and sometimes one of several odd but non-critical error messages, an example of which appears below. (In the latter case, if I try to rerun the bit that generated the error, then I usually get the seg fault.)
I believe that I've found and fixed the bug causing the segfault you reported. I've posted an updated version, RSQLite 0.4-13, here: http://www.bioconductor.org/packages/misc/ Could you give it a try and confirm that it works for you? Thanks again for the bug report. Best, + seth
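Before rerunning the transcript, it may be worth confirming that the installed RSQLite is at least the patched 0.4-13 rather than the 0.4-9 shown in the original report. A minimal sketch using `compareVersion()` from the utils package follows; `is_at_least` is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper: TRUE if `installed` is the same as or newer than
# `required`. compareVersion() returns -1, 0, or 1.
is_at_least <- function(installed, required = "0.4-13") {
  utils::compareVersion(installed, required) >= 0
}

old_ok <- is_at_least("0.4-9")    # version from the original report
new_ok <- is_at_least("0.4-13")   # patched version mentioned in the thread
print(c(old = old_ok, new = new_ok))

# Against the actual installation (assumes RSQLite is installed):
# is_at_least(packageDescription("RSQLite")$Version)
```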