
concurrent reading/writing in "chunks" with RSQLite (need some help troubleshooting)

4 messages · Jonathan Greenberg, Seth Falcon, Sean Davis +1 more

Keith and Seth (and r-sig-db'ers):

I somewhat understand the difference between your two solutions --
Seth, I'm not a database person (quickly learning, though), but does
your code hold the output in memory before writing it all at the end
(via dbCommit), or is each iteration's output stored in some "waiting
room" on disk, from which it gets written at the very end?  The key
(for my purposes) is to avoid keeping more than a chunk's worth of
input AND output in memory at any one time.

Along the lines of Keith's solution, I noticed the COUNT query can be
VERY slow -- is there really no fast way of getting the number of rows
in a database table?

Thanks again!

--j
On Tue, Jun 29, 2010 at 9:50 PM, kMan <kchamberln at gmail.com> wrote:
Jonathan,

On Mon, Jul 5, 2010 at 12:36 PM, Jonathan Greenberg
<greenberg at ucdavis.edu> wrote:
It seems at this point that your simplified example is no longer
sufficient for understanding what you want to achieve.  I'm pretty
sure that data within a transaction will be written to disk (for a
disk-based db) before commit is called.  This is determined by SQLite
itself -- you can go read up on how it works, or you can do some
experiments with your data and see how it looks.

Depending on the manipulations you actually want to perform before
inserting the data, you might be able to keep everything in the db.
You could use SQLite extension functions (see RSQLite.extfuns) to
expand what is possible at the SQL level.
No, there is no stored row count for a table that I am aware of.
Note that COUNT is a way of aggregating the number of rows returned
by a query, and in general there is no way for the db to know in
advance how many rows will match a query.

I expect Keith's solution to be quite slow: it executes a separate
query each time through the loop and makes a linear scan through the
table to apply the OFFSET.
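For what it's worth, here is a self-contained sketch of the chunked pattern under discussion. It uses two connections so the open read result set and the writes do not collide on a single connection; the table, the column, and the doubling step are invented for illustration, and the modern DBI verbs dbFetch()/dbBegin()/dbExecute() stand in for the fetch()/dbBeginTransaction() of 2010-era RSQLite:

```r
library(DBI)
library(RSQLite)

# Temp files stand in for your real input/output databases.
in_db  <- tempfile(fileext = ".sqlite")
out_db <- tempfile(fileext = ".sqlite")
con_in  <- dbConnect(SQLite(), in_db)
con_out <- dbConnect(SQLite(), out_db)

dbWriteTable(con_in, "input_table", data.frame(value = 1:25))  # toy input

chunk_size <- 10                            # tune to available memory
res <- dbSendQuery(con_in, "SELECT value FROM input_table")

dbBegin(con_out)                            # one commit at the end; SQLite
while (!dbHasCompleted(res)) {              # spills pages to disk before
  chunk <- dbFetch(res, n = chunk_size)     # commit, so only ~chunk_size rows
  if (nrow(chunk) == 0) break               # of input and output sit in RAM
  out <- transform(chunk, value = value * 2)  # stand-in for the real work
  dbWriteTable(con_out, "output_table", out, append = TRUE)
}
dbCommit(con_out)
dbClearResult(res)

n <- dbGetQuery(con_out, "SELECT COUNT(*) AS n FROM output_table")$n
dbDisconnect(con_in); dbDisconnect(con_out)
```

With 25 toy rows and chunks of 10, the loop makes three passes and `n` ends up 25; only one chunk of input and one of output is held in memory at a time.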

+ seth
On Mon, Jul 5, 2010 at 4:09 PM, Seth Falcon <seth at userprimary.net> wrote:

One way of dealing with this situation in an RDBMS is to use a pair of
triggers (one on insert, one on delete) to keep the count in a separate
table.  I'm sure there are examples and discussion online for creating
triggers in SQLite.
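That suggestion can be sketched like so, keeping a running count in a side table so it is available without a full COUNT(*) scan (all names are invented; dbExecute() is the current DBI verb, where older RSQLite used dbGetQuery() for side-effect statements):

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE big_table (x REAL)")   # hypothetical data table
dbExecute(con, "CREATE TABLE row_count (n INTEGER)")
dbExecute(con, "INSERT INTO row_count VALUES (0)")

# Insert/delete trigger pair keeps row_count.n current, so reading the
# count is a one-row lookup instead of a scan of big_table.
dbExecute(con, "CREATE TRIGGER big_ins AFTER INSERT ON big_table
                BEGIN UPDATE row_count SET n = n + 1; END")
dbExecute(con, "CREATE TRIGGER big_del AFTER DELETE ON big_table
                BEGIN UPDATE row_count SET n = n - 1; END")

dbExecute(con, "INSERT INTO big_table VALUES (1.5), (2.5), (3.5)")
dbExecute(con, "DELETE FROM big_table WHERE x > 3")
n <- dbGetQuery(con, "SELECT n FROM row_count")$n   # 3 inserts - 1 delete = 2
dbDisconnect(con)
```

SQLite triggers fire once per affected row, so the count stays correct for multi-row inserts and deletes; the trade-off is a small write amplification on every modification of big_table.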

Sean

Dear Jonathan,

I do not know of a quick way to query the number of records the first
time; after that, it should be a simpler matter. I suggest looking into
"CREATE INDEX" as part of your dbGetQuery() call. With clever use of
indexes you may work around needing to create duplicate data in some
cases.

I have not actually used it for this particular case yet, so I do not
have a working example.
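Since the thread leaves the index idea as an exercise, here is one way an index can help: chunk by an indexed key column with a WHERE range instead of OFFSET ("keyset pagination"), so each read seeks via the index rather than rescanning the rows already processed. A sketch with invented names, not Keith's actual code:

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "input_table", data.frame(id = 1:25, value = rnorm(25)))

# One-time index build; subsequent range queries use it instead of a scan.
dbExecute(con, "CREATE INDEX idx_input_id ON input_table (id)")

chunk_size <- 10
last_id <- 0
total <- 0
repeat {
  chunk <- dbGetQuery(con, sprintf(
    "SELECT id, value FROM input_table WHERE id > %d ORDER BY id LIMIT %d",
    last_id, chunk_size))
  if (nrow(chunk) == 0) break
  total <- total + nrow(chunk)      # stand-in for real per-chunk processing
  last_id <- max(chunk$id)          # resume from the last key, no OFFSET
}
dbDisconnect(con)
```

This also sidesteps the funny LIMIT-past-end-of-table behavior mentioned below: when the range is exhausted, the query simply returns zero rows and the loop stops.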

Sincerely,
KeithC.


-----Original Message-----
From: jgrn307 at gmail.com [mailto:jgrn307 at gmail.com] On Behalf Of Jonathan
Greenberg
Sent: Monday, July 05, 2010 1:37 PM
To: kMan
Cc: Seth Falcon; r-sig-db at stat.math.ethz.ch
Subject: Re: concurrent reading/writing in "chunks" with RSQLite (need some
help troubleshooting)

On Tue, Jun 29, 2010 at 9:50 PM, kMan <kchamberln at gmail.com> wrote:
I would not be surprised if I missed a goal implied by the code style
used or by off-list correspondence.

through the loop required two queries, a read and a write, where (I've
assumed) the result sets are handled one at a time anyway. The code I
proposed does the same thing, but overall there is one additional query
prior to the loop to get the number of rows. The rest is just case
processing, because dbGetQuery() seemed to do funny things when LIMIT
extended beyond the end of the table, and it wouldn't remember its last
read location (e.g. it didn't work like scan()). In addition, it works
with dbWriteTable() and gets around the need to create the table first
and fill it in later.

particular record. The fetch() & fetch_n, were those intended to get
the read queries to start at the last read location (e.g. like serial
reads using scan())? Eliminating the need to condition-match or
generate a db index first would speed things up, and I'd very much like
to learn that trick. I'd think it would also make for an excellent
R-wiki post.

happened with your code.