[Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed

On 01/19/2018 02:24 PM, Ludwig Geistlinger wrote:
My guess is that the database is being accessed by multiple processes 
simultaneously and, even though the databases are opened read-only, 
this corrupts the access in some way. You can avoid multiple processes 
accessing the database at the same time by using a 'lock':

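getSymbols <- function ( anno.pkg, id )
{
     ## Look up the annotation database object exported by the package.
     nmspc <- loadNamespace(anno.pkg)
     anno.pkg <- get(anno.pkg, nmspc)

     ## Only one worker at a time may query the database.
     BiocParallel::ipclock(id)
     ## Release the lock on exit, even if the query fails.
     on.exit(BiocParallel::ipcunlock(id), add=TRUE)
     syms <- suppressMessages({
         AnnotationDbi::mapIds(
             anno.pkg, keys=AnnotationDbi::keys(anno.pkg),
             keytype="PROBEID", column="ENTREZID"
         )
     })

     ## Return only the count, not the ids themselves.
     length(syms)
}

x <- bplapply(pkgs, getSymbols, ipcid())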

There are two additional considerations here.

The first is that one wants to worry about the amount of data transferred 
between worker and manager compared to the amount of time spent in 
computation. So in your previous formulation you sent back all the 
symbols -- this will be relatively expensive compared to the amount of 
work done in the function (reading the ids from the database), and you 
would rather do more work and transmit less (both to and from the 
worker) in each call to getSymbols().

The second is similar, but from the lock perspective -- since the lock 
imposes essentially serial evaluation through that portion of the code, 
you'd like the locked portion of the worker's task to be just a small 
portion of the total work done by the worker.
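
A minimal sketch of both considerations, assuming two chip annotation 
packages are installed (the package names and the function name 
summarizeMapping are placeholders, not taken from the original thread): 
only the database read sits inside the lock, the summary is computed 
after the lock is released, and each worker sends back two integers 
rather than the full vector of ids.

```r
library(BiocParallel)

## Hypothetical helper: read under the lock, summarize outside it.
summarizeMapping <- function(anno.pkg, id) {
    db <- get(anno.pkg, loadNamespace(anno.pkg))

    ## Locked region: only the database read. 'finally' releases the
    ## lock even if the query signals an error.
    BiocParallel::ipclock(id)
    syms <- tryCatch(
        suppressMessages(AnnotationDbi::mapIds(
            db, keys = AnnotationDbi::keys(db),
            keytype = "PROBEID", column = "ENTREZID"
        )),
        finally = BiocParallel::ipcunlock(id)
    )

    ## Unlocked region: workers can run this part concurrently.
    ## Return a small summary instead of the ids themselves.
    c(probes = length(syms), mapped = sum(!is.na(syms)))
}

## Placeholder package names; substitute the packages actually in use.
pkgs <- c("hgu95av2.db", "hgu133a.db")

id <- ipcid()                        # one lock shared by all workers
res <- bplapply(pkgs, summarizeMapping, id)
ipcremove(id)                        # release the lock's resources
```

The more post-processing that can move below the unlock, the less the 
lock serializes the workers.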

I guess a more clever use of locks would be one per database (generate 
two ipcid()'s in the manager, and pass these to the worker in such a way 
that the worker uses the same lock for each database).
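
One way this could look, as a sketch only: the package names, the task 
list and the function name countIds are assumptions for illustration. 
The manager creates one id per database and hands each task the id that 
belongs to its database, so tasks on different databases do not wait on 
each other.

```r
library(BiocParallel)

## Placeholder package names; substitute the packages actually in use.
pkgs <- c("hgu95av2.db", "hgu133a.db")

## Hypothetical task list in which each database is queried several times.
tasks <- rep(pkgs, each = 3)

## One lock id per database, created once in the manager and named by
## package so that each task can look up its own lock.
ids <- vapply(pkgs, function(pkg) ipcid(), character(1))

countIds <- function(anno.pkg, id) {
    db <- get(anno.pkg, loadNamespace(anno.pkg))

    ## Serialize access to this database only.
    BiocParallel::ipclock(id)
    syms <- tryCatch(
        suppressMessages(AnnotationDbi::mapIds(
            db, keys = AnnotationDbi::keys(db),
            keytype = "PROBEID", column = "ENTREZID"
        )),
        finally = BiocParallel::ipcunlock(id)
    )
    length(syms)
}

## bpmapply() pairs each task with the lock id of its database.
x <- bpmapply(countIds, tasks, ids[tasks], SIMPLIFY = FALSE)

## Release the locks' resources once all tasks are done.
for (id in ids) ipcremove(id)
```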

Martin