Hi, Within a package I am developing, I would like to enable parallel probe to gene mapping for a compendium of microarray datasets. This accordingly makes use of annotation packages such as hgu133a.db, which in turn connect to the SQLite database via AnnotationDbi. When running in multi-core mode (i.e. using a MulticoreParam with BiocParallel) using more than 2 cores, this causes the error: database disk image is malformed In a very similar problem: https://support.bioconductor.org/p/38541/ Adi Tarca and Dan Tenenbaum identified and resolved this problem by ensuring that each process has its own unique database connection, i.e. AnnotationDbi is not loaded before sending the job to the workers. This solution was easily realized as this analysis was carried out within a script and not a package. However, within my package, AnnotationDbi is loaded as a dependency of my package's imports. How to resolve this here? I am not sure whether I perfectly understand the underlying mechanisms, but is there a way to make my workers load their own version of AnnotationDbi instead of using the one of the parent process? Or am I supposed to unload all packages depending on AnnotationDbi, and AnnotationDbi itself, before sending the job to the workers (and reload all of them after the job has finished?) Thanks a lot, Ludwig -- Dr. Ludwig Geistlinger CUNY School of Public Health
[Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed
8 messages · Ludwig Geistlinger, Vincent Carey, Gabriel Becker +1 more
good question some of the discussion on http://sqlite.1065341.n5.nabble.com/Parallel-access-to-read-only-in-memory-database-td91814.html seems relevant. converting the relatively small annotation package content to pure R read-only tables on the master before parallelizing might be very simple? On Fri, Jan 19, 2018 at 11:43 AM, Ludwig Geistlinger <
Ludwig.Geistlinger at sph.cuny.edu> wrote:
Hi, Within a package I am developing, I would like to enable parallel probe to gene mapping for a compendium of microarray datasets. This accordingly makes use of annotation packages such as hgu133a.db, which in turn connect to the SQLite database via AnnotationDbi. When running in multi-core mode (i.e. using a MulticoreParam with BiocParallel) using more than 2 cores, this causes the error: database disk image is malformed In a very similar problem: https://support.bioconductor.org/p/38541/ Adi Tarca and Dan Tenenbaum identified and resolved this problem by ensuring that each process has its own unique database connection, i.e. AnnotationDbi is not loaded before sending the job to the workers. This solution was easily realized as this analysis was carried out within a script and not a package. However, within my package, AnnotationDbi is loaded as a dependency of my package's imports. How to resolve this here? I am not sure whether I perfectly understand the underlying mechanisms, but is there a way to make my workers load their own version of AnnotationDbi instead of using the one of the parent process? Or am I supposed to unload all packages depending on AnnotationDbi, and AnnotationDbi itself, before sending the job to the workers (and reload all of them after the job has finished?) Thanks a lot, Ludwig -- Dr. Ludwig Geistlinger CUNY School of Public Health [[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
IT seems like you could also force a copy of the reference object via <dbobject>$copy() and then force a refresh of the conn slot by assigning a new db connection into it. I'm having trouble confirming that this would work, however, because I actually can't reproduce the error. The naive way works for me on my mac laptop (which is running an old R and Bioconductor) and on the linux cluster I have access to (running Bioc 3.6): (cluster)
getSymbol <- function ( x ) {
+ return( AnnotationDbi::mget( x , hgu95av2SYMBOL ) ) + }
x <- list( "36090_at" , "38785_at" )
mclapply( x , getSymbol )
[[1]] [[1]]$`36090_at` [1] "TBL2" [[2]] [[2]]$`38785_at` [1] "MUC1"
sessionInfo()
R version 3.4.3 (2017-11-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago) Matrix products: default BLAS: /gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRblas.so LAPACK: /gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] hgu95av2.db_3.2.3 org.Hs.eg.db_3.5.0 AnnotationDbi_1.40.0 [4] IRanges_2.12.0 S4Vectors_0.16.0 Biobase_2.38.0 [7] BiocGenerics_0.24.0 loaded via a namespace (and not attached): [1] Rcpp_0.12.14 digest_0.6.14 DBI_0.7 RSQLite_2.0 [5] pillar_1.1.0 rlang_0.1.6 blob_1.1.0 bit64_0.9-8 [9] bit_1.1-13 compiler_3.4.3 pkgconfig_2.0.1 memoise_1.1.0 [13] tibble_1.4.1
~G On Fri, Jan 19, 2018 at 9:23 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
good question some of the discussion on http://sqlite.1065341.n5.nabble.com/Parallel-access-to- read-only-in-memory-database-td91814.html seems relevant. converting the relatively small annotation package content to pure R read-only tables on the master before parallelizing might be very simple? On Fri, Jan 19, 2018 at 11:43 AM, Ludwig Geistlinger < Ludwig.Geistlinger at sph.cuny.edu> wrote:
Hi, Within a package I am developing, I would like to enable parallel probe
to
gene mapping for a compendium of microarray datasets. This accordingly makes use of annotation packages such as hgu133a.db, which in turn connect to the SQLite database via AnnotationDbi. When running in multi-core mode (i.e. using a MulticoreParam with BiocParallel) using more than 2 cores, this causes the error: database disk image is malformed In a very similar problem: https://support.bioconductor.org/p/38541/ Adi Tarca and Dan Tenenbaum identified and resolved this problem by ensuring that each process has its own unique database connection, i.e. AnnotationDbi is not loaded before sending the job to the workers. This solution was easily realized as this analysis was carried out within a script and not a package. However, within my package, AnnotationDbi is loaded as a dependency of my package's imports. How to resolve this here? I am not sure whether I perfectly understand the underlying mechanisms, but is there a way to make my workers load their own version of AnnotationDbi instead of using the one of the parent process? Or am I supposed to unload all packages depending on AnnotationDbi, and AnnotationDbi itself, before sending the job to the workers (and reload
all
of them after the job has finished?)
Thanks a lot,
Ludwig
--
Dr. Ludwig Geistlinger
CUNY School of Public Health
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Gabriel Becker, Ph.D Scientist Bioinformatics and Computational Biology Genentech Research [[alternative HTML version deleted]]
On 01/19/2018 12:23 PM, Vincent Carey wrote:
good question some of the discussion on http://sqlite.1065341.n5.nabble.com/Parallel-access-to-read-only-in-memory-database-td91814.html seems relevant. converting the relatively small annotation package content to pure R read-only tables on the master before parallelizing might be very simple? On Fri, Jan 19, 2018 at 11:43 AM, Ludwig Geistlinger < Ludwig.Geistlinger at sph.cuny.edu> wrote:
Hi, Within a package I am developing, I would like to enable parallel probe to gene mapping for a compendium of microarray datasets. This accordingly makes use of annotation packages such as hgu133a.db, which in turn connect to the SQLite database via AnnotationDbi. When running in multi-core mode (i.e. using a MulticoreParam with BiocParallel) using more than 2 cores, this causes the error: database disk image is malformed In a very similar problem: https://support.bioconductor.org/p/38541/ Adi Tarca and Dan Tenenbaum identified and resolved this problem by ensuring that each process has its own unique database connection, i.e. AnnotationDbi is not loaded before sending the job to the workers. This solution was easily realized as this analysis was carried out within a script and not a package. However, within my package, AnnotationDbi is loaded as a dependency of my package's imports. How to resolve this here?
Can you be a little more specific here? The problem isn't likely with AnnotationDbi per se, but with the annotation package you use. Also, the connection on the worker is bad, but could be re-created (using, e.g., dbfile(org.Hs.eg.db)...) but probably a toy example would help. Martin
I am not sure whether I perfectly understand the underlying mechanisms,
but is there a way to make my workers load their own version of
AnnotationDbi instead of using the one of the parent process?
Or am I supposed to unload all packages depending on AnnotationDbi, and
AnnotationDbi itself, before sending the job to the workers (and reload all
of them after the job has finished?)
Thanks a lot,
Ludwig
--
Dr. Ludwig Geistlinger
CUNY School of Public Health
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or...{{dropped:2}}
On 01/19/2018 12:37 PM, Gabe Becker wrote:
IT seems like you could also force a copy of the reference object via <dbobject>$copy() and then force a refresh of the conn slot by assigning a new db connection into it. I'm having trouble confirming that this would work, however, because I actually can't reproduce the error. The naive way works for me on my mac laptop (which is running an old R and Bioconductor) and on the linux cluster I have access to (running Bioc 3.6): (cluster)
getSymbol <- function ( x ) {
+ return( AnnotationDbi::mget( x , hgu95av2SYMBOL ) ) + }
pass the data base connection to the function
getSymbol <- function ( x, db )
## olde schoole
AnnotationDbi::mget(x, db)
## AnnotationDbi::mapIds(db, x, "SYMBOL", "PROBEID")
and arrange for the general case, i.e., distinct processes with data
serialized between them
> cl = parallel::makePSOCKcluster(2)
> parLapply(cl, x, getSymbol, hgu95av2SYMBOL)
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: external pointer is not valid
(getSymbol would fail as originally written in the serial case, since
the workers would not have access to hgu95av2SYMBOL
The workaround is to open the connection on the node, e.g.,
getSymbol <- function ( x, dbname ) {
nmspc <- loadNamespace(dbname)
db <- get(dbname, nmspc)
AnnotationDbi::mapIds(db, x, "SYMBOL", "PROBEID")
}
lapply(x, getSymbol, "hgu95av2.db")
bplapply(x, getSymbol, "hgu95av2.db")
bplapply(x, getSymbol, "hgu95av2.db", BPPARAM = SnowParam())
Martin
x <- list( "36090_at" , "38785_at" )
mclapply( x , getSymbol )
[[1]] [[1]]$`36090_at` [1] "TBL2" [[2]] [[2]]$`38785_at` [1] "MUC1"
sessionInfo()
R version 3.4.3 (2017-11-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago) Matrix products: default BLAS: /gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRblas.so LAPACK: /gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] hgu95av2.db_3.2.3 org.Hs.eg.db_3.5.0 AnnotationDbi_1.40.0 [4] IRanges_2.12.0 S4Vectors_0.16.0 Biobase_2.38.0 [7] BiocGenerics_0.24.0 loaded via a namespace (and not attached): [1] Rcpp_0.12.14 digest_0.6.14 DBI_0.7 RSQLite_2.0 [5] pillar_1.1.0 rlang_0.1.6 blob_1.1.0 bit64_0.9-8 [9] bit_1.1-13 compiler_3.4.3 pkgconfig_2.0.1 memoise_1.1.0 [13] tibble_1.4.1
~G On Fri, Jan 19, 2018 at 9:23 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
good question some of the discussion on http://sqlite.1065341.n5.nabble.com/Parallel-access-to- read-only-in-memory-database-td91814.html seems relevant. converting the relatively small annotation package content to pure R read-only tables on the master before parallelizing might be very simple? On Fri, Jan 19, 2018 at 11:43 AM, Ludwig Geistlinger < Ludwig.Geistlinger at sph.cuny.edu> wrote:
Hi, Within a package I am developing, I would like to enable parallel probe
to
gene mapping for a compendium of microarray datasets. This accordingly makes use of annotation packages such as hgu133a.db, which in turn connect to the SQLite database via AnnotationDbi. When running in multi-core mode (i.e. using a MulticoreParam with BiocParallel) using more than 2 cores, this causes the error: database disk image is malformed In a very similar problem: https://support.bioconductor.org/p/38541/ Adi Tarca and Dan Tenenbaum identified and resolved this problem by ensuring that each process has its own unique database connection, i.e. AnnotationDbi is not loaded before sending the job to the workers. This solution was easily realized as this analysis was carried out within a script and not a package. However, within my package, AnnotationDbi is loaded as a dependency of my package's imports. How to resolve this here? I am not sure whether I perfectly understand the underlying mechanisms, but is there a way to make my workers load their own version of AnnotationDbi instead of using the one of the parent process? Or am I supposed to unload all packages depending on AnnotationDbi, and AnnotationDbi itself, before sending the job to the workers (and reload
all
of them after the job has finished?)
Thanks a lot,
Ludwig
--
Dr. Ludwig Geistlinger
CUNY School of Public Health
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or...{{dropped:2}}
I apologize if I haven't been specific enough - however, I am also having trouble to reliably reproduce the error.
It does not seem to be exclusively related to the combination of AnnotationDbi and parallel computation, but also with some other packages I load.
While still trying to produce a minimal reproducible example, here is what returns the error quite reliably on an 8-core Linux machine:
# loading some dependencies of my package
library(org.Hs.eg.db)
library(pathview)
library(graph)
library(BiocParallel)
# annotation packages for the datasets in the compendium
pkgs <- rep(c("hgu133plus2.db","hgu133a.db"), 42)
#
getSymbols <- function ( anno.pkg )
{
require(anno.pkg, character.only=TRUE)
anno.pkg <- get(anno.pkg)
syms <- AnnotationDbi::mapIds(anno.pkg, keys=keys(anno.pkg),
keytype="PROBEID", column="ENTREZID")
return(syms)
}
x <- bplapply( pkgs , getSymbols ) ### sometimes I have to run this 2 or 3 times in a row to produce this error
Loading required package: hgu133plus2.db Error: BiocParallel errors element index: 29, 30, 31, 32, 33, 34, ... first error: database disk image is malformed
sessionInfo()
R version 3.4.1 (2017-06-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: SUSE Linux Enterprise Desktop 12 SP3 Matrix products: default BLAS: /mnt/raidbio/biosoft/software/R/R-3.4.1/lib/libRblas.so LAPACK: /mnt/raidbio/biosoft/software/R/R-3.4.1/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base other attached packages: [1] BiocParallel_1.12.0 graph_1.56.0 pathview_1.18.0 [4] org.Hs.eg.db_3.5.0 AnnotationDbi_1.40.0 IRanges_2.12.0 [7] S4Vectors_0.16.0 Biobase_2.38.0 BiocGenerics_0.24.0 loaded via a namespace (and not attached): [1] Rcpp_0.12.14 KEGGgraph_1.38.1 XVector_0.18.0 zlibbioc_1.24.0 [5] bit_1.1-12 R6_2.2.2 rlang_0.1.6 blob_1.1.0 [9] httr_1.3.1 tools_3.4.1 grid_3.4.1 png_0.1-7 [13] DBI_0.7 bit64_0.9-7 digest_0.6.13 tibble_1.4.1 [17] Rgraphviz_2.22.0 KEGGREST_1.18.0 memoise_1.1.0 RSQLite_2.0 [21] compiler_3.4.1 pillar_1.1.0 Biostrings_2.46.0 XML_3.98-1.9 [25] pkgconfig_2.0.1 -- Dr. Ludwig Geistlinger CUNY School of Public Health
From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of Martin Morgan <martin.morgan at roswellpark.org>
Sent: Friday, January 19, 2018 1:54 PM
To: Gabe Becker; Vincent Carey
Cc: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed
Sent: Friday, January 19, 2018 1:54 PM
To: Gabe Becker; Vincent Carey
Cc: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed
On 01/19/2018 12:37 PM, Gabe Becker wrote:
> IT seems like you could also force a copy of the reference object via
> <dbobject>$copy() and then force a refresh of the conn slot by assigning a
> new db connection into it.
>
> I'm having trouble confirming that this would work, however, because I
> actually can't reproduce the error. The naive way works for me on my mac
> laptop (which is running an old R and Bioconductor) and on the linux
> cluster I have access to (running Bioc 3.6):
>
>
> (cluster)
>
>> getSymbol <- function ( x ) {
>
> + return( AnnotationDbi::mget( x , hgu95av2SYMBOL ) )
>
> + }
pass the data base connection to the function
getSymbol <- function ( x, db )
## olde schoole
AnnotationDbi::mget(x, db)
## AnnotationDbi::mapIds(db, x, "SYMBOL", "PROBEID")
and arrange for the general case, i.e., distinct processes with data
serialized between them
> cl = parallel::makePSOCKcluster(2)
> parLapply(cl, x, getSymbol, hgu95av2SYMBOL)
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: external pointer is not valid
(getSymbol would fail as originally written in the serial case, since
the workers would not have access to hgu95av2SYMBOL
The workaround is to open the connection on the node, e.g.,
getSymbol <- function ( x, dbname ) {
nmspc <- loadNamespace(dbname)
db <- get(dbname, nmspc)
AnnotationDbi::mapIds(db, x, "SYMBOL", "PROBEID")
}
lapply(x, getSymbol, "hgu95av2.db")
bplapply(x, getSymbol, "hgu95av2.db")
bplapply(x, getSymbol, "hgu95av2.db", BPPARAM = SnowParam())
Martin
>>
>
>> x <- list( "36090_at" , "38785_at" )
>
>>
>
>> mclapply( x , getSymbol )
>
> [[1]]
>
> [[1]]$`36090_at`
>
> [1] "TBL2"
>
>
>
> [[2]]
>
> [[2]]$`38785_at`
>
> [1] "MUC1"
>
>
>
>>
>
>> sessionInfo()
>
> R version 3.4.3 (2017-11-30)
>
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)
>
>
> Matrix products: default
>
> BLAS:
> /gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRblas.so
>
> LAPACK:
> /gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRlapack.so
>
>
> locale:
>
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>
> [9] LC_ADDRESS=C LC_TELEPHONE=C
>
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>
> attached base packages:
>
> [1] stats4 parallel stats graphics grDevices utils datasets
>
> [8] methods base
>
>
> other attached packages:
>
> [1] hgu95av2.db_3.2.3 org.Hs.eg.db_3.5.0 AnnotationDbi_1.40.0
>
> [4] IRanges_2.12.0 S4Vectors_0.16.0 Biobase_2.38.0
>
> [7] BiocGenerics_0.24.0
>
>
> loaded via a namespace (and not attached):
>
> [1] Rcpp_0.12.14 digest_0.6.14 DBI_0.7 RSQLite_2.0
>
> [5] pillar_1.1.0 rlang_0.1.6 blob_1.1.0 bit64_0.9-8
>
> [9] bit_1.1-13 compiler_3.4.3 pkgconfig_2.0.1 memoise_1.1.0
>
> [13] tibble_1.4.1
>
>>
>
>
> ~G
>
> On Fri, Jan 19, 2018 at 9:23 AM, Vincent Carey <stvjc at channing.harvard.edu>
> wrote:
>
>> good question
>>
>> some of the discussion on
>>
>> http://sqlite.1065341.n5.nabble.com/Parallel-access-to-
>> read-only-in-memory-database-td91814.html
>>
>> seems relevant.
>>
>> converting the relatively small annotation package content to pure R
>> read-only tables on the master before parallelizing
>> might be very simple?
>>
>> On Fri, Jan 19, 2018 at 11:43 AM, Ludwig Geistlinger <
>> Ludwig.Geistlinger at sph.cuny.edu> wrote:
>>
>>> Hi,
>>>
>>> Within a package I am developing, I would like to enable parallel probe
>> to
>>> gene mapping for a compendium of microarray datasets.
>>>
>>> This accordingly makes use of annotation packages such as hgu133a.db,
>>> which in turn connect to the SQLite database via AnnotationDbi.
>>>
>>> When running in multi-core mode (i.e. using a MulticoreParam with
>>> BiocParallel) using more than 2 cores, this causes the error:
>>>
>>> database disk image is malformed
>>>
>>>
>>> In a very similar problem:
>>>
>>> https://support.bioconductor.org/p/38541/
>>>
>>> Adi Tarca and Dan Tenenbaum identified and resolved this problem by
>>> ensuring that each process has its own unique database connection, i.e.
>>> AnnotationDbi is not loaded before sending the job to the workers.
>>>
>>> This solution was easily realized as this analysis was carried out within
>>> a script and not a package.
>>>
>>> However, within my package, AnnotationDbi is loaded as a dependency of my
>>> package's imports.
>>>
>>> How to resolve this here?
>>> I am not sure whether I perfectly understand the underlying mechanisms,
>>> but is there a way to make my workers load their own version of
>>> AnnotationDbi instead of using the one of the parent process?
>>> Or am I supposed to unload all packages depending on AnnotationDbi, and
>>> AnnotationDbi itself, before sending the job to the workers (and reload
>> all
>>> of them after the job has finished?)
>>>
>>> Thanks a lot,
>>> Ludwig
>>>
>>>
>>>
>>> --
>>> Dr. Ludwig Geistlinger
>>> CUNY School of Public Health
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>
>
This email message may contain legally privileged and/or...{{dropped:2}}
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
On 01/19/2018 02:24 PM, Ludwig Geistlinger wrote:
I apologize if I haven't been specific enough - however, I am also having trouble to reliably reproduce the error.
It does not seem to be exclusively related to the combination of AnnotationDbi and parallel computation, but also with some other packages I load.
While still trying to produce a minimal reproducible example, here is what returns the error quite reliably on an 8-core Linux machine:
# loading some dependencies of my package
library(org.Hs.eg.db)
library(pathview)
library(graph)
library(BiocParallel)
# annotation packages for the datasets in the compendium
pkgs <- rep(c("hgu133plus2.db","hgu133a.db"), 42)
#
getSymbols <- function ( anno.pkg )
{
require(anno.pkg, character.only=TRUE)
anno.pkg <- get(anno.pkg)
syms <- AnnotationDbi::mapIds(anno.pkg, keys=keys(anno.pkg),
keytype="PROBEID", column="ENTREZID")
return(syms)
}
x <- bplapply( pkgs , getSymbols ) ### sometimes I have to run this 2 or 3 times in a row to produce this error
Loading required package: hgu133plus2.db Error: BiocParallel errors element index: 29, 30, 31, 32, 33, 34, ... first error: database disk image is malformed
My guess is that the database is being accessed by multiple processes
simultaneously and, even though the data bases are opened read-only,
this causes a corruption in the access of some sort. You can avoid
multiple processes accessing the database at the same time by using a 'lock'
getSymbols <- function ( anno.pkg, id )
{
nmspc <- loadNamespace(anno.pkg)
anno.pkg <- get(anno.pkg, nmspc)
BiocParallel::ipclock(id)
syms <- suppressMessages({
AnnotationDbi::mapIds(
anno.pkg, keys=keys(anno.pkg), keytype="PROBEID",
column="ENTREZID"
)
})
BiocParallel::ipcunlock(id)
length(syms)
}
x <- bplapply(pkgs , getSymbols, ipcid())
There are two additional considerations here.
The first is that one wants to worry about the amount of data transfered
between worker and manager compared to the amount of time spent in
computation. So in your previous formulation you sent back all the
symbols -- this will be relatively expensive compared to the amount of
work done in the function (reading the ids from the database), and you
would rather do more work and transmit less (both to and from the
worker) in each call to getSymbol().
The second is similar, but from the lock perspective -- since the lock
imposes essential serial evaluation through that portion of the code,
you'd like the locked portion of the worker's task to be just a small
portion of the total work done by the worker.
I guess a more clever use of locks would be one per data base (generate
two ipcid()'s in the manager, and pass these to the worker in such a way
that the worker uses the same lock for each database.
Martin
sessionInfo()
R version 3.4.1 (2017-06-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: SUSE Linux Enterprise Desktop 12 SP3 Matrix products: default BLAS: /mnt/raidbio/biosoft/software/R/R-3.4.1/lib/libRblas.so LAPACK: /mnt/raidbio/biosoft/software/R/R-3.4.1/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base other attached packages: [1] BiocParallel_1.12.0 graph_1.56.0 pathview_1.18.0 [4] org.Hs.eg.db_3.5.0 AnnotationDbi_1.40.0 IRanges_2.12.0 [7] S4Vectors_0.16.0 Biobase_2.38.0 BiocGenerics_0.24.0 loaded via a namespace (and not attached): [1] Rcpp_0.12.14 KEGGgraph_1.38.1 XVector_0.18.0 zlibbioc_1.24.0 [5] bit_1.1-12 R6_2.2.2 rlang_0.1.6 blob_1.1.0 [9] httr_1.3.1 tools_3.4.1 grid_3.4.1 png_0.1-7 [13] DBI_0.7 bit64_0.9-7 digest_0.6.13 tibble_1.4.1 [17] Rgraphviz_2.22.0 KEGGREST_1.18.0 memoise_1.1.0 RSQLite_2.0 [21] compiler_3.4.1 pillar_1.1.0 Biostrings_2.46.0 XML_3.98-1.9 [25] pkgconfig_2.0.1 -- Dr. Ludwig Geistlinger CUNY School of Public Health
________________________________________
From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of Martin Morgan <martin.morgan at roswellpark.org>
Sent: Friday, January 19, 2018 1:54 PM
To: Gabe Becker; Vincent Carey
Cc: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed
On 01/19/2018 12:37 PM, Gabe Becker wrote:
IT seems like you could also force a copy of the reference object via
<dbobject>$copy() and then force a refresh of the conn slot by assigning a
new db connection into it.
I'm having trouble confirming that this would work, however, because I
actually can't reproduce the error. The naive way works for me on my mac
laptop (which is running an old R and Bioconductor) and on the linux
cluster I have access to (running Bioc 3.6):
(cluster)
getSymbol <- function ( x ) {
+ return( AnnotationDbi::mget( x , hgu95av2SYMBOL ) )
+ }
pass the data base connection to the function
getSymbol <- function ( x, db )
## olde schoole
AnnotationDbi::mget(x, db)
## AnnotationDbi::mapIds(db, x, "SYMBOL", "PROBEID")
and arrange for the general case, i.e., distinct processes with data
serialized between them
> cl = parallel::makePSOCKcluster(2)
> parLapply(cl, x, getSymbol, hgu95av2SYMBOL)
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: external pointer is not valid
(getSymbol would fail as originally written in the serial case, since
the workers would not have access to hgu95av2SYMBOL
The workaround is to open the connection on the node, e.g.,
getSymbol <- function ( x, dbname ) {
nmspc <- loadNamespace(dbname)
db <- get(dbname, nmspc)
AnnotationDbi::mapIds(db, x, "SYMBOL", "PROBEID")
}
lapply(x, getSymbol, "hgu95av2.db")
bplapply(x, getSymbol, "hgu95av2.db")
bplapply(x, getSymbol, "hgu95av2.db", BPPARAM = SnowParam())
Martin
x <- list( "36090_at" , "38785_at" )
mclapply( x , getSymbol )
[[1]]
[[1]]$`36090_at`
[1] "TBL2"
[[2]]
[[2]]$`38785_at`
[1] "MUC1"
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)
Matrix products: default
BLAS:
/gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRblas.so
LAPACK:
/gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] hgu95av2.db_3.2.3 org.Hs.eg.db_3.5.0 AnnotationDbi_1.40.0
[4] IRanges_2.12.0 S4Vectors_0.16.0 Biobase_2.38.0
[7] BiocGenerics_0.24.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 digest_0.6.14 DBI_0.7 RSQLite_2.0
[5] pillar_1.1.0 rlang_0.1.6 blob_1.1.0 bit64_0.9-8
[9] bit_1.1-13 compiler_3.4.3 pkgconfig_2.0.1 memoise_1.1.0
[13] tibble_1.4.1
~G
On Fri, Jan 19, 2018 at 9:23 AM, Vincent Carey <stvjc at channing.harvard.edu>
wrote:
good question
some of the discussion on
http://sqlite.1065341.n5.nabble.com/Parallel-access-to-
read-only-in-memory-database-td91814.html
seems relevant.
converting the relatively small annotation package content to pure R
read-only tables on the master before parallelizing
might be very simple?
On Fri, Jan 19, 2018 at 11:43 AM, Ludwig Geistlinger <
Ludwig.Geistlinger at sph.cuny.edu> wrote:
Hi,
Within a package I am developing, I would like to enable parallel probe
to
gene mapping for a compendium of microarray datasets.
This accordingly makes use of annotation packages such as hgu133a.db,
which in turn connect to the SQLite database via AnnotationDbi.
When running in multi-core mode (i.e. using a MulticoreParam with
BiocParallel) using more than 2 cores, this causes the error:
database disk image is malformed
In a very similar problem:
https://support.bioconductor.org/p/38541/
Adi Tarca and Dan Tenenbaum identified and resolved this problem by
ensuring that each process has its own unique database connection, i.e.
AnnotationDbi is not loaded before sending the job to the workers.
This solution was easily realized as this analysis was carried out within
a script and not a package.
However, within my package, AnnotationDbi is loaded as a dependency of my
package's imports.
How to resolve this here?
I am not sure whether I perfectly understand the underlying mechanisms,
but is there a way to make my workers load their own version of
AnnotationDbi instead of using the one of the parent process?
Or am I supposed to unload all packages depending on AnnotationDbi, and
AnnotationDbi itself, before sending the job to the workers (and reload
all
of them after the job has finished?)
Thanks a lot,
Ludwig
--
Dr. Ludwig Geistlinger
CUNY School of Public Health
[[alternative HTML version deleted]]
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or...{{dropped:2}}
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or...{{dropped:2}}
Thanks for these explanations, Martin. What I actually want to do is sending each expression dataset on probe level to a worker, where probe level expression is summarized to gene level expression according to a chosen summarization function (such as 'mean'). The worker should then return the expression dataset on gene level back to the manager. Taking also Vince' suggestion and your considerations on work load into account, it seems best if I first collect the probe2gene mappings for the unique set of annotation packages (which would only be 2 in the example) in serial mode on the manager. Then, I can send the datasets together with the respective mapping as a simple table to the workers, which carry out the actual computational part, ie. the summarization. This would avoid parallel access on the database and would also spare retrieving the mappings on each worker over in over again, as it is likely that some datasets of the compendium share the corresponding annotation db. Thanks, Ludwig -- Dr. Ludwig Geistlinger CUNY School of Public Health
From: Martin Morgan <martin.morgan at roswellpark.org>
Sent: Friday, January 19, 2018 4:10 PM
To: Ludwig Geistlinger; Gabe Becker; Vincent Carey
Cc: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed
Sent: Friday, January 19, 2018 4:10 PM
To: Ludwig Geistlinger; Gabe Becker; Vincent Carey
Cc: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed
On 01/19/2018 02:24 PM, Ludwig Geistlinger wrote:
> I apologize if I haven't been specific enough - however, I am also having trouble to reliably reproduce the error.
> It does not seem to be exclusively related to the combination of AnnotationDbi and parallel computation, but also with some other packages I load.
>
> While still trying to produce a minimal reproducible example, here is what returns the error quite reliably on an 8-core Linux machine:
>
> # loading some dependencies of my package
> library(org.Hs.eg.db)
> library(pathview)
> library(graph)
> library(BiocParallel)
>
> # annotation packages for the datasets in the compendium
> pkgs <- rep(c("hgu133plus2.db","hgu133a.db"), 42)
>
> #
> getSymbols <- function ( anno.pkg )
> {
> require(anno.pkg, character.only=TRUE)
> anno.pkg <- get(anno.pkg)
> syms <- AnnotationDbi::mapIds(anno.pkg, keys=keys(anno.pkg),
> keytype="PROBEID", column="ENTREZID")
> return(syms)
> }
>
>> x <- bplapply( pkgs , getSymbols ) ### sometimes I have to run this 2 or 3 times in a row to produce this error
> Loading required package: hgu133plus2.db
>
> Error: BiocParallel errors
> element index: 29, 30, 31, 32, 33, 34, ...
> first error: database disk image is malformed
My guess is that the database is being accessed by multiple processes
simultaneously and, even though the data bases are opened read-only,
this causes a corruption in the access of some sort. You can avoid
multiple processes accessing the database at the same time by using a 'lock'
getSymbols <- function ( anno.pkg, id )
{
nmspc <- loadNamespace(anno.pkg)
anno.pkg <- get(anno.pkg, nmspc)
BiocParallel::ipclock(id)
syms <- suppressMessages({
AnnotationDbi::mapIds(
anno.pkg, keys=keys(anno.pkg), keytype="PROBEID",
column="ENTREZID"
)
})
BiocParallel::ipcunlock(id)
length(syms)
}
x <- bplapply(pkgs , getSymbols, ipcid())
There are two additional considerations here.
The first is that one wants to worry about the amount of data transfered
between worker and manager compared to the amount of time spent in
computation. So in your previous formulation you sent back all the
symbols -- this will be relatively expensive compared to the amount of
work done in the function (reading the ids from the database), and you
would rather do more work and transmit less (both to and from the
worker) in each call to getSymbol().
The second is similar, but from the lock perspective -- since the lock
imposes essential serial evaluation through that portion of the code,
you'd like the locked portion of the worker's task to be just a small
portion of the total work done by the worker.
I guess a more clever use of locks would be one per data base (generate
two ipcid()'s in the manager, and pass these to the worker in such a way
that the worker uses the same lock for each database.
Martin
>
>
>> sessionInfo()
> R version 3.4.1 (2017-06-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: SUSE Linux Enterprise Desktop 12 SP3
>
> Matrix products: default
> BLAS: /mnt/raidbio/biosoft/software/R/R-3.4.1/lib/libRblas.so
> LAPACK: /mnt/raidbio/biosoft/software/R/R-3.4.1/lib/libRlapack.so
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats4 stats graphics grDevices utils datasets
> [8] methods base
>
> other attached packages:
> [1] BiocParallel_1.12.0 graph_1.56.0 pathview_1.18.0
> [4] org.Hs.eg.db_3.5.0 AnnotationDbi_1.40.0 IRanges_2.12.0
> [7] S4Vectors_0.16.0 Biobase_2.38.0 BiocGenerics_0.24.0
>
> loaded via a namespace (and not attached):
> [1] Rcpp_0.12.14 KEGGgraph_1.38.1 XVector_0.18.0 zlibbioc_1.24.0
> [5] bit_1.1-12 R6_2.2.2 rlang_0.1.6 blob_1.1.0
> [9] httr_1.3.1 tools_3.4.1 grid_3.4.1 png_0.1-7
> [13] DBI_0.7 bit64_0.9-7 digest_0.6.13 tibble_1.4.1
> [17] Rgraphviz_2.22.0 KEGGREST_1.18.0 memoise_1.1.0 RSQLite_2.0
> [21] compiler_3.4.1 pillar_1.1.0 Biostrings_2.46.0 XML_3.98-1.9
> [25] pkgconfig_2.0.1
>
>
> --
> Dr. Ludwig Geistlinger
> CUNY School of Public Health
>
> ________________________________________
> From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of Martin Morgan <martin.morgan at roswellpark.org>
> Sent: Friday, January 19, 2018 1:54 PM
> To: Gabe Becker; Vincent Carey
> Cc: bioc-devel at r-project.org
> Subject: Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed
>
> On 01/19/2018 12:37 PM, Gabe Becker wrote:
>> IT seems like you could also force a copy of the reference object via
>> <dbobject>$copy() and then force a refresh of the conn slot by assigning a
>> new db connection into it.
>>
>> I'm having trouble confirming that this would work, however, because I
>> actually can't reproduce the error. The naive way works for me on my mac
>> laptop (which is running an old R and Bioconductor) and on the linux
>> cluster I have access to (running Bioc 3.6):
>>
>>
>> (cluster)
>>
>>> getSymbol <- function ( x ) {
>>
>> + return( AnnotationDbi::mget( x , hgu95av2SYMBOL ) )
>>
>> + }
>
> pass the data base connection to the function
>
> getSymbol <- function ( x, db )
> ## olde schoole
> AnnotationDbi::mget(x, db)
> ## AnnotationDbi::mapIds(db, x, "SYMBOL", "PROBEID")
>
> and arrange for the general case, i.e., distinct processes with data
> serialized between them
>
> > cl = parallel::makePSOCKcluster(2)
> > parLapply(cl, x, getSymbol, hgu95av2SYMBOL)
> Error in checkForRemoteErrors(val) :
> 2 nodes produced errors; first error: external pointer is not valid
>
> (getSymbol would fail as originally written in the serial case, since
> the workers would not have access to hgu95av2SYMBOL
>
> The workaround is to open the connection on the node, e.g.,
>
> getSymbol <- function ( x, dbname ) {
> nmspc <- loadNamespace(dbname)
> db <- get(dbname, nmspc)
> AnnotationDbi::mapIds(db, x, "SYMBOL", "PROBEID")
> }
>
> lapply(x, getSymbol, "hgu95av2.db")
> bplapply(x, getSymbol, "hgu95av2.db")
> bplapply(x, getSymbol, "hgu95av2.db", BPPARAM = SnowParam())
>
> Martin
>
>>>
>>
>>> x <- list( "36090_at" , "38785_at" )
>>
>>>
>>
>>> mclapply( x , getSymbol )
>>
>> [[1]]
>>
>> [[1]]$`36090_at`
>>
>> [1] "TBL2"
>>
>>
>>
>> [[2]]
>>
>> [[2]]$`38785_at`
>>
>> [1] "MUC1"
>>
>>
>>
>>>
>>
>>> sessionInfo()
>>
>> R version 3.4.3 (2017-11-30)
>>
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)
>>
>>
>> Matrix products: default
>>
>> BLAS:
>> /gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRblas.so
>>
>> LAPACK:
>> /gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRlapack.so
>>
>>
>> locale:
>>
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>>
>> attached base packages:
>>
>> [1] stats4 parallel stats graphics grDevices utils datasets
>>
>> [8] methods base
>>
>>
>> other attached packages:
>>
>> [1] hgu95av2.db_3.2.3 org.Hs.eg.db_3.5.0 AnnotationDbi_1.40.0
>>
>> [4] IRanges_2.12.0 S4Vectors_0.16.0 Biobase_2.38.0
>>
>> [7] BiocGenerics_0.24.0
>>
>>
>> loaded via a namespace (and not attached):
>>
>> [1] Rcpp_0.12.14 digest_0.6.14 DBI_0.7 RSQLite_2.0
>>
>> [5] pillar_1.1.0 rlang_0.1.6 blob_1.1.0 bit64_0.9-8
>>
>> [9] bit_1.1-13 compiler_3.4.3 pkgconfig_2.0.1 memoise_1.1.0
>>
>> [13] tibble_1.4.1
>>
>>>
>>
>>
>> ~G
>>
>> On Fri, Jan 19, 2018 at 9:23 AM, Vincent Carey <stvjc at channing.harvard.edu>
>> wrote:
>>
>>> good question
>>>
>>> some of the discussion on
>>>
>>> http://sqlite.1065341.n5.nabble.com/Parallel-access-to-
>>> read-only-in-memory-database-td91814.html
>>>
>>> seems relevant.
>>>
>>> converting the relatively small annotation package content to pure R
>>> read-only tables on the master before parallelizing
>>> might be very simple?
>>>
>>> On Fri, Jan 19, 2018 at 11:43 AM, Ludwig Geistlinger <
>>> Ludwig.Geistlinger at sph.cuny.edu> wrote:
>>>
>>>> Hi,
>>>>
>>>> Within a package I am developing, I would like to enable parallel probe
>>> to
>>>> gene mapping for a compendium of microarray datasets.
>>>>
>>>> This accordingly makes use of annotation packages such as hgu133a.db,
>>>> which in turn connect to the SQLite database via AnnotationDbi.
>>>>
>>>> When running in multi-core mode (i.e. using a MulticoreParam with
>>>> BiocParallel) using more than 2 cores, this causes the error:
>>>>
>>>> database disk image is malformed
>>>>
>>>>
>>>> In a very similar problem:
>>>>
>>>> https://support.bioconductor.org/p/38541/
>>>>
>>>> Adi Tarca and Dan Tenenbaum identified and resolved this problem by
>>>> ensuring that each process has its own unique database connection, i.e.
>>>> AnnotationDbi is not loaded before sending the job to the workers.
>>>>
>>>> This solution was easily realized as this analysis was carried out within
>>>> a script and not a package.
>>>>
>>>> However, within my package, AnnotationDbi is loaded as a dependency of my
>>>> package's imports.
>>>>
>>>> How to resolve this here?
>>>> I am not sure whether I perfectly understand the underlying mechanisms,
>>>> but is there a way to make my workers load their own version of
>>>> AnnotationDbi instead of using the one of the parent process?
>>>> Or am I supposed to unload all packages depending on AnnotationDbi, and
>>>> AnnotationDbi itself, before sending the job to the workers (and reload
>>> all
>>>> of them after the job has finished?)
>>>>
>>>> Thanks a lot,
>>>> Ludwig
>>>>
>>>>
>>>>
>>>> --
>>>> Dr. Ludwig Geistlinger
>>>> CUNY School of Public Health
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>
>>
>
>
> This email message may contain legally privileged and/or...{{dropped:2}}
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.