Hi!

How would one best write an R wrapper package over complex Python2 software (such as https://github.com/bulik/ldsc) that is still very widely used in statistical genetics? I'm writing an R package (that currently passes all --as-cran checks) for multiple other C++ tools on the same topic, but I'm having difficulties with this Python2 one: it just looks like a bunch of hackish system() calls... And while it works on Linux and Mac, I have no idea whether it would work on Windows.

While it may seem easy to dismiss, LDSC is actually widely used in the statistical genetics field, and lots of people find it difficult to work with because of all the dependency files and weirdly documented commands, and because... well... Python2...

Any tips? Or do you know anyone I should contact/ask?

Best wishes,
Alexandru
[R-pkg-devel] Question on best approach to develop R package that wraps old complex Python2 software
4 messages · Alexandru Voda, Andrew Simmons, Stefan McKinnon Høj-Edwards +1 more
I would suggest the reticulate package in R. The most important functions for
your case are reticulate::use_python_version and reticulate::import.
For example, in your R package, you should start with:
    # change this to the name of the module you need
    numpy <- NULL

    .onLoad <- function(libname, pkgname) {
      reticulate::use_python_version("2.7")  # change this as you need to
      # .onLoad happens before the namespace is locked, so this is legitimate
      numpy <<- reticulate::import("numpy", delay_load = list(
        on_error = function(c) stop(
          "unable to import 'numpy', try ",
          sQuote("reticulate::py_install(\"numpy\")"),
          " if it is not installed:\n ",
          conditionMessage(c)
        )
      ))
    }
When your package's namespace is loaded, this will load the version of
Python you need to use and lazily import the module you need for your
Python session.
______________________________________________ R-package-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
If the Python2 package is mainly system() calls, I would write an R package that essentially does the same, without relaying the calls via the Python routines, i.e. let R call the commands directly. The only downside to this approach is that R doesn't handle multithreading as well as Python does. In Python you have the subprocess module, with which you can call an external command and send it input, check its output, send new input, or just leave it be until you want to check in on it. But to my knowledge, no similar method exists in R.

Which brings us to the point: how long do these external commands run? Regardless of whether they are run via a Python module or called directly from R, the R process will wait. This is mostly an issue for interactive use.

Another approach could be to have your R package handle data formatting, settings, etc. and compile a command with arguments that the user may run at their leisure, whether on their laptop, in the cloud or on an HPC cluster. When the results are ready, they can return to your package to analyse them. I used this approach in my (badly named) R package Siccuracy, which aids with the imputation software AlphaImpute.

Kindly,
Stefan
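Stefan's two suggestions (let R call the command directly, or let the package assemble a command the user runs elsewhere) can be sketched in base R roughly as follows. This is a minimal sketch under stated assumptions: the helper names ldsc_args(), ldsc_command() and run_ldsc() are hypothetical, python2 is assumed to be on the PATH and ldsc.py in the working directory, and the flags --h2, --ref-ld-chr, --w-ld-chr and --out are taken from LDSC's heritability tutorial, so check them against ldsc.py --help for your copy.

```r
# Hypothetical helpers (not from LDSC or any package) sketching two approaches:
# (1) run the command directly from R, (2) give the user a command to run.

# Turn named R arguments into "--flag value" tokens (not yet shell-quoted).
# Underscores in names become dashes, so ref_ld_chr becomes --ref-ld-chr.
# TRUE gives a bare switch; FALSE and NULL drop the flag.
ldsc_args <- function(...) {
  args <- list(...)
  nms <- names(args)
  stopifnot(length(args) == 0 || (!is.null(nms) && all(nzchar(nms))))
  out <- character(0)
  for (i in seq_along(args)) {
    flag <- paste0("--", gsub("_", "-", nms[i], fixed = TRUE))
    value <- args[[i]]
    if (isTRUE(value)) {
      out <- c(out, flag)
    } else if (!is.null(value) && !isFALSE(value)) {
      # Avoid scientific notation such as 1e+05 for numeric values
      text <- if (is.numeric(value)) {
        unname(vapply(value, format, character(1), scientific = FALSE, trim = TRUE))
      } else {
        as.character(value)
      }
      out <- c(out, flag, text)
    }
  }
  out
}

# Approach 2: one shell-quoted command line the user can paste into a
# terminal or an HPC job script (python2 and ldsc.py are assumed names).
ldsc_command <- function(args, python = "python2", script = "ldsc.py") {
  paste(shQuote(c(python, script, args)), collapse = " ")
}

# Approach 1: run the command from R; system2() blocks until it finishes.
run_ldsc <- function(args, python = Sys.which("python2"), script = "ldsc.py") {
  if (!nzchar(python)) stop("python2 was not found on the PATH")
  # system2() quotes the command itself but passes args through as-is
  out <- system2(python, shQuote(c(script, args)), stdout = TRUE, stderr = TRUE)
  status <- attr(out, "status")
  if (!is.null(status) && status != 0) {
    stop("ldsc.py exited with status ", status, ":\n", paste(out, collapse = "\n"))
  }
  out
}

# Example, using the flags from LDSC's heritability tutorial
args <- ldsc_args(h2 = "scz.sumstats.gz", ref_ld_chr = "eur_w_ld_chr/",
                  w_ld_chr = "eur_w_ld_chr/", out = "scz_h2")
cat(ldsc_command(args), "\n")
# run_ldsc(args)  # needs python2, ldsc.py and the input files
```

Quoting each argument with shQuote() rather than pasting raw strings into one command keeps paths containing spaces intact, and keeping the argument builder separate means both approaches share the same validation.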
On Tue, 25 Jan 2022, Stefan McKinnon Høj-Edwards wrote:
If the Python2 package is mainly system() calls, I would write an R package that essentially does the same, without relaying the calls via the Python routines, i.e. let R call the commands directly. The only downside to this approach is that R doesn't handle multithreading as well as Python does. In Python you have the subprocess module, with which you can call an external command and send it input, check its output, send new input, or just leave it be until you want to check in on it. But to my knowledge, no similar method exists in R.
In R, you can use Tcl (tcltk) to do the same, which actually works better than Python. The problem with Python is that, last time I checked, there were three different interfaces for process control and none of them was complete: it was impossible to reliably control a third-party program while performing other tasks. In contrast, Tcl has a well-designed interface that you can use in non-blocking mode.

best,
Vladimir Dergachev
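Vladimir's suggestion can be tried with R's bundled tcltk package. A minimal sketch, assuming R was built with Tcl support (capabilities("tcltk")) and, for the demo only, a Unix-like system with echo on the PATH: start the program as a Tcl command pipeline with open |..., switch the channel to non-blocking mode with fconfigure, then poll it with read and eof so that R stays free between polls. The helpers tcl_spawn(), tcl_read(), tcl_eof() and tcl_finish() are hypothetical names; for LDSC you would pass "python2" plus the ldsc.py arguments instead of echo.

```r
library(tcltk)  # bundled with R; requires capabilities("tcltk") to be TRUE

# Hypothetical helpers: run an external command as a Tcl pipeline and poll it
# without blocking R.
tcl_spawn <- function(cmd, args = character()) {
  # Let Tcl build a correctly quoted word list, e.g. python2 ldsc.py {a file.txt}
  words <- tclvalue(do.call(tcl, c(list("list"), as.list(c(cmd, args)))))
  # "open |words r" starts the pipeline and returns a readable channel name
  chan <- tclvalue(tcl("open", paste0("|", words), "r"))
  tcl("fconfigure", chan, "-blocking", "0")  # reads now return immediately
  chan
}

# Whatever output is available right now ("" if nothing new yet)
tcl_read <- function(chan) tclvalue(tcl("read", chan))

# TRUE once a read has reached the end of the child's output
tcl_eof <- function(chan) tclvalue(tcl("eof", chan)) == "1"

tcl_finish <- function(chan) {
  # Blocking mode again, so close waits for the child; a non-zero exit
  # status makes close fail, which tcl() turns into an R error
  tcl("fconfigure", chan, "-blocking", "1")
  tcl("close", chan)
  invisible(NULL)
}

# Demo, assuming a Unix-like system with `echo` on the PATH
chan <- tcl_spawn("echo", "hello from a child process")
output <- ""
repeat {
  output <- paste0(output, tcl_read(chan))
  if (tcl_eof(chan)) break
  Sys.sleep(0.05)  # R could do other work here between polls
}
tcl_finish(chan)
cat(output)
```

The polling loop is only there to make the demo finish; in a package the same tcl_read()/tcl_eof() calls could be made whenever the user asks for progress, which is the non-blocking behaviour described above.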