Are there any packages that use Rcpp that use regular expressions?
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
22 messages · Hadley Wickham, JJ Allaire, Gabor Grothendieck, Dirk Eddelbuettel
On 1 March 2013 at 19:41, Gabor Grothendieck wrote:
| Are there any packages that use Rcpp that use regular expressions?

Great question. And not that I know of!

Now that we have the BH package (for really easy access to Boost by using Boost headers via this CRAN package), I was thinking about a use case for Boost regex and file reading, but haven't had time to do anything about it.

Other than Boost, we could possibly get access to the regexp libraries already linked into R. What use case did you have in mind?

Dirk
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
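The "regex and file reading" use case is only mentioned in passing. As a rough sketch of what it could look like, the snippet below keeps the lines of a text stream that contain a match, much like grep. It swaps in C++11 `std::regex` for Boost.Regex (the interfaces are very similar, and `std::regex` needs no separate library at link time); the function name `grep_lines` is hypothetical.

```cpp
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return every line of `in` that contains a match
// for `pattern`, using std::regex as a stand-in for Boost.Regex.
std::vector<std::string> grep_lines(std::istream& in, const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::string> hits;
    std::string line;
    while (std::getline(in, line)) {
        if (std::regex_search(line, re)) {  // match anywhere in the line
            hits.push_back(line);
        }
    }
    return hits;
}
```

With an `std::ifstream` opened on some data file, `grep_lines(in, "\\d{4}")` returns the lines containing four consecutive digits; inside an Rcpp function the resulting `std::vector<std::string>` would come back to R as a character vector.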
I searched BH for regex and found nothing, so I don't think BH includes Boost.Regex. Ideally that would be accessible, as well as the tre and pcre libraries (and the Henry Spencer routines via Tcl) that R itself uses.
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
| > Other than Boost, we could possibly get access to the regexp libraries
| > already linked into R. What use case did you have in mind?
| >
|
| I searched BH for regex and found nothing so I don't think BH includes
| Boost.Regex.

Could you register an issue ticket at the r-forge page for BH, please? There are other things missing too, of course, as we started pretty much with the needs of "just" bigmemory and RcppBDT.

| Ideally that would be accessible as well as the tre,
| pcre (and Henry Spencer routines via Tcl) that R itself uses.

With the R API being tied down more and more, I am unsure as to whether we can get to these. But that said, Rcpp makes it pretty easy to link to external libraries, so you could always try working directly with, say, libpcre.

Dirk
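Working directly with a C regex library from C++ can be sketched as follows. This is an illustration under assumptions: it uses the POSIX `<regex.h>` interface (`regcomp`/`regexec`/`regfree`) as a stand-in for libpcre, because on Linux it ships with the C library and needs no extra `-l` flag; PCRE's own API is different. POSIX extended regular expressions have no `\d`, so `[0-9]` is used, and the helper name `posix_groups` is hypothetical.

```cpp
#include <regex.h>   // POSIX regcomp / regexec / regerror / regfree
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: match `text` against the POSIX extended regex `pattern`.
// Returns the whole match followed by each capture group, or an empty vector
// when there is no match. Groups that did not participate come back as "".
std::vector<std::string> posix_groups(const std::string& pattern, const std::string& text) {
    regex_t re;
    int rc = regcomp(&re, pattern.c_str(), REG_EXTENDED);
    if (rc != 0) {
        char msg[256];
        regerror(rc, &re, msg, sizeof msg);
        throw std::runtime_error(msg);  // no regfree after a failed regcomp
    }
    std::vector<regmatch_t> m(re.re_nsub + 1);  // slot 0 holds the whole match
    std::vector<std::string> out;
    if (regexec(&re, text.c_str(), m.size(), m.data(), 0) == 0) {
        for (const regmatch_t& g : m) {
            out.push_back(g.rm_so == -1 ? std::string()
                                        : text.substr(g.rm_so, g.rm_eo - g.rm_so));
        }
    }
    regfree(&re);  // release the compiled pattern
    return out;
}
```

For example, with the card pattern `^([0-9]{3,4})[- ]?([0-9]{4})[- ]?([0-9]{4})[- ]?([0-9]{4})$`, the input `"0000 1111 2222 3333"` yields the full match plus the four digit groups.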
Gabor,

Here is a quick variant of one of the Boost regexp examples, particularly
http://www.boost.org/doc/libs/1_53_0/libs/regex/example/snippets/credit_card_example.cpp

// cf www.boost.org/doc/libs/1_53_0/libs/regex/example/snippets/credit_card_example.cpp
#include <Rcpp.h>
#include <string>
// Boost.Regex is not header-only: the result must be linked with -lboost_regex
#include <boost/regex.hpp>

// TRUE only for four groups of four digits separated by '-' or ' '
bool validate_card_format(const std::string& s) {
    static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
    return boost::regex_match(s, e);
}

const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
const std::string machine_format("\\1\\2\\3\\4");
const std::string human_format("\\1-\\2-\\3-\\4");

std::string machine_readable_card_number(const std::string& s) {
    return boost::regex_replace(s, e, machine_format, boost::match_default | boost::format_sed);
}

std::string human_readable_card_number(const std::string& s) {
    return boost::regex_replace(s, e, human_format, boost::match_default | boost::format_sed);
}

// [[Rcpp::export]]
Rcpp::DataFrame regexDemo(std::vector<std::string> s) {
    int n = s.size();

    std::vector<bool> valid(n);
    std::vector<std::string> machine(n);
    std::vector<std::string> human(n);

    for (int i=0; i<n; i++) {
        valid[i] = validate_card_format(s[i]);
        machine[i] = machine_readable_card_number(s[i]);
        human[i] = human_readable_card_number(s[i]);
    }
    return Rcpp::DataFrame::create(Rcpp::Named("input") = s,
                                   Rcpp::Named("valid") = valid,
                                   Rcpp::Named("machine") = machine,
                                   Rcpp::Named("human") = human);
}

which we can test with the same input as the example has:

R> Rcpp::sourceCpp('/tmp/boostreex.cpp')
R> s <- c("0000111122223333", "0000 1111 2222 3333", "0000-1111-2222-3333", "000-1111-2222-3333")
R> regexDemo(s)
                input valid          machine               human
1    0000111122223333 FALSE 0000111122223333 0000-1111-2222-3333
2 0000 1111 2222 3333  TRUE 0000111122223333 0000-1111-2222-3333
3 0000-1111-2222-3333  TRUE 0000111122223333 0000-1111-2222-3333
4  000-1111-2222-3333 FALSE  000111122223333  000-1111-2222-3333
R>

On Linux, you generally don't have to do anything to get Boost headers as they end up in /usr/include (or /usr/local/include), so for me this just builds. For R on Windows, you are quite likely to get by with the CRAN-provided boost tarball and an additional -I$(BOOSTLIB) etc.

HTH,
Dirk
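For readers without Boost at hand, the same three helper functions can be approximated with C++11 `std::regex`. This is a sketch under assumptions, not what the thread used: `std::regex` defaults to the ECMAScript grammar, which has no `\A`/`\z` anchors, so `^` and `$` stand in for them, while `std::regex_constants::format_sed` keeps the `\1`-style replacement strings of the Boost version. The name `card_re` is hypothetical, and the Rcpp wrapper is left out so the code builds without R.

```cpp
#include <regex>
#include <string>

// std::regex port of validate_card_format(): four groups of four digits
// separated by '-' or ' '; regex_match must consume the whole string.
bool validate_card_format(const std::string& s) {
    static const std::regex e("(\\d{4}[- ]){3}\\d{4}");
    return std::regex_match(s, e);
}

// ECMAScript has no \A or \z, so ^ and $ anchor the pattern instead.
const std::regex card_re("^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");

// format_sed: \1..\4 in the format string refer to the capture groups,
// as in the Boost example.
std::string machine_readable_card_number(const std::string& s) {
    return std::regex_replace(s, card_re, "\\1\\2\\3\\4",
                              std::regex_constants::format_sed);
}

std::string human_readable_card_number(const std::string& s) {
    return std::regex_replace(s, card_re, "\\1-\\2-\\3-\\4",
                              std::regex_constants::format_sed);
}
```

Applied to the four inputs of the R session above, these functions give the same valid/machine/human values as the Boost version.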
> | I searched BH for regex and found nothing so I don't think BH includes
> | Boost.Regex.
>
> Could you register an issue ticket at the r-forge page for BH, please? There
> are other things missing too, of course, as we started pretty with the needs
> of "just" bigmemory and RcppBDT.

But Boost.Regex isn't header only?

Hadley
Chief Scientist, RStudio http://had.co.nz/
On 1 March 2013 at 21:24, Hadley Wickham wrote:
| > | I searched BH for regex and found nothing so I don't think BH includes
| > | Boost.Regex.
| >
| > Could you register an issue ticket at the r-forge page for BH, please? There
| > are other things missing too, of course, as we started pretty with the needs
| > of "just" bigmemory and RcppBDT.
|
| But Boost.Regex isn't header only?

Yup. Found that out the hard way when I wrote it up as a piece for the Rcpp Gallery. You do need to link.

Piece now up at http://gallery.rcpp.org/articles/boost-regular-expressions/

Dirk
I have downloaded boost such that I have this file:

C:\MinGW\lib\libboost_regex.a

How do I tell Rcpp to use it?

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
On 1 March 2013 at 23:03, Gabor Grothendieck wrote:
| On Fri, Mar 1, 2013 at 9:50 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
| >
| > On 1 March 2013 at 21:24, Hadley Wickham wrote:
| > | > | I searched BH for regex and found nothing so I don't think BH includes
| > | > | Boost.Regex.
| > | >
| > | > Could you register an issue ticket at the r-forge page for BH, please? There
| > | > are other things missing too, of course, as we started pretty with the needs
| > | > of "just" bigmemory and RcppBDT.
| > |
| > | But Boost.Regex isn't header only?
| >
| > Yup. Found that out the hard way when I wrote it up as a piece for the Rcpp
| > Gallery. You do need to link.
| >
| > Piece now up at http://gallery.rcpp.org/articles/boost-regular-expressions/
| >
|
| I have downloaded boost such that I have this file:
|
| C:\MinGW\lib\libboost_regex.a
|
| How do I tell Rcpp to use it?

Follow e.g. the Rcpp Gallery story and use

   Sys.setenv("PKG_LIBS"="C:\MinGW\lib\libboost_regex.a")

as a static library can be given "as is". Else try

   Sys.setenv("PKG_LIBS"="-LC:\MinGW\lib\ -lboost_regex")

which is the standard form. That should work with the example I posted.

If everything else fails, try the documentation for Rcpp (the Rcpp-package vignette in particular) or Boost.

Dirk

Note that the Sys.setenv technique described by Dirk will work for Rcpp from SVN but not (yet) for the version of Rcpp on CRAN.

JJ

| Note that the Sys.setenv technique described by Dirk will work for
| Rcpp from SVN but not (yet) for the version of Rcpp on CRAN.

Ooops. My bad. So that precludes use of sourceCpp(). All other methods involving the 'R CMD COMPILE', 'R CMD SHLIB' etc. wrappers will still listen to PKG_LIBS and link accordingly.

Dirk
Thanks. I doubled each backslash but unfortunately neither work.
I am able to build the original credit_card_example from the boost
site independently of Rcpp, i.e. this builds and I can run the result:
rem this works
C:\MinGW\set_distro_paths.bat
g++ credit_card_example.cpp -o credit_card_example.exe -lboost_regex
and can also build C++ scripts not using boost; however,
all of the following give:
credit.cpp:6:27: fatal error: boost/regex.hpp: No such file or directory
library(Rcpp)
Sys.setenv("PKG_LIBS"="C:\\MinGW\\lib\\libboost_regex.a")
sourceCpp("credit.cpp", verbose = TRUE)
library(Rcpp)
Sys.setenv("PKG_LIBS"="-LC:\\MinGW\\lib\\ -lboost_regex")
sourceCpp("credit.cpp", verbose = TRUE)
library(Rcpp)
Sys.setenv("PKG_LIBS"="-lboost_regex")
sourceCpp("credit.cpp", verbose = TRUE)
I also tried with Cygwin. I have this file among others:
C:\cygwin\lib\libboost_regex-mt.dll.a
and tried these but they give the same result:
library(Rcpp)
Sys.setenv("PKG_LIBS"="C:\\cygwin\\lib\\libboost_regex-mt.dll.a")
sourceCpp("credit.cpp", verbose = TRUE)
library(Rcpp)
Sys.setenv("PKG_LIBS"="-LC:\\cygwin\\lib\\ -lboost_regex")
sourceCpp("credit.cpp", verbose = TRUE)
Changing backslash to forward slash does not change anything.
Everything was done with the svn version of Rcpp:
packageVersion("Rcpp")
[1] '0.10.2.5'
On 2 March 2013 at 18:47, Gabor Grothendieck wrote:
| On Sat, Mar 2, 2013 at 10:17 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
| >
| > On 1 March 2013 at 23:03, Gabor Grothendieck wrote:
| > | On Fri, Mar 1, 2013 at 9:50 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
| > | >
| > | > On 1 March 2013 at 21:24, Hadley Wickham wrote:
| > | > | > | I searched BH for regex and found nothing so I don't think BH includes
| > | > | > | Boost.Regex.
| > | > | >
| > | > | > Could you register an issue ticket at the r-forge page for BH, please? There
| > | > | > are other things missing too, of course, as we started pretty with the needs
| > | > | > of "just" bigmemory and RcppBDT.
| > | > |
| > | > | But Boost.Regex isn't header only?
| > | >
| > | > Yup. Found that out the hard way when I wrote it up as a piece for the Rcpp
| > | > Gallery. You do need to link.
| > | >
| > | > Piece now up at http://gallery.rcpp.org/articles/boost-regular-expressions/
| > | >
| > |
| > | I have downloaded boost such that I have this file:
| > |
| > | C:\MinGW\lib\libboost_regex.a
| > |
| > | How do I tell Rcpp to use it?
| >
| > Follow eg the Rcpp Gallery story and use
| >
| > Sys.setenv("PKG_LIBS"="C:\MinGW\lib\libboost_regex.a")
| >
| > as static library can be given "as is". Else try
| >
| > Sys.setenv("PKG_LIBS"="-LC:\MinGW\lib\ -lboost_regex")
| >
| > which is the standard form. That should work with the example I posted.
| >
| > If everthing else, try the documentation for Rcpp (Rcpp-package vignette in
| > particular) or Boost.
| >
|
| Thanks. I doubled each backslash but unfortunately neither work.
|
| I am able to build the original credit_card_example from the boost
| site independently of Rcpp, i.e. this builds and I can run the result:
|
| rem this works
| C:\MinGW\set_distor_paths.bat
| g++ credit_card_example.cpp -o credit_card_example.exe -lboost_regex
|
| and can also build C++ scripts not using boost; however,
|
| all of the following give:
| credit.cpp:6:27: fatal error: boost/regex.hpp: No such file or directory
|
| library(Rcpp)
| Sys.setenv("PKG_LIBS"="C:\\MinGW\\lib\\libboost_regex.a")
| sourceCpp("credit.cpp", verbose = TRUE)
|
| library(Rcpp)
| Sys.setenv("PKG_LIBS"="-LC:\\MinGW\\lib\\ -lboost_regex")
| sourceCpp("credit.cpp", verbose = TRUE)
|
| library(Rcpp)
| Sys.setenv("PKG_LIBS"="-lboost_regex")
| sourceCpp("credit.cpp", verbose = TRUE)
|
| I also tried with Cygwin. I have this file among others:
| C:\cygwin\lib\libboost_regex-mt.dll.a
|
| and tried these but they give the same result:
|
| library(Rcpp)
| Sys.setenv("PKG_LIBS"="C:\\cygwin\\lib\\libboost_regex-mt.dll.a")
| sourceCpp("credit.cpp", verbose = TRUE)
|
| library(Rcpp)
| Sys.setenv("PKG_LIBS"="-LC:\\cygwin\\lib\\ -lboost_regex")
| sourceCpp("credit.cpp", verbose = TRUE)
|
| Changing backslash to forward slash does not change anything.
|
| Everything was done with the svn version of Rcpp:
|
| > packageVersion("Rcpp")
| [1] '0.10.2.5'

Sorry that this is so frustrating, but this is (IMNSHO) all just Windows...

I would try two things:

a) forward slashes (no escaping needed)

b) use verbose=TRUE so that you see the R CMD ... invocation.

Your initial boost test was key. We know you have a working boost library; we know Rcpp can create working code; now we just need to tie 'em together.

Dirk
I had tried both these. It does not seem to be picking up the PKG_LIBS. There is no -l... or -L... on the g++ cmd line.

It's also not clear precisely which boost distribution to use. I had tried http://nuwen.net/mingw.html (version 8.0) and also tried the boost library from Cygwin.

If I use the MinGW from nuwen and run this from the Windows cmd line I get no errors or warnings (note that ^ must be the last character on the line to escape the newline):

C:\MinGW\set_distro_paths.bat
g++ -DNDEBUG ^
-L %userprofile%/Documents/R/win-library/2.15/Rcpp/lib/x64/libRcpp.a ^
-lboost_regex ^
-I"C:/PROGRA~1/R/R-2.15/include" ^
-I%userprofile%/Documents/R/win-library/2.15/Rcpp/include ^
-O2 -Wall -mtune=core2 -c credit.cpp -o credit.o

If that is ok then what would the next step be?

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
On 2 March 2013 at 20:39, Gabor Grothendieck wrote:
| On Sat, Mar 2, 2013 at 7:05 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
| > Sorry that this is so frustrating, but this (IMNSHO) all just Windows...
| >
| > I would try two things:
| >
| > a) forward slashes (no escaping needed)
| >
| > b) use verbose=TRUE so that you see the R CMD ... invocation.
| >
| > Your initial boost test was key. We know you have a working boost library;
| > we know Rcpp can create working code, now we just need to tie'em together.
| >
|
| I had tried both these. It does not seem to be picking up the
| PKG_LIBS. There is no -l... or -L... on the g++ cmd line.

There was also an issue with sourceCpp overwriting rather than extending PKG_LIBS. I would suggest leaving sourceCpp aside for now while debugging. "All" we really need is one proper R CMD COMPILE step and one R CMD SHLIB step. You can do that by hand; my old talks have the explicit steps, but I think you know what to do.

You can also try just ~/.R/Makevars, which is what I do when switching from g++ to clang (setting CXX and CC), changing compiler options or warnings (PKG_CXXFLAGS), ... and so on.

| Its also not clear precisely which boost distribution to use. I had
| tried http://nuwen.net/mingw.html (version 8.0) and also tried the
| boost library from Cygwin.

I have no idea. We need subsets of Boost for QuantLib, but as I recall that only covers the headers-only template use. In any event, you must use something compatible with MinGW, as always, so Cygwin is probably a no-no.

| If I use the MinGW from nuewn and run this from the Windows cnd line I
| get no errors or warnings (note that ^ must be the last character on
| the line to escape the newline):
|
| C:\MinGW\set_distro_paths.bat
| g++ -DNDEBUG ^
| -L %userprofile%/Documents/R/win-library/2.15/Rcpp/lib/x64/libRcpp.a ^
| -lboost_regex ^
| -I"C:/PROGRA~1/R/R-2.15/include" ^
| -I%userprofile%/Documents/R/win-library/2.15/Rcpp/include ^
| -O2 -Wall -mtune=core2 -c credit.cpp -o credit.o
|
| If tyhat is ok then what would the next step be?

That looks pretty good. Use similar settings, make sure R sees them, and let R CMD ... do its magic. Everything, be it via Makevars and Makevars.win, via inline's cxxfunction or Rcpp's sourceCpp, just calls them anyway.

Dirk
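The sourceCpp issue of "overwriting rather than extending PKG_LIBS" is the difference between replacing an environment variable and appending to it. A minimal illustration, assuming a POSIX system (`setenv`/`unsetenv` are POSIX, not standard C++); the helper names `append_flags` and `overwrite_flags` are hypothetical, and this is not how Rcpp itself fixed the problem.

```cpp
#include <cstdlib>   // std::getenv
#include <stdlib.h>  // setenv (POSIX)
#include <string>

// Hypothetical helper: add `extra` to whatever flags are already stored in
// the environment variable `name`, so previously set flags are kept.
void append_flags(const char* name, const std::string& extra) {
    const char* old = std::getenv(name);
    const std::string value =
        (old != nullptr && *old != '\0') ? std::string(old) + " " + extra : extra;
    setenv(name, value.c_str(), 1);  // 1: replace the old value with the combined one
}

// For contrast, the overwriting behaviour: earlier flags are silently lost.
void overwrite_flags(const char* name, const std::string& extra) {
    setenv(name, extra.c_str(), 1);
}
```

On the R side, the appending variant corresponds to reading the current value with `Sys.getenv("PKG_LIBS")` and pasting the new flags onto it before calling `Sys.setenv`.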
If I try this next after the above g++ line:

R CMD SHLIB credit.o

then I get this:

C:/PROGRA~1/R/R-2.15/bin/x64/R.dll: file not recognized: File format not recognized
collect2: ld returned 1 exit status
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
| > | If I use the MinGW from nuewn and run this from the Windows cnd line I
| > | get no errors or warnings (note that ^ must be the last character on
| > | the line to escape the newline):
| > |
| > | C:\MinGW\set_distro_paths.bat
| > | g++ -DNDEBUG ^
| > | -L %userprofile%/Documents/R/win-library/2.15/Rcpp/lib/x64/libRcpp.a ^
| > | -lboost_regex ^
| > | -I"C:/PROGRA~1/R/R-2.15/include" ^
| > | -I%userprofile%/Documents/R/win-library/2.15/Rcpp/include ^
| > | -O2 -Wall -mtune=core2 -c credit.cpp -o credit.o
| > |
| > | If tyhat is ok then what would the next step be?
| >
| > That looks pretty good. Use similar settings, make sure R sees them, and let
| > R CMD ... do its magic. Everything, be it via Makevars and Makevars.win, via
| > inline's cxxfunction or Rcpp's sourceCpp just calls them anyway.
| >
|
| If I try this next after the above g++ line:
|
| R CMD SHLIB credit.o
|
| then I get this:
|
| C:/PROGRA~1/R/R-2.15/bin/x64/R.dll: file not recognized: File format
| not recognized
| collect2: ld returned 1 exit status

You are mixing g++ and R CMD SHLIB, which is not recommended. Use only R CMD SHLIB, if possible. See Rcpp-FAQ Question 2.5 for one possibility. Just expand the compiler and linker flags to contain all you need. In essence, the (large but modular) Makevars in e.g. RInside do just that.

Dirk
Gabor,
Here is a completely new, second variant of the same example, this time
implemented using only inline with a custom plugin. This should have what
you need.
Code first:
-----------------------------------------------------------------------------
edd at max:/tmp$ cat boostregex.R
library(inline)
## NB double backslashes expanded to four backslashes because of quoting :-/
inctxt <- '
#include <Rcpp.h>
#include <string>
#include <boost/regex.hpp>
bool validate_card_format(const std::string& s) {
static const boost::regex e("(\\\\d{4}[- ]){3}\\\\d{4}");
return boost::regex_match(s, e);
}
const boost::regex e("\\\\A(\\\\d{3,4})[- ]?(\\\\d{4})[- ]?(\\\\d{4})[- ]?(\\\\d{4})\\\\z");
const std::string machine_format("\\\\1\\\\2\\\\3\\\\4");
const std::string human_format("\\\\1-\\\\2-\\\\3-\\\\4");
std::string machine_readable_card_number(const std::string& s) {
return boost::regex_replace(s, e, machine_format, boost::match_default | boost::format_sed);
}
std::string human_readable_card_number(const std::string& s) {
return boost::regex_replace(s, e, human_format, boost::match_default | boost::format_sed);
}
'
srctxt <- '
std::vector<std::string> s = Rcpp::as<std::vector<std::string> >(sx);
int n = s.size();
std::vector<bool> valid(n);
std::vector<std::string> machine(n);
std::vector<std::string> human(n);
for (int i=0; i<n; i++) {
valid[i] = validate_card_format(s[i]);
machine[i] = machine_readable_card_number(s[i]);
human[i] = human_readable_card_number(s[i]);
}
return Rcpp::DataFrame::create(Rcpp::Named("input") = s,
Rcpp::Named("valid") = valid,
Rcpp::Named("machine") = machine,
Rcpp::Named("human") = human);
'
plug <- Rcpp:::Rcpp.plugin.maker(
include.before = "#include <boost/regex.hpp>",
libs = paste("-L/usr/local/lib/R/site-library/Rcpp/lib -lRcpp",
"-Wl,-rpath,/usr/local/lib/R/site-library/Rcpp/lib",
"-L/usr/lib -lboost_regex -lm"))
registerPlugin("boostDemo", plug )
regexDemo <- cxxfunction(signature(sx="CharVec"), body=srctxt, incl=inctxt, plugin="boostDemo", verbose=TRUE)
s <- c("0000111122223333", "0000 1111 2222 3333", "0000-1111-2222-3333", "000-1111-2222-3333")
regexDemo(s)
edd at max:/tmp$
-----------------------------------------------------------------------------
Output in verbose mode:
-----------------------------------------------------------------------------
edd at max:/tmp$
edd at max:/tmp$ Rscript boostregex.R
Loading required package: methods
>> setting environment variables:
PKG_LIBS = -L/usr/local/lib/R/site-library/Rcpp/lib -lRcpp -Wl,-rpath,/usr/local/lib/R/site-library/Rcpp/lib -L/usr/lib -lboost_regex -lm -L/usr/local/lib/R/site-library/Rcpp/lib -lRcpp -Wl,-rpath,/usr/local/lib/R/site-library/Rcpp/lib
>> LinkingTo : Rcpp
CLINK_CPPFLAGS = -I"/usr/local/lib/R/site-library/Rcpp/include"
>> Program source :
1 :
2 : // includes from the plugin
3 : #include <boost/regex.hpp>
4 : #include <Rcpp.h>
5 :
6 :
7 : #ifndef BEGIN_RCPP
8 : #define BEGIN_RCPP
9 : #endif
10 :
11 : #ifndef END_RCPP
12 : #define END_RCPP
13 : #endif
14 :
15 : using namespace Rcpp;
16 :
17 :
18 : // user includes
19 :
20 : #include <Rcpp.h>
21 : #include <string>
22 : #include <boost/regex.hpp>
23 :
24 : bool validate_card_format(const std::string& s) {
25 : static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
26 : return boost::regex_match(s, e);
27 : }
28 :
29 : const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
30 : const std::string machine_format("\\1\\2\\3\\4");
31 : const std::string human_format("\\1-\\2-\\3-\\4");
32 :
33 : std::string machine_readable_card_number(const std::string& s) {
34 : return boost::regex_replace(s, e, machine_format, boost::match_default | boost::format_sed);
35 : }
36 :
37 : std::string human_readable_card_number(const std::string& s) {
38 : return boost::regex_replace(s, e, human_format, boost::match_default | boost::format_sed);
39 : }
40 :
41 :
42 : // declarations
43 : extern "C" {
44 : SEXP file13316a634edf( SEXP sx) ;
45 : }
46 :
47 : // definition
48 :
49 : SEXP file13316a634edf( SEXP sx ){
50 : BEGIN_RCPP
51 :
52 : std::vector<std::string> s = Rcpp::as<std::vector<std::string> >(sx);
53 : int n = s.size();
54 :
55 : std::vector<bool> valid(n);
56 : std::vector<std::string> machine(n);
57 : std::vector<std::string> human(n);
58 :
59 : for (int i=0; i<n; i++) {
60 : valid[i] = validate_card_format(s[i]);
61 : machine[i] = machine_readable_card_number(s[i]);
62 : human[i] = human_readable_card_number(s[i]);
63 : }
64 : return Rcpp::DataFrame::create(Rcpp::Named("input") = s,
65 : Rcpp::Named("valid") = valid,
66 : Rcpp::Named("machine") = machine,
67 : Rcpp::Named("human") = human);
68 :
69 : END_RCPP
70 : }
71 :
72 :
Compilation argument:
/usr/lib/R/bin/R CMD SHLIB file13316a634edf.cpp 2> file13316a634edf.cpp.err.txt
ccache g++-4.7 -I/usr/share/R/include -DNDEBUG -I"/usr/local/lib/R/site-library/Rcpp/include" -fpic -g0 -O3 -Wall -pipe -Wno-variadic-macros -pedantic -c file13316a634edf.cpp -o file13316a634edf.o
g++-4.7 -shared -o file13316a634edf.so file13316a634edf.o -L/usr/local/lib/R/site-library/Rcpp/lib -lRcpp -Wl,-rpath,/usr/local/lib/R/site-library/Rcpp/lib -L/usr/lib -lboost_regex -lm -L/usr/local/lib/R/site-library/Rcpp/lib -lRcpp -Wl,-rpath,/usr/local/lib/R/site-library/Rcpp/lib -L/usr/lib/R/lib -lR
                input valid          machine               human
1    0000111122223333 FALSE 0000111122223333 0000-1111-2222-3333
2 0000 1111 2222 3333  TRUE 0000111122223333 0000-1111-2222-3333
3 0000-1111-2222-3333  TRUE 0000111122223333 0000-1111-2222-3333
4  000-1111-2222-3333 FALSE  000111122223333  000-1111-2222-3333
edd at max:/tmp$
-----------------------------------------------------------------------------
You should be able to adapt this on Windows. Keeping my fingers crossed...
Dirk
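The ":-/" comment in the inline version is about nested escaping: the regex engine wants `\d`, an ordinary C++ string literal needs `\\d`, and C++ source held inside an R string needs `\\\\d`. A small sketch of the C++ level, using a C++11 raw string literal (a swapped-in technique the thread does not use) that avoids the doubling there; the variable and function names are hypothetical.

```cpp
#include <regex>
#include <string>

// What the regex engine should see:   (\d{4}[- ]){3}\d{4}
// Ordinary C++ literal: every backslash is doubled once.
const std::string escaped_pattern = "(\\d{4}[- ]){3}\\d{4}";

// C++11 raw string literal R"( ... )": backslashes are taken verbatim.
const std::string raw_pattern = R"((\d{4}[- ]){3}\d{4})";

// Both spellings hold the same pattern text, so they build the same regex.
bool validate_card_format_raw(const std::string& s) {
    static const std::regex e(raw_pattern);
    return std::regex_match(s, e);
}
```

Inside the R string of the inline example, the raw-string form would need two backslashes per `\d` instead of four, because R still halves them once when reading its own string literal.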
Gabor,
Here is a quick variant of one of the Boost regexp examples, particularly
http://www.boost.org/doc/libs/1_53_0/libs/regex/example/snippets/credit_card_example.cpp
// cf www.boost.org/doc/libs/1_53_0/libs/regex/example/snippets/credit_card_example.cpp
#include <Rcpp.h>
#include <string>
#include <boost/regex.hpp>
bool validate_card_format(const std::string& s) {
    static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
    return boost::regex_match(s, e);
}
const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
const std::string machine_format("\\1\\2\\3\\4");
const std::string human_format("\\1-\\2-\\3-\\4");
std::string machine_readable_card_number(const std::string& s) {
    return boost::regex_replace(s, e, machine_format, boost::match_default | boost::format_sed);
}
std::string human_readable_card_number(const std::string& s) {
    return boost::regex_replace(s, e, human_format, boost::match_default | boost::format_sed);
}
// [[Rcpp::export]]
Rcpp::DataFrame regexDemo(std::vector<std::string> s) {
    int n = s.size();
    std::vector<bool> valid(n);
    std::vector<std::string> machine(n);
    std::vector<std::string> human(n);
    for (int i=0; i<n; i++) {
        valid[i] = validate_card_format(s[i]);
        machine[i] = machine_readable_card_number(s[i]);
        human[i] = human_readable_card_number(s[i]);
    }
    return Rcpp::DataFrame::create(Rcpp::Named("input") = s,
                                   Rcpp::Named("valid") = valid,
                                   Rcpp::Named("machine") = machine,
                                   Rcpp::Named("human") = human);
}
which we can test with the same input as the example has:
R> Rcpp::sourceCpp('/tmp/boostreex.cpp')
R> s <- c("0000111122223333", "0000 1111 2222 3333", "0000-1111-2222-3333", "000-1111-2222-3333")
R> regexDemo(s)
                input valid          machine               human
1    0000111122223333 FALSE 0000111122223333 0000-1111-2222-3333
2 0000 1111 2222 3333  TRUE 0000111122223333 0000-1111-2222-3333
3 0000-1111-2222-3333  TRUE 0000111122223333 0000-1111-2222-3333
4  000-1111-2222-3333 FALSE  000111122223333  000-1111-2222-3333
R>
On Linux, you generally don't have to do anything to get Boost headers as they end up in /usr/include (or /usr/local/include), so for me this just builds. For R on Windows, you are quite likely to get by with the CRAN-provided boost tarball and an additional -I$(BOOSTLIB) etc.
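For comparison, here is a minimal sketch of the same three helpers written against the C++11 standard `<regex>` header instead of Boost.Regex. This is a swapped-in technique, not what was used above: it assumes a compiler with a working `std::regex` (GCC's implementation was incomplete before 4.9, so it would not have helped with the Windows toolchain of the time), and it drops the Rcpp wrapper so it builds as plain C++. It does avoid linking a compiled library such as libboost_regex. ECMAScript, the default `std::regex` grammar, has no `\A`/`\z` anchors, so `^`/`$` stand in for them, and the replacement strings use `$1` rather than sed-style `\1`.

```cpp
// Sketch: the credit-card helpers from the Boost example, ported to std::regex.
// Assumes C++11 with a working <regex> (e.g. GCC >= 4.9); no Rcpp, no Boost.
#include <regex>
#include <string>

// True if s is four groups of four digits separated by '-' or ' '.
bool validate_card_format(const std::string& s) {
    static const std::regex e("(\\d{4}[- ]){3}\\d{4}");
    return std::regex_match(s, e);
}

// ECMAScript has no \A or \z, so ^ and $ anchor the whole input instead.
static const std::regex card_re(
    "^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");

// "$1".."$4" are ECMAScript-style back-references (Boost's format_sed used "\\1").
std::string machine_readable_card_number(const std::string& s) {
    return std::regex_replace(s, card_re, "$1$2$3$4");
}

std::string human_readable_card_number(const std::string& s) {
    return std::regex_replace(s, card_re, "$1-$2-$3-$4");
}
```

Applied to the four inputs from the R session above, these helpers give the same valid, machine and human values shown in that table.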
I had no luck with sourceCpp or inline on Windows, but I did manage to
use R CMD SHLIB to build a DLL which loads into 32-bit R (it does not
currently load in 64-bit R; I haven't yet figured out why) and seems to
run OK there. To get it to work I replaced the first and last
statements in regexDemo with these two, respectively, so that all inputs
and outputs are SEXPs:
extern "C" SEXP regexDemo(SEXP ss) {
    std::vector<std::string> s = Rcpp::as<std::vector<std::string> >(ss);
    // ... rest of the body unchanged: fill valid, machine and human ...
    return Rcpp::wrap(Rcpp::DataFrame::create(Rcpp::Named("input") = s,
                                              Rcpp::Named("valid") = valid,
                                              Rcpp::Named("machine") = machine,
                                              Rcpp::Named("human") = human));
}
This isn't quite as nice as what you had but at least it runs.
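One caveat with a hand-written `extern "C"` entry point like this: any C++ exception thrown inside it (for example a regex error from a malformed pattern) must not propagate into R's C code. Rcpp's `BEGIN_RCPP`/`END_RCPP` macros exist to wrap such a function body. The sketch below shows the underlying idea without R or Rcpp headers; the function name `card_regex_match`, its signature and its return codes are hypothetical, and `std::regex` stands in for `boost::regex`.

```cpp
// Sketch: keeping C++ exceptions from crossing a C-linkage boundary.
// card_regex_match is a hypothetical name; std::regex stands in for boost::regex.
#include <regex>

// Returns 0 on success (with *matched set to 1 or 0), 1 for a malformed
// pattern, 2 for any other C++ exception. Nothing is allowed to escape.
extern "C" int card_regex_match(const char* pattern, const char* input,
                                int* matched) {
    try {
        const std::regex e(pattern);              // may throw std::regex_error
        *matched = std::regex_match(input, e) ? 1 : 0;
        return 0;
    } catch (const std::regex_error&) {
        return 1;                                 // bad pattern
    } catch (...) {
        return 2;                                 // e.g. std::bad_alloc
    }
}
```

An R-facing wrapper would then turn a nonzero status into an R error on the R side, rather than letting the exception unwind through C frames.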
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
On 4 March 2013 at 20:18, Gabor Grothendieck wrote:
| I had no luck with sourceCpp or inline on Windows but did manage to
| use R CMD SHLIB to build a dll which loads into 32-bit R (it does not
| currently load in 64-bit R - haven't yet figured out why) and seems to
| run ok there. To get it to work I did replace the first and last
| statements in regexDemo with these two respectively so that all inputs
| and outputs are SEXPs:
|
| extern "C" SEXP regexDemo(SEXP ss) {
| std::vector<std::string> s = Rcpp::as<std::vector<std::string> >(ss);
|
| return Rcpp::wrap(Rcpp::DataFrame::create(Rcpp::Named("input") = s,
| Rcpp::Named("valid") = valid,
| Rcpp::Named("machine") = machine,
| Rcpp::Named("human") = human));
|
| This isn't quite as nice as what you had but at least it runs.
Thanks for reporting back, and glad you got something to work. Going back to
the metal can help.
I won't be of much help for the Windows aspect. The basic things all work,
and of the 100+ CRAN packages now using Rcpp, many do involve libraries. If
you _really_ wanted to squash this you could try to rebuild libboost_regex.a
during the package build process (of a to-be-built package) to ensure that
the same g++ gets used. That should help with 32/64 bit too. But that is
surely overkill.
Dirk
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
I did rebuild Boost from source as part of this, since I also figured that might be a problem. On Windows there is a tool that is used in rebuilding the source (presumably similar to configure on UNIX), and I rebuilt that tool from source too. I may play around with all this a bit more, but it does take half an hour to build Boost from source. It's possible that some of the things I tried prior to rebuilding from source would have worked had they been tried with the rebuilt version of Boost.
Is it intended that BH will include Boost libraries such as Boost.Regex, which is not a header-only library? Alternatively, if R made its own regular expression libraries available to C++ programs, there could be an advantage in terms of consistency with R in the regular expressions accepted.
Regarding packages, can you suggest which one(s) to look at?
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
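On the consistency point: R's default regular expressions (without `perl = TRUE`) are POSIX extended ones handled by TRE, while `perl = TRUE` uses PCRE. As a rough sketch under that framing, standard C++ can at least be asked for the POSIX extended grammar via `std::regex::extended`, so a pattern can be written in the bracket-expression style R users know (`[[:digit:]]` rather than `\d`). This is not TRE, and edge cases such as matching semantics and locale handling may still differ from R; the helper name `card_format_ere` is hypothetical.

```cpp
// Sketch: the card-format check written as a POSIX ERE, the style R's default
// (TRE-based) regexps use. std::regex::extended is NOT TRE; details may differ.
#include <regex>
#include <string>

// Hypothetical helper: four digit groups separated by '-' or ' '.
bool card_format_ere(const std::string& s) {
    static const std::regex e("([[:digit:]]{4}[- ]){3}[[:digit:]]{4}",
                              std::regex::extended);
    return std::regex_match(s, e);
}
```

The same pattern string should also be accepted by R's `grepl()` with default settings, which is the kind of portability the question above is after.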
| I did rebuild boost from source as part of this since I also figured
| that might be a problem. On Windows there is a tool that is used in
| rebuilding the source (presumably similar to configure on UNIX) and I
| rebuilt that tool from source too. I may play around with all this a
| bit more but it does take half an hour to build boost from source.
| Its possible that some of the things I tried prior to rebuilding from
| source would have worked had that they been tried with the rebuilt
| version of boost.
Yes. Boost is not for the faint of heart, and I am no fan of bjam. I am pretty glad I get this handed to me in a "just works" fashion on Unix.
| Is it intended that BH will include boost libraries like Boost.Regex
| which is not a header-only library? Alternately if R made its own
No, absolutely not. The idea of BH is header-only so we avoid all these hassles. So boost-regex was just a distraction.
| regular expression libraries available to C++ programs there could be
| an advantage in terms of consistency with R of the regular expressions
| accepted.
|
| Regarding packages, can you suggest which one(s) to look at?
RPostgreSQL certainly builds the client parts of the PG library on Windows, where it cannot assume the libraries are installed. Within the Rcpp world I just came across another package that had Boost sources, but right now I cannot recall which one it was.
If you really just want regexps, Boost may be overkill. Did you look into accessing libpcre via R? Or just bundling libpcre? Just a thought...
Dirk
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com