[Rcpp-devel] Regular Expressions
On Fri, Mar 1, 2013 at 8:56 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
Gabor,

Here is a quick variant of one of the Boost regexp examples, particularly
http://www.boost.org/doc/libs/1_53_0/libs/regex/example/snippets/credit_card_example.cpp

    // cf www.boost.org/doc/libs/1_53_0/libs/regex/example/snippets/credit_card_example.cpp

    #include <Rcpp.h>

    #include <string>
    #include <boost/regex.hpp>

    bool validate_card_format(const std::string& s) {
        static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
        return boost::regex_match(s, e);
    }

    const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
    const std::string machine_format("\\1\\2\\3\\4");
    const std::string human_format("\\1-\\2-\\3-\\4");

    std::string machine_readable_card_number(const std::string& s) {
        return boost::regex_replace(s, e, machine_format,
                                    boost::match_default | boost::format_sed);
    }

    std::string human_readable_card_number(const std::string& s) {
        return boost::regex_replace(s, e, human_format,
                                    boost::match_default | boost::format_sed);
    }

    // [[Rcpp::export]]
    Rcpp::DataFrame regexDemo(std::vector<std::string> s) {
        int n = s.size();

        std::vector<bool> valid(n);
        std::vector<std::string> machine(n);
        std::vector<std::string> human(n);

        for (int i=0; i<n; i++) {
            valid[i]   = validate_card_format(s[i]);
            machine[i] = machine_readable_card_number(s[i]);
            human[i]   = human_readable_card_number(s[i]);
        }
        return Rcpp::DataFrame::create(Rcpp::Named("input") = s,
                                       Rcpp::Named("valid") = valid,
                                       Rcpp::Named("machine") = machine,
                                       Rcpp::Named("human") = human);
    }

which we can test with the same input as the example has:

    R> Rcpp::sourceCpp('/tmp/boostreex.cpp')
    R> s <- c("0000111122223333", "0000 1111 2222 3333",
    +         "0000-1111-2222-3333", "000-1111-2222-3333")
    R> regexDemo(s)
                    input valid          machine               human
    1    0000111122223333 FALSE 0000111122223333 0000-1111-2222-3333
    2 0000 1111 2222 3333  TRUE 0000111122223333 0000-1111-2222-3333
    3 0000-1111-2222-3333  TRUE 0000111122223333 0000-1111-2222-3333
    4  000-1111-2222-3333 FALSE  000111122223333  000-1111-2222-3333
    R>

On Linux, you generally don't have to do anything to get Boost headers as
they end up in /usr/include (or /usr/local/include) so for me, this just
builds. For R on Windows, you are quite likely to get by with the
CRAN-provided Boost tarball and an additional -I$(BOOSTLIB) etc.
I had no luck with sourceCpp or inline on Windows but did manage to
use R CMD SHLIB to build a DLL which loads into 32-bit R (it does not
currently load in 64-bit R; I haven't yet figured out why) and seems to
run OK there. To get it to work I replaced the first and last
statements in regexDemo with the two below, respectively, so that all
inputs and outputs are SEXPs:
    extern "C" SEXP regexDemo(SEXP ss) {
        std::vector<std::string> s = Rcpp::as<std::vector<std::string> >(ss);

        return Rcpp::wrap(Rcpp::DataFrame::create(Rcpp::Named("input") = s,
                                                  Rcpp::Named("valid") = valid,
                                                  Rcpp::Named("machine") = machine,
                                                  Rcpp::Named("human") = human));
This isn't quite as nice as what you had but at least it runs.
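
For anyone who cannot get the Boost headers onto the include path at all, here is a rough sketch of the same three helpers written against std::regex instead of boost::regex. This is a swapped-in technique, not what was built above. It assumes a C++11 toolchain with a working <regex> (for GCC that means 4.9 or later). The _std suffixes and the name card_re are made up for illustration. Two differences from the Boost version: ^ and $ stand in for \A and \z, and the default ECMAScript replacement syntax ($1, $2, ...) stands in for the sed-style \1, \2 format strings.

```cpp
#include <regex>
#include <string>

// Same check as validate_card_format in the quoted example: three groups of
// four digits, each followed by '-' or ' ', then four more digits.
// regex_match only succeeds if the whole string matches.
bool validate_card_format_std(const std::string& s) {
    static const std::regex e("(\\d{4}[- ]){3}\\d{4}");
    return std::regex_match(s, e);
}

// Hypothetical name; same pattern as the quoted example except that the
// ECMAScript grammar has no \A or \z, so ^ and $ anchor the match instead.
static const std::regex card_re(
    "^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");

// Strip the separators. Input that does not match is returned unchanged.
std::string machine_readable_card_number_std(const std::string& s) {
    return std::regex_replace(s, card_re, "$1$2$3$4");
}

// Insert '-' between the four groups. Input that does not match is
// returned unchanged.
std::string human_readable_card_number_std(const std::string& s) {
    return std::regex_replace(s, card_re, "$1-$2-$3-$4");
}
```

These functions take and return plain std::string, so they should drop into either the [[Rcpp::export]] version of regexDemo or the extern "C" one in place of the Boost-based helpers, though I have not tried that on Windows.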
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com