Skip to content
Prev 3640 / 10988 Next

[Rcpp-devel] Testing for existence of named components

If I remember correctly, environments use hash tables to lookup names.

In pure R to check if an element L[["key"]] exists I would usually use
which seems to be reasonably fast. `%in%` works in vectors, too
returning exactly what Dirks suggestion was supposed to do (you missed an
"if", or alternatively an "or")

`%in%` uses .Internal(match(...)), which computes a temporary hash table as
far as I can see in the source (src/main/unique.c). I am not sure about the
implementation of the Hash. In particular I don't know if some kind of
caching is done, because the lookup is much faster than Dirks function (see
below) and computing a hash table for one lookup seems inefficient. I think
it would be good to use the internal `match` function of R to do what you
want. In Rinternals.h match is exposed as
  SEXP Rf_match(SEXP itable, SEXP ix, int nmatch)
where nmatch is the value to be returned if there is no match.

Below are the results of my little benchmark (see below). First parameter
of `lookup` is the number elements in L, second is the length of keys,
third one specifies the probability that a key is in present in the list.

Regards,
Jonas

##Benchmark results
test elapsed relative
2         f(L, keys)   0.641   160.25
3         g(L, keys)   0.005     1.25
1 keys %in% names(L)   0.004     1.00
test elapsed relative
2         f(L, keys)   0.643   160.75
3         g(L, keys)   0.004     1.00
1 keys %in% names(L)   0.004     1.00
test elapsed relative
2         f(L, keys)   0.656      164
3         g(L, keys)   0.004        1
1 keys %in% names(L)   0.004        1
test elapsed relative
2         f(L, keys)   0.654    163.5
3         g(L, keys)   0.004      1.0
1 keys %in% names(L)   0.004      1.0
test elapsed relative
2         f(L, keys)   0.647   161.75
3         g(L, keys)   0.004     1.00
1 keys %in% names(L)   0.005     1.25
test elapsed relative
2         f(L, keys)   0.647   161.75
3         g(L, keys)   0.004     1.00
1 keys %in% names(L)   0.005     1.25


## Code for benchmarks
library(inline)
library(rbenchmark)
f <- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
    Rcpp::List lst(ls);
    Rcpp::CharacterVector nam = lst.names();
    std::vector<std::string> nm = Rcpp::as< std::vector< std::string >
int m = nam.size();

    Rcpp::CharacterVector tok(ts);
    std::vector<std::string> tk = Rcpp::as< std::vector< std::string >
int n = tok.size();

    Rcpp::LogicalVector   log(n);

    for (int i=0; i<n; i++) {  // look at all tokens
       log[i] = false;                   // assume false, but compare to
all names
       for (int j=0; j<m; j++) {
                    if(tk[i] == nm[j])
              log[i] = 1;     // and note equality if we find it
       }
    }
    return log;
 ')

g <- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
    Rcpp::List lst(ls);
    Rcpp::CharacterVector nam = lst.names();

        Rcpp::IntegerVector ind(Rf_match(nam,ts,0));
    Rcpp::LogicalVector   log(ind);
        return log;
 ')


lookup<-function(nList, nKeys, P) {
    nList<-10000
    nKeys<-1000
    P<-0.1
    keys<-as.character(floor(runif(nKeys, 1, nList/P+1)))
    L<-as.list(1:nList)
    names(L)<-as.character(1:nList)
    benchmark(keys %in% names(L), f(L, keys), g(L, keys), replications=10,
columns=c("test", "elapsed", "relative"))
}

lookup(10000, 1000, 0.1)
lookup(10000, 1000, 1.0)
lookup(100000, 100, 0.1)
lookup(100000, 100, 1.0)
lookup(100000, 1, 0.1)
lookup(100000, 1, 1.0)
On Wed, Mar 28, 2012 at 3:34 PM, Dirk Eddelbuettel <edd at debian.org> wrote:

            
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20120328/2cd3908d/attachment-0001.html>