[Rcpp-devel] Testing for existence of named components
If I remember correctly, environments use hash tables to lookup names. In pure R to check if an element L[["key"]] exists I would usually use
"key" %in% names(L)
which seems to be reasonably fast. `%in%` works in vectors, too
keys %in% L
returning exactly what Dirks suggestion was supposed to do (you missed an "if", or alternatively an "or") `%in%` uses .Internal(match(...)), which computes a temporary hash table as far as I can see in the source (src/main/unique.c). I am not sure about the implementation of the Hash. In particular I don't know if some kind of caching is done, because the lookup is much faster than Dirks function (see below) and computing a hash table for one lookup seems inefficient. I think it would be good to use the internal `match` function of R to do what you want. In Rinternals.h match is exposed as SEXP Rf_match(SEXP itable, SEXP ix, int nmatch) where nmatch is the value to be returned if there is no match. Below are the results of my little benchmark (see below). First parameter of `lookup` is the number elements in L, second is the length of keys, third one specifies the probability that a key is in present in the list. Regards, Jonas ##Benchmark results
lookup(10000, 1000, 0.1)
test elapsed relative 2 f(L, keys) 0.641 160.25 3 g(L, keys) 0.005 1.25 1 keys %in% names(L) 0.004 1.00
lookup(10000, 1000, 1.0)
test elapsed relative 2 f(L, keys) 0.643 160.75 3 g(L, keys) 0.004 1.00 1 keys %in% names(L) 0.004 1.00
lookup(100000, 100, 0.1)
test elapsed relative 2 f(L, keys) 0.656 164 3 g(L, keys) 0.004 1 1 keys %in% names(L) 0.004 1
lookup(100000, 100, 1.0)
test elapsed relative 2 f(L, keys) 0.654 163.5 3 g(L, keys) 0.004 1.0 1 keys %in% names(L) 0.004 1.0
lookup(100000, 1, 0.1)
test elapsed relative 2 f(L, keys) 0.647 161.75 3 g(L, keys) 0.004 1.00 1 keys %in% names(L) 0.005 1.25
lookup(100000, 1, 1.0)
test elapsed relative
2 f(L, keys) 0.647 161.75
3 g(L, keys) 0.004 1.00
1 keys %in% names(L) 0.005 1.25
## Code for benchmarks
library(inline)
library(rbenchmark)
f <- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
Rcpp::List lst(ls);
Rcpp::CharacterVector nam = lst.names();
std::vector<std::string> nm = Rcpp::as< std::vector< std::string >
(nam);
int m = nam.size();
Rcpp::CharacterVector tok(ts);
std::vector<std::string> tk = Rcpp::as< std::vector< std::string >
(tok);
int n = tok.size();
Rcpp::LogicalVector log(n);
for (int i=0; i<n; i++) { // look at all tokens
log[i] = false; // assume false, but compare to
all names
for (int j=0; j<m; j++) {
if(tk[i] == nm[j])
log[i] = 1; // and note equality if we find it
}
}
return log;
')
g <- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
Rcpp::List lst(ls);
Rcpp::CharacterVector nam = lst.names();
Rcpp::IntegerVector ind(Rf_match(nam,ts,0));
Rcpp::LogicalVector log(ind);
return log;
')
lookup<-function(nList, nKeys, P) {
nList<-10000
nKeys<-1000
P<-0.1
keys<-as.character(floor(runif(nKeys, 1, nList/P+1)))
L<-as.list(1:nList)
names(L)<-as.character(1:nList)
benchmark(keys %in% names(L), f(L, keys), g(L, keys), replications=10,
columns=c("test", "elapsed", "relative"))
}
lookup(10000, 1000, 0.1)
lookup(10000, 1000, 1.0)
lookup(100000, 100, 0.1)
lookup(100000, 100, 1.0)
lookup(100000, 1, 0.1)
lookup(100000, 1, 1.0)
On Wed, Mar 28, 2012 at 3:34 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
On 28 March 2012 at 15:17, Ulrich Bodenhofer wrote:
| Thanks for your swift reply, Dirk! Wow, to be frank, that is not what I
| was expecting. In the meantime, I read Section 5.9.6 of "Writing R
| extensions" and I was stunned to see a solution there that is similar to
| the one you propose. I do not know R internals very well, but I cannot
| believe that accesses by names are implemented by tediously searching
| through all names sequentially. There must be some hash table behind,
| right? Otherwise, list accesses would be terribly slow. If all my
It's a trade-off: Inserting into a hash table has cost, you would enforce
that on every (!!) vector use, whether names are looked up or not.
Also, vectors are a natural representation as R's SEXP are just linear
collections.
| assumptions are right, I wonder why the R API does not make this
| mechanism available.
|
| May I add one more question: where can I find the definition of the List
| class and the implementation of its methods in the Rcpp package source
| code? I was looking for this, but got lost. Are you using the method
The only answer I can give is to ... keep looking. It's messy, because
there
is so much code---but that is because Rcpp does so much stuff.
But some header files are more important than others -- and the Vector.h
file
as well as the Vector/ subdirectory are important.
As mentioned, Rcpp::List() is a vector too.
| described in Section 5.9.6 of "Writing R extensions" (which I doubt) or
| something else. I would actually be quite curious to learn more about
| how lists are implemented in Rcpp.
They are just vectors, hence the linear traversal.
Dirk
| Thanks and best regards,
| Ulrich
|
|
| If so, I wonder why this mechanism has not been
|
| On 03/28/2012 02:56 PM, Dirk Eddelbuettel wrote:
| > On 28 March 2012 at 13:56, Ulrich Bodenhofer wrote:
| > | My question is the following: is there any way of checking in
whether a
| > | component of an Rcpp list (or vector) with a given name exists in
this list. If
| > | I simply try accessing a non-existing component, I get an "index out
of bounds"
| > | error. Trying to catch a possible exception did not work either. I
also browsed
| > | the Rcpp package source code, but unfortunately I got lost. Sorry if
this has
| > | been addressed on this list before. At least I googled in many
different ways,
| > | but could not find anything. Any ideas? Any help is gratefully
appreciated!
| >
| > Good question, and I have the suspicion that we answered that years
ago on
| > the list before.
| >
| > Here is a super-pedestrian version. It takes a list, extracts its
names() --
| > and should probably test whether names is empty ? -- and then compares
these
| > against a vector of tokens, returning a bool for each token:
| >
| > R>
| > R> suppressMessages(library(inline))
| > R>
| > R> f<- cxxfunction(signature(ls="list", ts="character"),
plugin="Rcpp", body='
| > + Rcpp::List lst(ls);
| > + Rcpp::CharacterVector nam = lst.names();
| > + std::vector<std::string> nm = Rcpp::as< std::vector<
std::string> >(nam);
| > + int m = nam.size();
| > +
| > + Rcpp::CharacterVector tok(ts);
| > + std::vector<std::string> tk = Rcpp::as< std::vector<
std::string> >(tok);
| > + int n = tok.size();
| > +
| > + Rcpp::LogicalVector log(n);
| > +
| > + for (int i=0; i<n; i++) { // look at all tokens
| > + log[i] = false; // assume false, but compare
to all names
| > + for (int j=0; j<m; j++) {
| > + log[i] = (tk[i] == nm[j]); // and note equality if we
find it
| > + }
| > + }
| > + return log;
| > + ')
| > R> ll<- list(aa=1,b="b") ## list with names 'aa' and 'b'
| > R> f(ll, c("aa", "b", "c")) ## does it contain 'a' or 'b' or
'c' ?
| > [1] FALSE TRUE FALSE
| > R> f(ll, "c") ## works with scalars too
| > [1] FALSE
| > R>
| >
| > I am sure there is a much cleverer solution one could write...
| >
| > Dirk
| >
|
--
R/Finance 2012 Conference on May 11 and 12, 2012 at UIC in Chicago, IL
See agenda, registration details and more at http://www.RinFinance.com
_______________________________________________ Rcpp-devel mailing list Rcpp-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20120328/2cd3908d/attachment-0001.html>