Skip to content

python-like dictionary for R

6 messages · Phil Spector, Martin Morgan, Seth Falcon +1 more

#
Paul -
    You can also use named vectors as something similar to
a python dictionary:
one
  20
four
   50

Although the result is named, it can be used as a regular R
value:
three
    60

If the names annoy you (as they seem to annoy many R users),
you can unname the object:
[1] 60

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu
On Wed, 22 Dec 2010, Paul Rigor wrote:

            
#
On 12/22/2010 05:49 PM, Paul Rigor wrote:
kind of an odd question, so kind of an odd answer.

I'd say this was an implementation of skip lists in C with an R
interface. An example of a data structure implementation in R is
Matloff's binary tree

  http://tolstoy.newcastle.edu.au/R/e12/help/10/10/0519.html

which is perhaps more helpful for understanding the opportunities and
limitations of the R language per se.

Martin

  
    
#
On Wed, Dec 22, 2010 at 7:05 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
I had to play around with the rdict package in order to write it, but
haven't used it much since :-P
Be sure to look at R's native environment objects which provide a hash
table structure and are suitable for many uses.

+ seth
7 days later
1 day later
#
On 12/30/2010 02:30 PM, Paul Rigor wrote:
This might be ok for small problems, but concatenation is an inefficient
R pattern -- the objects being concatenated are copied in full, so
becomes longer, and the concatenation slower, with each new key. With

a <- integer(); t0 <- Sys.time()
for (i in seq_len(1e6)) {
    a <- c(a, i)
    if (0 == i %% 10000)
        print(i / as.numeric(Sys.time() - t0))
}

we have, in 'appends per second'

[1] 3236.76
[1] 2425.111
[1] 1757.52
[1] 1331.846

We don't really have a dictionary here, either, as the 'key' values are
not stored. Phil's suggest suffers from the same type of issue, where
the addition of new keys implies growing (reallocating) the vector.

a <- integer(); t0 <- Sys.time()
for (i in seq_len(1e6)) {
    key <- as.character(i)
    a[[key]] <- i
    if (0 == i %% 10000)
        print(i / as.numeric(Sys.time() - t0))
}
[1] 12659.18
[1] 9516.288
[1] 6821.47
[1] 5907.782


Better to use an environment (and live with reference semantics)

e <- new.env(parent=emptyenv()); t0 <- Sys.time()
for (i in seq_len(1e6)) {
    key <- as.character(i)
    e[[key]] <- i
    if (0 == i %% 10000)
        print(i / as.numeric(Sys.time() - t0))
}

with

[1] 20916.56
[1] 21421.85
[1] 21762.04
[1] 21207.69
[1] 21239.19

The usual alternative to the concatenation pattern is
'pre-allocate-and-fill'

x <- integer(1e6); t0 <- Sys.time()
for (i in seq_len(1e6)) {
    ???
}

but this doesn't work with key/value pairs because there is no sense (or
is there?) in which the keys can be 'pre-allocated'.

Creating the dictionary in one go is very efficient
+     structure(seq_len(1e6), .Names=as.character(seq_len(1e6))))
   user  system elapsed
  0.417   0.002   0.419

Martin