python-like dictionary for R
6 messages · Phil Spector, Martin Morgan, Seth Falcon +1 more
Paul -
You can also use named vectors as something similar to
a python dictionary:
nvec = c('one'=20,'two'=30,'three'=40)
nvec['four'] = 50
nvec['one']
 one
  20
nvec['four']
four
  50
Although the result is named, it can be used as a regular R value:
20 + nvec['three']
three
60
If the names annoy you (as they seem to annoy many R users),
you can unname the object:
unname(20 + nvec['three'])
[1] 60
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
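A few more dictionary-style operations on a named vector, as a minimal sketch rather than anything taken from the post above. The keys and the variable names (`has_two`, `missing_value`, and so on) are illustrative only.

```r
# Named numeric vector used as a small key/value store.
nvec <- c(one = 20, two = 30, three = 40)

# Membership test: look the key up among the names.
has_two  <- "two"  %in% names(nvec)
has_five <- "five" %in% names(nvec)

# A missing key with single brackets gives NA rather than an error
# (double brackets, nvec[["five"]], would signal an error instead).
missing_value <- unname(nvec["five"])

# Assigning to an existing key overwrites it; a new key is appended.
nvec["two"]  <- 31
nvec["four"] <- 50

# Deleting a key means subsetting it away.
nvec <- nvec[names(nvec) != "one"]

keys   <- names(nvec)
values <- unname(nvec)
```

Note that nothing stops a named vector from holding duplicate names, so uniqueness of keys is up to the caller.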
On Wed, 22 Dec 2010, Paul Rigor wrote:
Hi,
I was wondering if anyone has played around with this package called "rdict"? It attempts to implement a hash table in R using skip lists. Just came across it while trying to look for simpler text manipulation methods:
http://userprimary.net/posts/2010/05/29/rdict-skip-list-hash-table-for-R/
Cheers,
Paul
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
On 12/22/2010 05:49 PM, Paul Rigor wrote:
Hi, I was wondering if anyone has played around with this package called "rdict"? It attempts to implement a hash table in R using skip lists. Just came across it while trying to look for simpler text manipulation methods: http://userprimary.net/posts/2010/05/29/rdict-skip-list-hash-table-for-R/
kind of an odd question, so kind of an odd answer. I'd say this was an implementation of skip lists in C with an R interface. An example of a data structure implementation in R is Matloff's binary tree
http://tolstoy.newcastle.edu.au/R/e12/help/10/10/0519.html
which is perhaps more helpful for understanding the opportunities and limitations of the R language per se.
Martin
Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793
On Wed, Dec 22, 2010 at 7:05 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
kind of an odd question, so kind of an odd answer. I'd say this was an implementation of skip lists in C with an R interface.
I had to play around with the rdict package in order to write it, but haven't used it much since :-P
Be sure to look at R's native environment objects, which provide a hash table structure and are suitable for many uses.
+ seth
Seth Falcon | @sfalcon | http://userprimary.net/
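For readers unfamiliar with environments, here is a small sketch of the basic hash-table operations on one. It is not from the thread; the keys and variable names are made up for illustration.

```r
# An environment used as a mutable hash table (new.env() hashes by default).
d <- new.env(hash = TRUE, parent = emptyenv())

# Insert or overwrite: keys must be non-empty character strings.
d[["one"]] <- 20
assign("two", 30, envir = d)

# Lookup; inherits = FALSE keeps the search inside this environment only.
one_value <- d[["one"]]
two_value <- get("two", envir = d, inherits = FALSE)

# Membership test; '[[' returns NULL for an absent key.
has_three   <- exists("three", envir = d, inherits = FALSE)
three_value <- d[["three"]]

# Delete a key, then list what remains (ls() returns sorted names).
rm("one", envir = d)
remaining <- ls(d)
```

Using `parent = emptyenv()` means a lookup can never fall through to an unrelated variable of the same name in an enclosing environment.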
7 days later
1 day later
On 12/30/2010 02:30 PM, Paul Rigor wrote:
Thanks gang, I'll work with named vectors and concatenate as needed.
This might be ok for small problems, but concatenation is an inefficient
R pattern -- the objects being concatenated are copied in full, so the
vector becomes longer, and each concatenation slower, with every new key. With
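a <- integer(); t0 <- Sys.time()
for (i in seq_len(1e6)) {
    a <- c(a, i)    # copies all of 'a' on every iteration
    if (0 == i %% 10000)
        # fix the units: a bare difftime may switch from seconds to minutes
        print(i / as.numeric(difftime(Sys.time(), t0, units = "secs")))
}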
we have, in 'appends per second'
[1] 3236.76
[1] 2425.111
[1] 1757.52
[1] 1331.846
We don't really have a dictionary here, either, as the 'key' values are
not stored. Phil's suggestion suffers from the same type of issue, where
the addition of new keys implies growing (reallocating) the vector.
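a <- integer(); t0 <- Sys.time()
for (i in seq_len(1e6)) {
    key <- as.character(i)
    a[[key]] <- i    # new name: the vector and its names are regrown
    if (0 == i %% 10000)
        # fix the units: a bare difftime may switch from seconds to minutes
        print(i / as.numeric(difftime(Sys.time(), t0, units = "secs")))
}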
[1] 12659.18
[1] 9516.288
[1] 6821.47
[1] 5907.782
Better to use an environment (and live with reference semantics)
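e <- new.env(parent=emptyenv()); t0 <- Sys.time()
for (i in seq_len(1e6)) {
    key <- as.character(i)
    e[[key]] <- i    # hashed insert; existing entries are not copied
    if (0 == i %% 10000)
        # fix the units: a bare difftime may switch from seconds to minutes
        print(i / as.numeric(difftime(Sys.time(), t0, units = "secs")))
}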
with
[1] 20916.56
[1] 21421.85
[1] 21762.04
[1] 21207.69
[1] 21239.19
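What "living with reference semantics" means in practice can be sketched as follows. The helper functions `add_key_vec` and `add_key_env` are hypothetical names invented for this illustration; they are not part of R or of the thread.

```r
# Hypothetical helpers, for illustration only.
add_key_vec <- function(v, key, value) { v[[key]] <- value; invisible(v) }
add_key_env <- function(e, key, value) { e[[key]] <- value; invisible(e) }

v <- c(one = 20)
e <- new.env(parent = emptyenv())
e[["one"]] <- 20

add_key_vec(v, "two", 30)   # modifies a copy; the caller's v is unchanged
add_key_env(e, "two", 30)   # modifies the environment the caller also sees

vec_keys <- names(v)
env_keys <- ls(e)

# Assignment does not copy an environment: e2 and e are the same table.
e2 <- e
e2[["three"]] <- 40
shared <- exists("three", envir = e, inherits = FALSE)
```

With the vector, the caller has to capture the return value (`v <- add_key_vec(v, ...)`) to see the change; with the environment, every holder of the reference sees it immediately.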
The usual alternative to the concatenation pattern is
'pre-allocate-and-fill'
x <- integer(1e6); t0 <- Sys.time()
for (i in seq_len(1e6)) {
    x[[i]] <- i    # fill slot i in place; the vector is never regrown
}
but this doesn't work with key/value pairs because there is no sense (or
is there?) in which the keys can be 'pre-allocated'.
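One sense in which keys can be pre-allocated, offered as a sketch under the assumption that the number of entries is known up front: reserve a character vector for the keys alongside the values, fill both by position, and attach the names once at the end. The size `n` is kept small here and the variable names are illustrative.

```r
n <- 1000L  # kept small here; the thread uses 1e6

# Pre-allocate both halves of the dictionary: values and keys.
values <- integer(n)
keys   <- character(n)

for (i in seq_len(n)) {
    keys[[i]]   <- as.character(i)  # fill slot i; no reallocation
    values[[i]] <- i
}

# Attach the keys once, at the end, instead of growing names key by key.
names(values) <- keys

lookup <- values[["500"]]
```

This does not help when the number of keys is unknown in advance, which is the case the environment handles well.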
Creating the dictionary in one go is very efficient
system.time(d <-
    structure(seq_len(1e6), .Names=as.character(seq_len(1e6))))
   user  system elapsed
  0.417   0.002   0.419
Martin
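The same one-shot construction can be written in a couple of other ways. The sketch below is not from the thread; it assumes the keys and values are already available as whole vectors, and uses a smaller `n` than the 1e6 above.

```r
n <- 1000L
keys <- as.character(seq_len(n))

# Same result as structure(..., .Names = ...), via stats::setNames().
d_vec <- setNames(seq_len(n), keys)

# Bulk-load an environment from a named list in one call.
d_env <- list2env(as.list(d_vec), envir = new.env(parent = emptyenv()))

same_as_structure <- identical(d_vec, structure(seq_len(n), .Names = keys))
env_size  <- length(ls(d_env))
env_value <- d_env[["42"]]
```

`list2env()` gives the reference-semantics table discussed earlier without a per-key loop in R code.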