hash table access, vector access &c

6 messages · David Winsemius, Sam Steingold

#
Hi,
I am confused by the way the indexing works.
I read a table from a csv file like this:

ysmd <- read.csv("ysmd.csv",header=TRUE);
ysmd.table <- hash();
for (i in 1:length(ysmd$X.stock)) ysmd.table[ysmd$X.stock[i]] <- ysmd[i,];

the first column ("X.stock") is a string (factor):
[1] FLO
7757 Levels: A AA AA- AAAAA AAC AACC AACOU AACOW AADR AAI AAME AAN AAON ... ZZZZT

when I print ysmd.table, I see the data I expect:
...
  ZIOP : ZIOP 402600000 3.03 7.85 707694 6.3717
  ZIP : ZIP 794900000 23.53 31.5 677046 23.2508
  ZIPR : ZIPR 47100000 2.28 3.5 21865 2.4058
  ZIV : ZIV -1 12.2987 17.3862 37455 16.6068
  ZIXI : ZIXI 254900000 2.1 4.88 905849 3.5146
...

moreover,
X.stock market.cap X52.week.low X52.week.high X3.month.average.daily.volume
100     FLO  2.984e+09      15.3133         22.37                       1021580
    X50.day.moving.average.price
100                      21.3769

quite correctly.
however,
<hash> containing 0 key-value pair(s).
  NA : NULL

so, how do I access the hash table element using non-literal strings?
or, how do I convert ysmd$X.stock[[100]] to a string from whatever it is
now?

thanks!
#
On Jul 5, 2011, at 12:53 PM, Sam Steingold wrote:

            
Actually, I suspect you may be confused by how factors work; see below.
Note also that, by default, read.csv turns every character column into a factor.
Have you considered:

  ysmd.table[ as.character( ysmd$X.stock[[100]])  ]

It appears that ysmd$X.stock[[100]] is a factor, and if so, you
probably want the character value that its integer code
points to. This is, of course, guesswork because you have not
said which package `hash` comes from, so I do not have the benefit
of looking at its help page.
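
To illustrate the point about factors, here is a minimal base-R sketch. The vectors `sym` and `price` are made-up stand-ins, not the poster's data, and a named vector is used in place of the hash table so that no extra package is assumed. It shows that a factor indexes by its integer code unless it is converted with as.character first.

```r
# Hypothetical stand-in data; levels are sorted alphabetically: AA, FLO, ZIP
sym   <- factor(c("FLO", "ZIP", "AA"))
price <- c(FLO = 21.3769, ZIP = 23.2508, AA = 1)

codes   <- as.integer(sym)          # integer codes behind the factor
by_code <- price[sym[1]]            # indexes by code (2), i.e. the ZIP entry
by_name <- price[as.character(sym[1])]  # indexes by the label "FLO"
```

The same conversion is what the as.character() call above is meant to achieve for the hash key.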

  
    
#
probably both :-(

being a lisper, I thought about factors as lisp symbols (and thus
thought that they would be accepted everywhere strings are).
indeed:
[1] "FLO"

however,
<hash> containing 0 key-value pair(s).
  NA : NULL

so, as.character is not the answer.
X.stock market.cap X52.week.low X52.week.high X3.month.average.daily.volume
100     FLO  2.984e+09      15.3133         22.37                       1021580
    X50.day.moving.average.price
100                      21.3769
I just did this:

library(hash);
hash-2.0.1 provided by Open Data.


thanks a lot for your help!
#
On Jul 5, 2011, at 2:10 PM, Sam Steingold wrote:

            
My error. Note the difference between the indexing functions: "[" is not "[[".
Your output demonstrates that you should be using "[[" here.
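
The difference can be sketched with a plain list, which avoids assuming anything about the hash package's internals (the list `prices` is a hypothetical stand-in). "[" returns a container holding the selected elements, while "[[" returns the element itself; as far as I understand, hash objects follow the same convention, which would explain why "[" printed a <hash> rather than a row.

```r
# Hypothetical stand-in for the hash table
prices <- list(FLO = 21.3769, ZIP = 23.2508)

sub <- prices["FLO"]     # still a list, of length 1
val <- prices[["FLO"]]   # the stored value itself

str(sub)
str(val)
```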

  
    
#
yes, thanks!

now, how do I extend a frame with new columns based on a hash table?

specifically, I have a frame:
'data.frame':	75986 obs. of  15 variables:
 $ aaaaaa   : POSIXlt, format: ...
 $ bbbbb    : POSIXlt, format: ...
 $ cccccccc : num  ...
 $ symbol   : Factor w/ 4521 levels "A","AA","AACC",..: 985 985 2322 3677 4486 4486 1607 3677 4486 1279 ...
 $ dddd     : int  500 500 ...
 $ eeeeeeee : num  16.61 5.74 ...

and a hash table:
Formal class 'hash' [package "hash"] with 1 slots
  ..@ .xData:<environment: 0x7f0e0e8>
Length  Class   Mode 
  7757   hash     S4
X.stock market.cap X52.week.low X52.week.high
3122     DFS 1.4606e+10        12.11         26.95
     X3.month.average.daily.volume X50.day.moving.average.price
3122                       6153430                      24.0242


I want to modify etr.rt (or create a new frame etr.rt.md) which would
have all the columns of etr.rt plus 5 additional columns 

market.cap
X52.week.low
X52.week.high
X3.month.average.daily.volume
X50.day.moving.average.price

which for the row number i in etr.rt come from
ysmd.table[[as.character(etr.rt$symbol[[i]])]]
(obviously,

etr.rt$symbol[[i]] == ysmd.table[[as.character(etr.rt$symbol[[i]])]]$X.stock

)

thanks!
1 day later
#
this function does the job, but it is unthinkably slow.

ysmd.extend <- function (fr) {
  len <- dim(fr)[1];
  fr$mcap <- vector(mode="numeric", len);
  fr$lo52 <- vector(mode="numeric", len);
  fr$hi52 <- vector(mode="numeric", len);
  fr$dvol <- vector(mode="numeric", len);
  fr$ma50 <- vector(mode="numeric", len);
  for (i in 1:len) {
    cat(i," ",fr$symbol[[i]],"(",as.character(fr$symbol[[i]]),")\n")
    tmp <- ysmd.table[[as.character(fr$symbol[[i]])]];
    fr$mcap[i] <- tmp$market.cap;
    fr$lo52[i] <- tmp$X52.week.low;
    fr$hi52[i] <- tmp$X52.week.high;
    fr$dvol[i] <- tmp$X3.month.average.daily.volume;
    fr$ma50[i] <- tmp$X50.day.moving.average.price;
  }
  fr
}
system.time(etr.rt <- ysmd.extend(etr.rt));

is there a way to do this without the explicit for loop?

thanks!
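
One possible way to avoid the loop, offered as a sketch rather than a definitive answer: skip the hash table and look the rows up in the original ysmd data frame with match(), which is vectorized. The two small data frames below are hypothetical stand-ins with only a few of the columns, and the approach assumes each symbol appears at most once in ysmd$X.stock.

```r
# Hypothetical miniature versions of ysmd and etr.rt
ysmd <- data.frame(
  X.stock      = factor(c("DFS", "FLO", "ZIP")),
  market.cap   = c(1.4606e10, 2.984e9, 794900000),
  X52.week.low = c(12.11, 15.3133, 23.53)
)
etr.rt <- data.frame(
  symbol = factor(c("FLO", "DFS", "FLO", "ZIP")),
  dddd   = c(500L, 500L, 100L, 200L)
)

# Row of ysmd for every row of etr.rt (NA where the symbol is unknown)
idx <- match(as.character(etr.rt$symbol), as.character(ysmd$X.stock))

# Copy whole columns at once instead of one cell per iteration
etr.rt.md <- etr.rt
etr.rt.md$mcap <- ysmd$market.cap[idx]
etr.rt.md$lo52 <- ysmd$X52.week.low[idx]
```

merge(etr.rt, ysmd, by.x = "symbol", by.y = "X.stock") is another option, though by default it sorts the result by the key column, so the original row order is not preserved.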