merge performance degradation in 2.9.1
Is there a way to avoid the degradation in performance in 2.9.1?
If the example is meant to demonstrate a difference between R versions that you really need to get to the bottom of, then read no further. However, if the example is actually what you want to do, then you can speed it up by using a data.table, reducing the 26 seconds to 1 second. Timings are from my (quite old now!) PC at home:
system.time(Out <- merge(X, Y, by="mon", all=TRUE))
   user  system elapsed
  25.63    0.58   26.98
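As an aside, for this particular lookup (every mon in X matches exactly one row of Y), a base-R workaround that sidesteps merge() entirely is match(). This is a minimal sketch, not from the original thread, shown with a smaller N so it runs quickly:

    # Same shape as the example, smaller N
    N <- 1000
    X <- data.frame(group = rep(12:1, each = N),
                    mon   = rep(rev(month.abb), each = N),
                    stringsAsFactors = FALSE)
    Y <- data.frame(mon = month.abb, letter = letters[1:12],
                    stringsAsFactors = FALSE)

    # match() gives, for each row of X, the index of the matching row of Y;
    # subscripting Y$letter by it adds the joined column without merge()
    X$letter <- Y$letter[match(X$mon, Y$mon)]

    head(X, 3)

This avoids merge()'s row-name bookkeeping (the sort.list/make.unique work visible in the profiles below), but unlike merge(..., all=TRUE) it only works when the join is a simple one-to-one lookup.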
Using a data.table instead:
X <- data.table(group=rep(12:1, each=N), mon=rep(rev(month.abb), each=N),
key="mon")
Y <- data.table(mon=month.abb, letter=letters[1:12], key="mon")
tables()
     NAME      NROW COLS       KEY
[1,] X    1,200,000 group,mon  mon
[2,] Y           12 mon,letter mon
system.time(X$letter <- Y[X,letter]) # Y[X] is the syntax for merge of two data.tables
   user  system elapsed
   0.98    0.11    1.10
identical(Out$letter, X$letter)
[1] TRUE
identical(Out$mon, X$mon)
[1] TRUE
identical(Out$group, X$group)
[1] TRUE

To do the multi-column equi-join of X and Z, set a key of two columns. In data.table, 'nomatch' is the equivalent of merge's 'all' and can be set to 0 (inner join) or NA (outer join).

"Adrian Dragulescu" <adrian_d at eskimo.com> wrote in message news:Pine.LNX.4.64.0907090953580.1125 at shell.eskimo.com...
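The two-column join just described might look like the following. This is a sketch assuming the data.table package is installed; X and Z follow the definitions in the quoted message, with a smaller N:

    library(data.table)

    N <- 1000
    X <- data.table(group = rep(12:1, each = N),
                    mon   = rep(rev(month.abb), each = N))
    Z <- data.table(mon = month.abb, letter = letters[1:12], group = 1:12)

    # Key both tables on the two join columns
    setkey(X, mon, group)
    setkey(Z, mon, group)

    # nomatch plays the role of merge's 'all'
    inner <- Z[X, nomatch = 0]    # like all = FALSE
    outer <- Z[X, nomatch = NA]   # like all = TRUE

Here every (mon, group) pair in X exists in Z, so the inner and outer joins return the same 12*N rows; they differ only when some rows of X have no match in Z.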
I have noticed a significant performance degradation using merge in 2.9.1
relative to 2.8.1. Here is what I observed:
N <- 100000
X <- data.frame(group=rep(12:1, each=N),
                mon=rep(rev(month.abb), each=N))
X$mon <- as.character(X$mon)
Y <- data.frame(mon=month.abb, letter=letters[1:12])
Y$mon <- as.character(Y$mon)
Z <- cbind(Y, group=1:12)
system.time(Out <- merge(X, Y, by="mon", all=TRUE))
# R 2.8.1 is 17% faster than R 2.9.1 for N=100000
system.time(Out <- merge(X, Z, by=c("mon", "group"), all=TRUE))
# R 2.8.1 is 16% faster than R 2.9.1 for N=100000
Here is the head of summaryRprof() for 2.8.1
$by.self
                   self.time self.pct total.time total.pct
sort.list               4.60     56.5       4.60      56.5
make.unique             1.68     20.6       2.18      26.8
as.character            0.50      6.1       0.50       6.1
duplicated.default      0.50      6.1       0.50       6.1
merge.data.frame        0.20      2.5       8.02      98.5
[.data.frame            0.16      2.0       7.10      87.2
and for 2.9.1
$by.self
             self.time self.pct total.time total.pct
sort.list         4.66     39.2       4.66      39.2
nchar             3.28     27.6       3.28      27.6
make.unique       1.42     12.0       1.92      16.2
as.character      0.50      4.2       0.50       4.2
data.frame        0.46      3.9       4.12      34.7
[.data.frame      0.44      3.7       7.28      61.3
As you can see, 2.9.1 has an nchar entry that is quite time-consuming.
Is there a way to avoid the degradation in performance in 2.9.1?
Thank you,
Adrian
As an aside, I got interested in testing merge in 2.9.1 after reading the
r-devel message from 30-May-2009, "Degraded performance with rank()" by Tim
Bergsma, since he mentions doing merges; I only got around to testing today.