
Message-ID: <loom.20131121T195843-889@post.gmane.org>
Date: 2013-11-21T18:59:13Z
From: Ben Bolker
Subject: Thoughts for faster indexing

Neal Fultz <nfultz <at> gmail.com> writes:

> 
> Noah,
> 
> If N is # of rows, k is # of unique IDs
> 
> Using which() is O(N), so using which() in a loop over the IDs is going to be O(Nk);
> 
> sorting the entire data is O(N ln N), and then you can process it in
> contiguous blocks, no which() required.
> 
> -Neal
> 
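To make the complexity argument concrete, here is a minimal, language-neutral sketch in Python. It is not Neal's code, and the function names `group_rows_naive` and `group_rows_sorted` are hypothetical. The first function does one full scan of the ID vector per unique ID, which is the which()-in-a-loop pattern and costs O(Nk). The second sorts the row indices by ID once, which costs O(N log N), and then walks the contiguous runs of equal IDs in a single pass. In R the sort step would correspond to order().

```python
from itertools import groupby


def group_rows_naive(ids):
    """O(N*k): one full scan per unique ID, like which(ids == u) in a loop."""
    result = {}
    for u in sorted(set(ids)):
        # Each iteration touches all N rows.
        result[u] = [i for i, x in enumerate(ids) if x == u]
    return result


def group_rows_sorted(ids):
    """O(N log N): sort row indices by ID once, then walk contiguous blocks."""
    # sorted() is stable, so row indices within each ID stay in ascending order.
    order = sorted(range(len(ids)), key=lambda i: ids[i])
    result = {}
    # groupby() only merges *adjacent* equal keys, which is exactly
    # what the sort guarantees: every ID now occupies one contiguous block.
    for key, block in groupby(order, key=lambda i: ids[i]):
        result[key] = list(block)
    return result


if __name__ == "__main__":
    ids = ["b", "a", "b", "c", "a"]
    print(group_rows_naive(ids))
    print(group_rows_sorted(ids))
```

Both functions return the same mapping from ID to row positions. The sorted version does the expensive work once, after which each block can be processed without searching the whole vector again.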

  You might also take a look at the 'dplyr' package on GitHub: it's
next-gen plyr, engineered for performance ...

https://github.com/hadley/dplyr