Message-ID: <Pine.LNX.4.44.0411051813200.24480-100000@gannet.stats>
Date: 2004-11-05T18:17:07Z
From: Brian Ripley
Subject: Resources for optimizing code
In-Reply-To: <Pine.LNX.4.44.0411051850350.17589-100000@reclus.nhh.no>

On Fri, 5 Nov 2004, Roger Bivand wrote:

> On Fri, 5 Nov 2004, Janet Elise Rosenbaum wrote:
> 
> > 
> > I want to eliminate certain observations in a large dataframe (21000x100).
> > I have written code which does this using a binary vector (0=delete obs,
> > 1=keep), but it uses for loops, and so it's slow and in the extreme it 
> > causes R to hang for indefinite time periods.
> > 
> > I'm looking for one of two things:
> > 1.  A document which discusses how to avoid for loops and situations in
> > which it's impossible to avoid for loops.
> > 
> > or
> > 
> > 2.  A function which can do the above better than mine.  
> 
> ?subset
> newdata <- subset(DATAFRAME, asst==1)
> 
> which will work whether DATAFRAME is a matrix or data.frame (two different 
> classes).

Sorry, not for matrices:

> A <- matrix(1:20, 5)
> asst <- c(1,0,0,1,0)
> subset(A, asst)
[1]  1  4  6  9 11 14 16 19

Maybe it should, but in biggish problems like this it is almost certainly 
a bit more efficient to use the bare tools, that is, indexing.
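
A minimal sketch of what indexing looks like here, reusing A and asst from
the session above. The names keep, A_kept, DF and DF_kept are made up for
illustration, and the data frame is just a stand-in for the real 21000x100
one:

```r
# Illustrative data: the matrix and 0/1 indicator from the session above
A <- matrix(1:20, 5)
asst <- c(1, 0, 0, 1, 0)

# Turn the 0/1 indicator into a logical vector (TRUE = keep the row)
keep <- asst == 1

# Index the rows directly; drop = FALSE keeps the result a matrix
# even when only one row survives
A_kept <- A[keep, , drop = FALSE]

# The same row indexing works for a data frame
DF <- as.data.frame(A)
DF_kept <- DF[keep, , drop = FALSE]
```

Note the comma: A[keep, ] selects rows, whereas A[keep] recycles keep over
all the elements of the matrix, which is where the output above comes from.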

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595