Resources for optimizing code

5 messages · Janet, Roger Bivand, Brian Ripley +1 more

#
I want to eliminate certain observations in a large dataframe (21000x100).
I have written code which does this using a binary vector (0=delete obs,
1=keep), but it uses for loops, and so it's slow and in the extreme it 
causes R to hang for indefinite time periods.

I'm looking for one of two things:
1.  A document which discusses how to avoid for loops and situations in
which it's impossible to avoid for loops.

or

2.  A function which can do the above better than mine.  

My code is pasted below.

Thanks so much,

Janet 

# asst is a binary vector of length= nrow(DATAFRAME).  
# 1= observations you want to keep.  0= observation to get rid of.

remove.xtra.f <-function(asst, DATAFRAME) {
	n<-sum(asst, na.rm=T)
	newdata<-matrix(nrow=n, ncol=ncol(DATAFRAME))
	j<-1
	for(i in 1:length(asst)) {
		if (asst[i]==1) {
			newdata[j,]<-DATAFRAME[i,]
			j<-j+1
		}
	}
	newdata.f<-as.data.frame(newdata)
	names(newdata.f)<-names(DATAFRAME)
	return(newdata.f)
}
--  
Janet Rosenbaum                                 jerosenb at fas.harvard.edu
PhD Candidate in Health Policy, Harvard GSAS
Harvard Injury Control Research Center, Harvard School of Public Health
#
On Fri, 5 Nov 2004, Janet Elise Rosenbaum wrote:

            
?subset
newdata <- subset(DATAFRAME, asst==1)

which will work whether DATAFRAME is a matrix or data.frame (two different 
classes).
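
A minimal sketch of the subset() call on a data frame. The data frame `toy` and its `asst` vector are made-up stand-ins for DATAFRAME and the real indicator vector, not data from the original post.

```r
# Hypothetical stand-in for DATAFRAME: 6 rows, 2 columns.
toy <- data.frame(id = 1:6, score = c(2.5, 3.1, 4.8, 1.2, 5.0, 3.3))

# 1 = keep the row, 0 = drop it.
asst <- c(1, 0, 1, 1, 0, 1)

# subset() keeps the rows for which the condition is TRUE;
# rows where the condition is NA are dropped as well.
newdata <- subset(toy, asst == 1)
print(newdata)
```

No loop is involved: the comparison asst == 1 is evaluated once for the whole vector.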

  
    
#
On Fri, 5 Nov 2004, Janet Elise Rosenbaum wrote:

            
`S Programming': see the FAQ.
But at the level of the example below, chapter 2 of MASS4 (FAQ again for 
details).
How about DATAFRAME[asst == 1, ] ?

I am not sure if asst has NAs in, but if it has you will get an error from 
                if (asst[i]==1)
and if not, you don't need na.rm=T.

The subsetting took less than a second for me.

Note that your code converts DATAFRAME to a matrix. If that is reasonable 
(e.g. it is all numeric), then matrix indexing will be faster.
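
A small sketch of how NAs in asst interact with DATAFRAME[asst == 1, ]. The data frame `toy` is hypothetical, and wrapping the test in which() is one common way to drop the NAs, offered here as an assumption about what is wanted rather than as part of the advice above.

```r
# Hypothetical data: 5 rows, with one NA in the indicator vector.
toy <- data.frame(id = 1:5, score = c(10, 20, 30, 40, 50))
asst <- c(1, 0, NA, 1, 0)

# Plain logical indexing: the NA in the index produces a row of NAs.
with_na <- toy[asst == 1, ]
print(with_na)

# which() returns only the positions where the test is TRUE,
# so the NA entry is silently dropped.
clean <- toy[which(asst == 1), ]
print(clean)
```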

  
    
#
On Fri, 5 Nov 2004, Roger Bivand wrote:

            
Sorry, not for matrices:
[1]  1  4  6  9 11 14 16 19

Maybe it should, but in biggish problems like this it is almost certainly 
a bit more efficient to use the bare tools, that is indexing.
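
A sketch of the "bare tools" on a matrix, using a made-up 4 x 3 matrix `m` (not the data behind the output above). The row subscript needs the trailing comma, otherwise the matrix is indexed as a plain vector.

```r
# Hypothetical numeric matrix: 4 rows, 3 columns, filled column by column.
m <- matrix(1:12, ncol = 3)
asst <- c(1, 0, 1, 0)

# Row indexing: keep rows 1 and 3; drop = FALSE keeps the result a matrix
# even if only one row survives.
kept <- m[asst == 1, , drop = FALSE]
print(kept)

# Without the comma the logical vector is recycled over all 12 elements
# and the result is a flat vector, not a matrix.
flat <- m[asst == 1]
print(flat)
```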
#
Have you tried reading the manual "An Introduction to R", with special 
attention to "Array Indexing"? (Indexing for data frames is pretty similar 
to indexing for matrices.)

Unless I'm misunderstanding, what you want to do is very simple.  It is 
possible to use numeric vectors with 0 and 1 to indicate whether you want 
to keep the row, but it's a little easier with logical vectors.  Here's an 
example:

 > x <- data.frame(a=1:5,b=letters[1:5])
 > keep.num <- ifelse(x$a %% 2 == 1, 1, 0)
 > keep.num
[1] 1 0 1 0 1
 > keep.logical <- (x$a %% 2) == 1
 > keep.logical
[1]  TRUE FALSE  TRUE FALSE  TRUE
 > x[keep.num==1,,drop=F]
   a b
1 1 a
3 3 c
5 5 e
 > x[keep.logical,,drop=F]
   a b
1 1 a
3 3 c
5 5 e
 >
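
Putting the indexing idea into the shape of the original function, a loop-free version might look like the sketch below. The name remove.xtra.v is hypothetical, and treating NA in asst as "do not keep" is an assumption, chosen to match the na.rm=T in the original code.

```r
# Hypothetical loop-free replacement: same arguments as the original,
# but the rows are selected in a single indexing step.
remove.xtra.v <- function(asst, DATAFRAME) {
  # Assumption: an NA in asst means the row is not kept.
  keep <- !is.na(asst) & asst == 1
  # drop = FALSE keeps the result a data frame even with one column.
  DATAFRAME[keep, , drop = FALSE]
}

x <- data.frame(a = 1:5, b = letters[1:5])
keep.num <- ifelse(x$a %% 2 == 1, 1, 0)
result <- remove.xtra.v(keep.num, x)
print(result)
```

The result is still a data frame with the original column names and types, so the as.data.frame() and names() steps are no longer needed.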
At Friday 10:34 AM 11/5/2004, Janet Elise Rosenbaum wrote: