
code optimization problem ... using or not using "which" function

3 messages · Juan Carlos Laguardia, jim holtman, Krzysztof Sakrejda

#
Hello all,

I have two data sets that share certain fields of interest (facility,
unit, date), which I want to match up so that I can extract
information from one dataset and store it in the other.

My first idea (which I know is bad) goes like this:

## capacity and new_trayloc are the data sets in this example code:

for (i in seq_len(nrow(new_trayloc))) {

  ## capacity rows with the same date, unit and facility as row i
  theshifts <- which(as.Date(capacity$shift_dt) == new_trayloc$admit_dt[i] &
      as.character(capacity$unit) == as.character(new_trayloc$UNIT_1[i]) &
      as.character(capacity$fac_id) == as.character(new_trayloc$ORIG_FAC_ID[i]))

  ## capacity rows dated the day before admit_dt, same unit and facility
  thenightshifts <- which(as.Date(capacity$shift_dt) == new_trayloc$admit_dt[i] - 1 &
      as.character(capacity$unit) == as.character(new_trayloc$UNIT_1[i]) &
      as.character(capacity$fac_id) == as.character(new_trayloc$ORIG_FAC_ID[i]))

  ## ... obtain information using theshifts and thenightshifts
  ## and store it in new_trayloc
}

Running system.time() on the entire for loop for just 5 iterations gives:

   user  system elapsed
  25.66    1.04   26.72

That seems really slow, and I need to run it for over 100,000 iterations.
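Extrapolating the timing above linearly (assuming every iteration costs about the same as the first five) shows the scale of the problem: at roughly 5.3 seconds per iteration, the full run would take about six days.

```r
# Elapsed time reported above for 5 iterations of the loop
elapsed_5 <- 26.72
per_iter  <- elapsed_5 / 5        # seconds per iteration
total_sec <- per_iter * 100000    # projected seconds for 100,000 iterations
total_days <- total_sec / 86400   # 86,400 seconds per day
total_days
```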

Any suggestions on either the way I match the fields or my overall
approach to the problem?


Cheers,
Juan Carlos
#
Why not use the 'merge' function?

Krzysztof
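A minimal sketch of the merge() suggestion, assuming the column names from the original loop. The toy data and the `beds` column (standing in for whatever information is pulled from capacity) are hypothetical, as is the helper column `night_dt`. One merge per condition replaces the per-row which() calls; the previous-day lookup is handled by adding a shifted date column as an extra key.

```r
# Hypothetical toy data: only the key column names come from the thread.
capacity <- data.frame(
  shift_dt = as.Date(c("2009-05-01", "2009-05-02", "2009-05-02")),
  unit     = c("A", "A", "B"),
  fac_id   = c("F1", "F1", "F1"),
  beds     = c(10, 12, 7),          # hypothetical information column
  stringsAsFactors = FALSE
)
new_trayloc <- data.frame(
  admit_dt    = as.Date(c("2009-05-02", "2009-05-02")),
  UNIT_1      = c("A", "B"),
  ORIG_FAC_ID = c("F1", "F1"),
  stringsAsFactors = FALSE
)

# Same-day match on date, unit and facility for all rows at once.
# all.x = TRUE keeps new_trayloc rows that have no match (filled with NA).
day <- merge(new_trayloc, capacity,
             by.x = c("admit_dt", "UNIT_1", "ORIG_FAC_ID"),
             by.y = c("shift_dt", "unit", "fac_id"),
             all.x = TRUE)

# Previous-day match: add a hypothetical key column holding admit_dt - 1.
new_trayloc$night_dt <- new_trayloc$admit_dt - 1
night <- merge(new_trayloc, capacity,
               by.x = c("night_dt", "UNIT_1", "ORIG_FAC_ID"),
               by.y = c("shift_dt", "unit", "fac_id"),
               all.x = TRUE)
```

If a key matches several capacity rows, merge() returns one output row per match, which mirrors which() returning several indices. The rows of the result may also be reordered, so it is safer to join back on the keys than to rely on position.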


-----Original Message-----
From: jim holtman <jholtman at gmail.com>

Date: Fri, 29 May 2009 20:55:12 
To: Juan Carlos Laguardia<brassman785 at gmail.com>
Cc: <r-help at r-project.org>
Subject: Re: [R] code optimization problem ... using or not using "which" function


For a start, do all your conversions to character and Date once, outside the
loop, so you are not repeating them on every iteration. I'm not exactly sure
what you are doing, but with the '&'s it looks like you are only selecting
the rows where all three fields are the same. You might want to use the
'match' function, like:

x <- match(capacity$shift_dt, new_trayloc$admit_dt)

to find where each item matches. Once you have done this for all three
conditions, look for the rows where the three results hold the same number;
that indicates all the conditions match for that row.

On Fri, May 29, 2009 at 7:17 PM, Juan Carlos Laguardia <brassman785 at gmail.com> wrote: