speed up a function

6 messages · Santiago Guallar, David Winsemius, jim holtman +1 more

#
Hi,

I have written a function to assign the values of a certain variable 'wd' from one dataset to another. Both contain data from the same time period but differ in the length of their time intervals: 'GPS' has regular 10-minute intervals whereas 'xact' has irregular intervals. I attached simplified text versions from write.table. You can also get a dput of 'xact' at this address: http://www.megafileupload.com/en/file/431569/xact-dput.html
The original objects are large and the function takes almost one hour to finish.
Here's the function:

fxG = function(xact, GPS){
  l <- rep( 'A', nrow(GPS) )
  v <- unique(GPS$Ring) # the process is carried out for several individuals identified by 'Ring'
  for(k in 1:length(v) ){
    I = v[k]
    df <- xact[xact$Ring == I,]
    for(i in 1:nrow(GPS)){
      if(GPS[i,]$Ring == I){ # the code runs along the whole data.frame for each i; it'd save time to make it stop with the last record of each i instead
        u <- df$timepos <= GPS[i,]$timepos
        # fill vector l for each interval t from xact <= each interval from GPS (take the max if there's > 1 interval)
        l[i] <- df[max( which(u == TRUE) ),]$wd
      }
    }
  }
  return(l)
}

vwd <- fxG(xact, GPS)


My question is: how can I speed up (optimize) this function?

Thank you for your help
#
On Jul 2, 2013, at 10:47 AM, Santiago Guallar wrote:

            
Simplified a bit, this is starting to look like a case for the split function:
# After doing the simplification I must ask how GPS[i,]$Ring could not == v (or I)
# perhaps tail(df[which(u), 'wd'], 1)?
This looks like it will be overwriting the l-object with every iteration of 'k'.
The first thing you should do is describe in natural language what you want done with the objects 'xact' and 'GPS', which have not yet been described, rather than asking for simplification of obscure nested for-loops with probably redundant assignments and extraneous conditions. Make a simple example of such objects and repost.
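One possible reading of the split suggestion, as a minimal sketch only: split both data frames by 'Ring' so that each GPS row is compared only with the 'xact' rows of the same individual, then put the results back in the original row order with unsplit(). The toy data below are invented for illustration (only the column names come from the post), and returning NA when no 'xact' record precedes a GPS fix is an assumption, not something the original function does.

```r
# Hypothetical toy data: column names follow the thread, values are invented.
xact <- data.frame(
  Ring    = c(1, 1, 1, 2, 2),
  timepos = as.POSIXct(c("2011-06-10 04:00:00", "2011-06-10 04:30:00",
                         "2011-06-10 05:00:00", "2011-06-10 04:10:00",
                         "2011-06-10 04:50:00"), tz = "UTC"),
  wd      = c("dry", "wet", "dry", "wet", "dry"),
  stringsAsFactors = FALSE
)
GPS <- data.frame(
  Ring    = c(1, 1, 2, 2),
  timepos = as.POSIXct(c("2011-06-10 04:15:00", "2011-06-10 04:45:00",
                         "2011-06-10 04:20:00", "2011-06-10 05:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Split both data frames by individual.
gps_by_ring  <- split(GPS, GPS$Ring)
xact_by_ring <- split(xact, xact$Ring)

# For each Ring, look up the last xact record at or before each GPS fix.
wd_by_ring <- lapply(names(gps_by_ring), function(r) {
  g <- gps_by_ring[[r]]
  x <- xact_by_ring[[r]]
  vapply(seq_len(nrow(g)), function(i) {
    u <- which(x$timepos <= g$timepos[i])
    # Assumption: NA when no xact record precedes this GPS fix.
    if (length(u) == 0L) NA_character_ else x$wd[max(u)]
  }, character(1))
})

# unsplit() restores the original GPS row order.
GPS$wd <- unsplit(wd_by_ring, GPS$Ring)
print(GPS)
```

This still loops over GPS rows within each individual, so it mainly removes the repeated scan of the whole GPS data frame for every 'Ring'.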
#
The first thing to do when trying to speed up a function is to see where it is spending its time. Take a subset of the data and use Rprof to profile the code. My guess is that a lot of time is taken up in the use of data frames. See if you can use matrices instead.
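A rough sketch of what such a profiling run might look like. The function slow_lookup() and the data frame toy are hypothetical stand-ins for fxG() and a subset of the real data; they only mimic the row-by-row data frame indexing used in the original loop.

```r
# Hypothetical stand-in for the slow function: indexes a data frame row by row.
slow_lookup <- function(df) {
  out <- character(nrow(df))
  for (i in seq_len(nrow(df))) {
    out[i] <- df[i, ]$value
  }
  out
}

set.seed(1)
toy <- data.frame(id    = seq_len(50000),
                  value = sample(c("wet", "dry"), 50000, replace = TRUE),
                  stringsAsFactors = FALSE)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)          # start the sampling profiler
res <- slow_lookup(toy)
Rprof(NULL)               # stop profiling

# Functions ranked by the time spent in them
print(head(summaryRprof(prof_file)$by.self))
```

If the data frame subsetting functions dominate the summary, that would support the guess that data frame indexing inside the loop is the bottleneck.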

On Jul 2, 2013, at 13:47, Santiago Guallar <sguallar at yahoo.com> wrote:

            
4 days later
#
Hi

It seems to me that you basically want merge, but I may be missing the point. Try posting

dput(head(xact))
dput(head(GPS))

and the desired result based on those two datasets.

Regards
Petr
#
Hi Petr, yes, the function basically consists of merging two time series with different time intervals: one regular, 'GPS', and one irregular, 'xact' (the latter containing the binary variable 'wd' that I want to add to 'GPS').
Apparently my attachments did not go through. Here are the data you requested plus the desired result based on them:

head(xact)
   Ring   jul             timepos  act  wd
6106933 15135 2011-06-10 04:36:15 3822 dry
6106933 15135 2011-06-10 05:39:57   27 wet
6106933 15135 2011-06-10 05:40:24   60 dry
6106933 15135 2011-06-10 05:41:24    6 wet
6106933 15135 2011-06-10 05:41:30  753 dry
6106933 15135 2011-06-10 05:54:03   78 wet
6106933 15135 2011-06-10 05:55:21   15 dry
6106933 15135 2011-06-10 05:55:36   18 wet

head(GPS1, 16) and desired result (added column wd):

      Ring   jul             timepos  wd
5  6106933 15135 2011-06-10 04:39:00 dry
6  6106933 15135 2011-06-10 04:44:00 dry
7  6106933 15135 2011-06-10 04:49:00 dry
8  6106933 15135 2011-06-10 04:54:00 dry
9  6106933 15135 2011-06-10 04:59:00 dry
10 6106933 15135 2011-06-10 05:04:00 dry
11 6106933 15135 2011-06-10 05:09:00 dry
12 6106933 15135 2011-06-10 05:13:00 dry
13 6106933 15135 2011-06-10 05:18:00 dry
14 6106933 15135 2011-06-10 05:23:00 dry
15 6106933 15135 2011-06-10 05:28:00 dry
16 6106933 15135 2011-06-10 05:33:00 dry
17 6106933 15135 2011-06-10 05:38:00 dry
18 6106933 15135 2011-06-10 05:43:00 dry
19 6106933 15135 2011-06-10 05:48:00 dry
20 6106933 15135 2011-06-10 05:53:00 dry

Santi
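For reference, a vectorized sketch of this kind of lookup using base R's findInterval(), which was not proposed in the thread and is swapped in here as one possible technique. For each GPS time it returns the index of the last 'xact' time that is less than or equal to it, which is what max(which(u)) computes in the original loop. The data are the rows posted above; the helper name assign_wd(), the UTC time zone, and the NA for GPS fixes earlier than the first 'xact' record are assumptions.

```r
# Rows copied from the post; the time zone (UTC) is an assumption.
xact <- data.frame(
  Ring    = 6106933,
  timepos = as.POSIXct(c("2011-06-10 04:36:15", "2011-06-10 05:39:57",
                         "2011-06-10 05:40:24", "2011-06-10 05:41:24",
                         "2011-06-10 05:41:30", "2011-06-10 05:54:03",
                         "2011-06-10 05:55:21", "2011-06-10 05:55:36"), tz = "UTC"),
  wd      = c("dry", "wet", "dry", "wet", "dry", "wet", "dry", "wet"),
  stringsAsFactors = FALSE
)
GPS <- data.frame(
  Ring    = 6106933,
  timepos = as.POSIXct(c("2011-06-10 04:39:00", "2011-06-10 04:44:00",
                         "2011-06-10 04:49:00", "2011-06-10 04:54:00",
                         "2011-06-10 04:59:00", "2011-06-10 05:04:00",
                         "2011-06-10 05:09:00", "2011-06-10 05:13:00",
                         "2011-06-10 05:18:00", "2011-06-10 05:23:00",
                         "2011-06-10 05:28:00", "2011-06-10 05:33:00",
                         "2011-06-10 05:38:00", "2011-06-10 05:43:00",
                         "2011-06-10 05:48:00", "2011-06-10 05:53:00"), tz = "UTC")
)

# Hypothetical helper: one findInterval() call per individual instead of
# one comparison per GPS row.
assign_wd <- function(xact, GPS) {
  wd <- rep(NA_character_, nrow(GPS))
  for (r in unique(GPS$Ring)) {
    g <- which(GPS$Ring == r)
    x <- xact[xact$Ring == r, ]
    x <- x[order(x$timepos), ]   # findInterval() needs sorted breakpoints
    idx <- findInterval(as.numeric(GPS$timepos[g]), as.numeric(x$timepos))
    idx[idx == 0] <- NA          # assumption: NA before the first xact record
    wd[g] <- as.character(x$wd)[idx]
  }
  wd
}

GPS$wd <- assign_wd(xact, GPS)
print(GPS)
```

On the posted rows this assigns 'dry' to all 16 GPS fixes, in agreement with the desired result shown above.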