
apply and cousins

5 messages · John Logsdon, William Dunlap, Bert Gunter +2 more

#
Folks

Is there any way to get the row index into apply as a variable?

I want a function to do some sums on a small subset of some very long
vectors, rolling through the whole of each vector.

apply(X, 1, function(x) {do something}, other arguments)

seems to be the way to do it.

The subset I want is the most recent set of measurements only - perhaps a
couple of hundred out of millions - but I can't see how to index each
value.  The ultimate output should be a matrix of results the length of
the input vector.  But to do the sum I need to access the current row
number.

It is easy in a loop but that will take ages. Is there any vectorised
apply-like solution to this?

Or does apply etc only operate on each row at a time, independently of
other rows?


Best wishes

John

John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675
#
If you showed the loop that takes ages, along with small inputs for
it (and an indication of how to expand those small inputs to big ones),
someone might be able to show you some code that does the
same thing in less time.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jun 8, 2016 at 9:41 AM, John Logsdon
<j.logsdon at quantex-research.com> wrote:
#
John:

1. Please read and follow the posting guide. In particular, provide a
small reproducible example so that we know what your data and looping
code look like.

2. apply-type commands are *not* vectorized; they are disguised loops
that may or may not offer any speedup over explicit loops.

3. A guess at a possible strategy is to convert character date-time
data to POSIXct dates using as.POSIXct and then just choose those rows
with the maximum value, e.g.

x[x==max(x)]

These operations *are* vectorized.

However, this guess might be completely useless with your unspecified
data, so beware.
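A minimal sketch of that strategy, with an invented data frame (the column names, timestamps, and values are made up for illustration only):

```r
# Convert character date-times to POSIXct, then keep only the rows
# carrying the most recent timestamp. Data are invented for illustration.
d <- data.frame(when  = c("2016-06-08 09:00:00",
                          "2016-06-08 10:00:00",
                          "2016-06-08 10:00:00"),
                value = c(1.2, 3.4, 5.6))
d$when <- as.POSIXct(d$when, tz = "UTC")
latest <- d[d$when == max(d$when), ]   # vectorised comparison, no loop
```

Here `latest` keeps both rows that share the maximum timestamp, which is exactly what the `==` comparison against `max()` buys over something like `tail()`.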

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Wed, Jun 8, 2016 at 9:41 AM, John Logsdon
<j.logsdon at quantex-research.com> wrote:
#
Hopefully Bert and William won't be offended if I more or less summarize:

Are you assuming a loop will take ages, or have you actually tested it? I
wouldn't assume a loop will take ages, or that it will take much longer
than apply().

What's wrong with

  apply(X[{logical expression}, ], 1, function(x) {do something})

?

Where the logical expression identifies (by row index or any other method)
which rows you need to work on. I would expect it to be faster to subset
the rows first, rather than test for inclusion at every iteration within a
loop.
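As an illustrative sketch of subsetting first and applying second (X, its column names, and the selection rule are all invented):

```r
# Subset the rows of interest once, then apply over only those rows,
# instead of testing for inclusion on every iteration of a loop.
# X and the rule "a > 7" are invented for illustration.
X <- cbind(a = 1:10, b = 11:20)
recent <- X[X[, "a"] > 7, , drop = FALSE]  # keep only the qualifying rows
sums <- apply(recent, 1, sum)              # row sums of the subset only
```

The `drop = FALSE` guards against the subset collapsing to a plain vector when only one row qualifies, which would otherwise change what `apply()` sees.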

Also, if the data is acquired in such a way that you can know that the
most recent set of measurements is the last n rows, then tail(X,n) might
be good. For example, if X is the matrix

      [,1] [,2]
 [1,]    1   11
 [2,]    2   12
 [3,]    3   13
 [4,]    4   14
 [5,]    5   15
 [6,]    6   16
 [7,]    7   17
 [8,]    8   18
 [9,]    9   19
[10,]   10   20

then tail(X, 4) gives

      [,1] [,2]
 [7,]    7   17
 [8,]    8   18
 [9,]    9   19
[10,]   10   20
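For the rolling-sum part itself, a fixed-width rolling sum can be fully vectorised with cumsum() rather than looped; a minimal sketch, where the vector x and the window width w are invented stand-ins for the real data:

```r
# Rolling sum over every window of w consecutive values, via cumsum().
# x and w are invented; substitute the real vector and window width.
x  <- 1:10
w  <- 3
cs <- c(0, cumsum(x))
# roll[k] equals sum(x[k:(k + w - 1)]) for k = 1 .. length(x) - w + 1
roll <- cs[(w + 1):length(cs)] - cs[1:(length(cs) - w)]
```

Two passes over the vector, no per-window loop, so it stays fast even for millions of values.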
#
Hi John,
With due respect to the other respondents, here is something that might help:

# get a vector of values
foo <- rnorm(100)
# get a vector of increasing indices (aka your "recent" values)
bar <- sort(sample(1:100, 40))
# write a function to "clump" the adjacent index values
clump_adj_int <- function(x) {
  index_list <- list(x[1])
  list_index <- 1
  for (i in 2:length(x)) {
    if (x[i] == x[i - 1] + 1) {
      index_list[[list_index]] <- c(index_list[[list_index]], x[i])
    } else {
      list_index <- list_index + 1
      index_list[[list_index]] <- x[i]
    }
  }
  return(index_list)
}
index_clumps <- clump_adj_int(bar)
# write another function to sum the values
sum_subsets <- function(indices, vector) sum(vector[indices], na.rm = TRUE)
# now "apply" the function to the list of indices
lapply(index_clumps, sum_subsets, foo)
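Since rnorm() and sample() make that run random, here is the same pipeline on a tiny deterministic input, so the shape of the result is visible (the functions are repeated from above so the snippet is self-contained; foo and bar are invented):

```r
# Deterministic miniature of the clump-then-sum pipeline above.
clump_adj_int <- function(x) {
  index_list <- list(x[1])
  list_index <- 1
  for (i in 2:length(x)) {
    if (x[i] == x[i - 1] + 1) {
      index_list[[list_index]] <- c(index_list[[list_index]], x[i])
    } else {
      list_index <- list_index + 1
      index_list[[list_index]] <- x[i]
    }
  }
  return(index_list)
}
sum_subsets <- function(indices, vector) sum(vector[indices], na.rm = TRUE)

foo <- c(10, 20, 30, 40, 50, 60)   # values
bar <- c(1, 2, 4, 5, 6)            # indices; clumps are {1,2} and {4,5,6}
sums <- lapply(clump_adj_int(bar), sum_subsets, foo)
# sums[[1]] is foo[1] + foo[2]; sums[[2]] is foo[4] + foo[5] + foo[6]
```

Each clump of adjacent indices becomes one list element, and each list element becomes one sum.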

Jim


On Thu, Jun 9, 2016 at 2:41 AM, John Logsdon
<j.logsdon at quantex-research.com> wrote: