Thanks Jim and others (and sorry Jim - an early version of this slipped
into your inbox :))
Apologies for not giving some concrete code - I was trying to explain in
words.
What I need to do is to fit a simple linear model to successive sections
of a long matrix.
So far, the best solution I have come up with uses apply twice:
Generate some data in a 100000*3 matrix:
N = 100000
Z = cbind(1:N,cumsum(rnorm(N,1,0.01)),rnorm(N,1.2,0.1))
where the first column is an index, the second a monotonically increasing
value representing time and the third just the measurements I want to
process.
Then write a function dVals1:
dVals1 = function(Y,DD,dT){which.min((Y[2] - dT) > DD[,2])}
which will identify the first row where the time is greater than current
time - dT.
So, to identify the start of the data (say) 10 units before each row,
we use apply and prepend the result as a column to the array for later use:
ZZ = cbind(apply(Z,1,dVals1,Z,10),Z)
In some cases, particularly at the start of the data, which.min returns 1
because no row lies more than dT before the current time, so the window
simply starts at row 1 and covers less than dT.
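Because the time column is sorted, the same start pointers can probably be found without scanning the whole matrix for every row. The following is only a sketch, assuming the times are non-decreasing and that your R version has the left.open argument to findInterval; start_fast and start_slow are names made up for the comparison, and N is kept small so the check runs quickly.

```r
set.seed(1)
N <- 500
Z <- cbind(1:N, cumsum(rnorm(N, 1, 0.01)), rnorm(N, 1.2, 0.1))
dT <- 10

# With left.open = TRUE, findInterval counts the times strictly less than
# (time - dT); adding 1 gives the first row at or after (time - dT).
start_fast <- findInterval(Z[, 2] - dT, Z[, 2], left.open = TRUE) + 1L

# The apply/which.min approach from above, for comparison
start_slow <- apply(Z, 1, function(Y, DD, dT) which.min((Y[2] - dT) > DD[, 2]), Z, dT)
```

findInterval does a binary search for each value, so this should scale far better than comparing every row against every other row.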
I now have start and finish pointers for each position so can proceed to
fit a simple linear model with the following function:
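dVals2=function(D2,DD){
  # D2 is one row of ZZ: D2[1] is the start pointer, D2[2] the row index
  if((D2[2]-D2[1])<10){return(rep(0,2))} # reject short examples
  DX=DD[D2[1]:D2[2],]
  # DD is ZZ here, so time is column 3 and the measurement is column 4
  Res=as.vector(lm(DX[,4]~DX[,3])$coefficients)
  return(Res)
}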
which returns two zeros if the window ends fewer than 10 rows past its
start; otherwise it returns the intercept and slope calculated over the
specified range.
Applying this to the whole data by:
t(apply(ZZ,1,dVals2,DD=ZZ))
does the job, I think, returning the results as an N * 2 matrix.
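For what it is worth, the same windows can be fitted by iterating over the row indices directly, which is one way of getting the row number into the function. This is a sketch rather than a tested replacement: roll_fit and min_rows are names invented here, the slope is computed as cov(x, y)/var(x) instead of calling lm, and N is kept small.

```r
set.seed(42)
N <- 300
Z <- cbind(1:N, cumsum(rnorm(N, 1, 0.01)), rnorm(N, 1.2, 0.1))

# Start pointer for each row: first row within 10 time units of the current one
start <- apply(Z, 1, function(Y) which.min((Y[2] - 10) > Z[, 2]))

# Fit measurement ~ time over rows start[i]:i; return zeros for short windows
roll_fit <- function(i, start, DD, min_rows = 10) {
  rows <- start[i]:i
  if (length(rows) <= min_rows) return(c(0, 0))
  x <- DD[rows, 2]
  y <- DD[rows, 3]
  slope <- cov(x, y) / var(x)          # least-squares slope
  c(mean(y) - slope * mean(x), slope)  # intercept, slope
}

fits <- t(vapply(seq_len(N), roll_fit, numeric(2), start = start, DD = Z))
```

Here fits[i, ] should match coef(lm(y ~ x)) on the same rows, and skipping lm's overhead usually helps when there are many small fits.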
Hi John,
With due respect to the other respondents, here is something that might
help:
# get a vector of values
foo<-rnorm(100)
# get a vector of increasing indices (aka your "recent" values)
bar<-sort(sample(1:100,40))
# write a function to "clump" the adjacent index values
clump_adj_int<-function(x) {
  index_list<-list(x[1])
  list_index<-1
  for(i in 2:length(x)) {
    if(x[i]==x[i-1]+1)
      index_list[[list_index]]<-c(index_list[[list_index]],x[i])
    else {
      list_index<-list_index+1
      index_list[[list_index]]<-x[i]
    }
  }
  return(index_list)
}
index_clumps<-clump_adj_int(bar)
# write another function to sum the values
sum_subsets<-function(indices,vector)
  return(sum(vector[indices],na.rm=TRUE))
# now "apply" the function to the list of indices
lapply(index_clumps,sum_subsets,foo)
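The clumping step can also be written without an explicit loop. The sketch below is a common idiom rather than anything from the code above: it starts a new group wherever the gap between successive indices is not 1. The names clumps and clump_sums are made up for illustration.

```r
set.seed(7)
foo <- rnorm(100)
bar <- sort(sample(1:100, 40))

# A new clump starts at the first element and wherever the step is not 1;
# cumsum turns those break flags into a group id for split()
clumps <- split(bar, cumsum(c(TRUE, diff(bar) != 1)))

# Sum the values of foo within each clump of adjacent indices
clump_sums <- sapply(clumps, function(idx) sum(foo[idx], na.rm = TRUE))
```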
Jim
On Thu, Jun 9, 2016 at 2:41 AM, John Logsdon
<j.logsdon at quantex-research.com> wrote:
Folks
Is there any way to get the row index into apply as a variable?
I want a function to do some sums on a small subset of some very long
vectors, rolling through the whole vectors.
apply(X,1,function {do something}, other arguments)
seems to be the way to do it.
The subset I want is the most recent set of measurements only - perhaps a
couple of hundred out of millions - but I can't see how to index each
value. The ultimate output should be a matrix of results the length of
the input vector. But to do the sum I need to access the current row
number.
It is easy in a loop but that will take ages. Is there any vectorised
apply-like solution to this?
Or does apply etc only operate on each row at a time, independently of
other rows?
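One way to read this question is as a trailing-window sum. As a sketch under that assumption (k is a hypothetical window length, and roll_idx and roll_cs are invented names), the row index can be passed in by iterating over seq_len(n) rather than over the data, and for plain sums the loop can be avoided entirely by differencing cumulative sums.

```r
set.seed(3)
x <- rnorm(1000)
n <- length(x)
k <- 200  # hypothetical window: the most recent k values

# Index-based: the function receives the position i and picks its own window
roll_idx <- vapply(seq_len(n),
                   function(i) sum(x[max(1, i - k + 1):i]),
                   numeric(1))

# Vectorised: cs[j + 1] holds sum(x[1:j]), so each window sum is a difference
cs <- c(0, cumsum(x))
before <- pmax(seq_len(n) - k, 0)  # number of values preceding each window
roll_cs <- cs[seq_len(n) + 1] - cs[before + 1]
```

The cumulative-sum version only works for quantities built from sums, and rounding error can accumulate over very long vectors, so it is worth checking against the index-based version on a sample.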
Best wishes
John
John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675