
Extract subsets of different and unknown lengths from huge dataset

A reproducible example would be nice, but if I understand you
correctly, you want to find the indices of the values that are
immediately preceded by a run of at least 24 zeroes.  The rle
(run length encoding) function is very handy for problems like these.
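
In case rle is unfamiliar, here is a small sketch of what it returns.
The vector "x" is made up for illustration and is not the poster's data:

```r
# Toy vector (hypothetical data, for illustration only)
x = c(3, 0, 0, 0, 4, 0)

# rle() collapses consecutive equal values into runs
r = rle(x == 0)
r$lengths           # 1 3 1 1   (length of each run)
r$values            # FALSE TRUE FALSE TRUE   (is the run made of zeroes?)

# cumsum of the lengths gives the position where each run ends
cumsum(r$lengths)   # 1 4 5 6
```

So the position just after a run of zeroes is the end of that run plus one.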

Suppose the vector of interest is called "vec".  To create 
a vector called "start" whose value is the string "NA" except at
those positions immediately after a run of at least 24 zeroes, you
could try something like this:

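# the string "NA" (not a missing value) keeps later comparisons simple
start = rep("NA",length(vec))
# runs of consecutive zeroes / non-zeroes
rls = rle(vec==0)
# position just past the end of each run of at least 24 zeroes
ind = cumsum(rls$lengths)[rls$lengths >= 24 & rls$values] + 1
# a run that reaches the end of vec has nothing after it to mark
ind = ind[ind <= length(vec)]
start[ind] = 'start'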

To number the starts, you could use something like

num = rep(0,length(vec))
num[start == 'start'] = seq_len(sum(start == 'start'))
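
To check the idea on something small, here is a sketch that runs the
same steps on a made-up vector.  The data and the name "minrun" are
assumptions for illustration, with a threshold of 3 standing in for 24
so the example stays short:

```r
# Toy data: a run of 3 zeroes, a run of 2 zeroes, and a trailing run of 3 zeroes
vec = c(5, 0, 0, 0, 7, 8, 0, 0, 9, 0, 0, 0)
minrun = 3    # hypothetical threshold; use 24 for the real data

start = rep("NA", length(vec))
rls = rle(vec == 0)
# position just past the end of each sufficiently long run of zeroes
ind = cumsum(rls$lengths)[rls$values & rls$lengths >= minrun] + 1
# drop any position that would fall beyond the end of vec
ind = ind[ind <= length(vec)]
start[ind] = "start"

# number the starts in order of appearance
num = rep(0, length(vec))
num[start == "start"] = seq_len(sum(start == "start"))

which(start == "start")
```

Only position 5 is marked here: the run of 2 zeroes is too short, and
the trailing run of 3 zeroes has no value after it.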


 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu
On Sun, 30 Jan 2011, Dustin wrote: