
replacing ugly for loops

5 messages · Andrew Hoerner, Bert Gunter

#
I have a couple of hundred American Community Survey Summary Files
containing rectangular arrays of data, mainly though not exclusively
numeric.  Each file is referred to as a sequence (henceforth "seq").  From
these files I am trying to extract particular subsets (tables), each
consisting of a set of columns.  These tables are defined by three numbers
(now in columns in a data frame):
1.	a file identifier (seq)
2.	the position number of the table's first column (startNo)
3.	the length of the table, in columns (len)
so the columns to select for one triple would consist of
startNo:(startNo+len-1).   I am trying to create for each sequence a
vector of all the column numbers for tables in that sequence.
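
As a minimal sketch of the setup just described: the data frame below is hypothetical, with values chosen only so that they are consistent with the example output printed later in this post (the original example data is not preserved here). It shows how one (seq, startNo, len) triple expands into column numbers, and how split() yields one data frame per file.

```r
# Hypothetical example triples; the values are assumptions, not the poster's data
data.df <- data.frame(seq     = c(1, 1, 2, 2),
                      startNo = c(3, 10, 3, 15),
                      len     = c(4, 2, 5, 3))

# Columns selected by the first triple: startNo:(startNo + len - 1)
first <- data.df$startNo[1]
n     <- data.df$len[1]
cols1 <- first:(first + n - 1)
cols1   # 3 4 5 6

# One data frame per file id, as a list
data.l <- split(data.df, data.df$seq)
length(data.l)   # 2
```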

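Obviously I could do this with nested for loops, e.g.:
# data.l is the data frame split() by seq, one element per file
selectColsList <- list()
for (i in seq_along(data.l)){                    # outer loop: one pass per seq
    selectCols <- numeric()
    for (j in seq_along(data.l[[i]]$startNo)){   # inner loop: one pass per table
        selectCols <- c(selectCols,
            data.l[[i]]$startNo[j]:(data.l[[i]]$startNo[j] +
            data.l[[i]]$len[j]-1))
    }
    selectColsList[[i]] <- selectCols
}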
[[1]]
[1]  3  4  5  6 10 11
[[2]]
[1]  3  4  5  6  7 15 16 17

But this code strikes me as inelegant and verbose. It seems to me that there
ought to be a way to turn the outer loop (indexed with i) into a tapply
call (which is why I started with a split()), and the inner loop
(indexed with j) into some cute recursive function, but I was not able to do
so. If anyone could suggest some nicer (e.g. shorter, or faster, or just
more sophisticated) way to do this instead, I would be most grateful.

Sincerely, andrewH




--
View this message in context: http://r.789695.n4.nabble.com/replacing-ugly-for-loops-tp4645821.html
Sent from the R help mailing list archive at Nabble.com.
#
I am not sure you have expressed what you want to do correctly. See inline:
On Wed, Oct 10, 2012 at 9:10 PM, andrewH <ahoerner at rprogress.org> wrote:
-- so 1 "seq" (terrible identifier -- see below for why) = 1 file

So your data frame, call it yourframe, has columns named:

seq      startNo       len

So for each seq id you want to find all the column numbers, right?

sq.n <- seq_len(nrow(yourframe)) ## Just to make it easier to read
colms <-  tapply(sq.n, yourframe$seq,function(x) with(yourframe[x,],
   sort(unique(do.call(c, mapply(seq, from=startNo,
length=len,SIMPLIFY = FALSE)))))

## Comments
In the mapply call, seq is the R function, ?seq.  That's why using it
as a name for a file id is terrible -- it causes confusion.
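
A small sketch of the naming confusion, assuming a hypothetical data frame with a column called seq (the values and the replacement name fileId are illustrative only). Inside with(), the bare name refers to the column, while a call still finds the function.

```r
# Hypothetical data frame with a column named "seq" (values are assumptions)
yourframe <- data.frame(seq = c(1, 1), startNo = c(3, 10), len = c(4, 2))

# Inside with(), the bare name `seq` is the column, not the function
nameIsFunction <- with(yourframe, is.function(seq))
nameIsFunction   # FALSE

# A call such as seq(...) still finds the function, because R skips
# non-function objects when it looks up the name being called
firstTable <- with(yourframe, seq(from = startNo[1], length.out = len[1]))
firstTable       # 3 4 5 6

# Renaming the column (hypothetical name) removes the ambiguity for readers
names(yourframe)[names(yourframe) == "seq"] <- "fileId"
```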

In the absence of data, this is untested -- and probably not quite
right. But it should be close, I hope. The key idea is the use of
mapply to get the sequence of columns for each row in all the rows for
each seq id. The SIMPLIFY = FALSE guarantees that this yields a list
of vectors of column indices, which are then glopped together and
cleaned up by the sort(unique(do.call(  ...  stuff.

colms should then be a list giving the sorted column numbers to choose
for each "seq" id.
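
The mapply / do.call / sort(unique()) steps described above can be taken apart one at a time. This is only a sketch for a single file id, using hypothetical startNo and len vectors (assumed values, picked to match the first element of the output quoted in this thread).

```r
# Hypothetical rows for one file id; the values are assumptions
startNo <- c(3, 10)
len     <- c(4, 2)

# Step 1: one vector of column numbers per row.
# SIMPLIFY = FALSE keeps the result as a list even when all lengths match.
pieces <- mapply(seq, from = startNo, length.out = len, SIMPLIFY = FALSE)

# Step 2: concatenate the list elements into a single vector
flat <- do.call(c, pieces)

# Step 3: drop duplicates (e.g. from overlapping tables) and sort
cols <- sort(unique(flat))
cols   # 3 4 5 6 10 11

# Without SIMPLIFY = FALSE, equal-length results are simplified to a matrix
asMatrix <- mapply(seq, from = c(1, 5), length.out = c(2, 2))
is.matrix(asMatrix)   # TRUE
```

The thread's code writes length=len, which R matches to the length.out argument of seq by partial matching; the full argument name is spelled out here.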

I do not know whether (once cleaned up,) this is either more elegant
or more efficient than what you proposed. And I wouldn't be surprised
if someone like Bill Dunlap comes up with a lot better way, either.
But it is different -- and perhaps amusing.

... If I have properly understood what you wanted. If not, ignore all.

Cheers,
Bert

  
    
#
Sorry, you **did** supply data and my solution **does** work (except I
left off 1 closing ")"):
+ sort(unique(do.call(c,mapply(seq,from=startNo,length=len,SIMPLIFY=FALSE))))))
$`1`
[1]  3  4  5  6 10 11

$`2`
[1]  3  4  5  6  7 15 16 17

Cheers,
Bert
On Wed, Oct 10, 2012 at 10:59 PM, Bert Gunter <bgunter at gene.com> wrote:

  
    
#
Dear Bert--
I tried your function on the data that I provided (data.df) and it worked
beautifully (after I added a missing final parenthesis), producing exactly
the same output as my function.  This is an excellent example of what I was
looking for, because it is 
   (a) 50% shorter than mine, 
   (b) fully vectorized, and 
   (c) uses three functions that I have never used before: with, unique, and
do.call

I am going to spend a happy afternoon working through this command by
command, and at the end I am confident that I will have learned some valuable
new (to me) tricks.
Thanks!
Warmest Regards, AndrewH




#
I hate to decline such praise, but honesty demands that I must.

In fact, my solution is **not** fully vectorized at all! The tapply()
and mapply() calls are, in fact, in some sense hidden loops at the
interpreted level. They do have the virtue of being true to R's
functional paradigm, but they are loops, nevertheless. For this
reason, they may not be more efficient than the explicit loops you've
written. But I hope the code is more transparent.
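
For contrast, here is a sketch of one way the per-row expansion could be done without an interpreted loop over rows, using rep() and sequence(). This is not from the thread; it is an alternative under the assumption of the hypothetical data frame below (values chosen to reproduce the output quoted earlier). The final lapply() is still a loop, but only over files, not over tables.

```r
# Hypothetical example data; the values are assumptions
data.df <- data.frame(seq     = c(1, 1, 2, 2),
                      startNo = c(3, 10, 3, 15),
                      len     = c(4, 2, 5, 3))

# rep() repeats each start value len times; sequence() gives 1..len per row,
# so the sum minus 1 is startNo, startNo + 1, ..., startNo + len - 1
allCols <- rep(data.df$startNo, data.df$len) + sequence(data.df$len) - 1

# File id belonging to each generated column number
fileId <- rep(data.df$seq, data.df$len)

# Group by file, then sort and de-duplicate within each file
colms2 <- lapply(split(allCols, fileId), function(v) sort(unique(v)))
colms2
```

Whether this is actually faster than the explicit loops would need to be timed on the real files.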

And I did send a follow-up note to the list both acknowledging my
erroneous accusation that you did not provide data and confirming that
my proposed solution worked with the example you did, in fact,
provide.

But thanks for the kind words anyway.

-- Bert
On Thu, Oct 11, 2012 at 2:16 PM, andrewH <ahoerner at rprogress.org> wrote: