[Bioc-devel] GenomicFiles reducer and iterate argument
We'll try a single arg to REDUCER and see how it goes. BTW, I'm also going to swap out DataFrame for Vector in the rowData. DataFrame has been more difficult than anticipated (storing names, subsetting to get ranges out) and doesn't give any clear advantage over Vector.
Val
On 06/17/2014 02:59 PM, Michael Lawrence wrote:
I think there are two different use cases here. The first, the one that I think is driving the design, is that the user writes a function for a particular problem, where the value of iterate is known. The other use case is that the user gets a summary function from somewhere else (a package) and applies it using reduceBy*. In that case, the user would potentially need to write a wrapper, depending on the formals of the reusable function.
The only way I could make the second use case work with the current design is to have a higher-order function that returns a universal iterator that detects the value of iterate via nargs() and behaves appropriately. The higher-order function would not need to be known to the user, only to the package developer.
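A minimal R sketch of that higher-order function, under stated assumptions: the name universalReducer is hypothetical (not part of GenomicFiles), and it assumes that with iterate=TRUE the REDUCER is called with two values (last, current) while with iterate=FALSE it is called with one list. It wraps an ordinary binary function and uses nargs() to tell the two kinds of call apart.

```r
# Hypothetical helper (not part of GenomicFiles): wrap a binary function so
# the result can be called pairwise, f(last, current), or with one list, f(lst).
universalReducer <- function(FUN) {
    FUN <- match.fun(FUN)
    function(x, y) {
        if (nargs() == 1L) {
            # single argument: assume a list of all results (iterate=FALSE)
            Reduce(FUN, x)
        } else {
            # two arguments: assume a pairwise call (iterate=TRUE)
            FUN(x, y)
        }
    }
}

sumReducer <- universalReducer(`+`)
sumReducer(list(1, 2, 3))          # 6, list-based call
sumReducer(3, 3)                   # 6, pairwise call
Reduce(sumReducer, list(1, 2, 3))  # 6, pairwise calls driven by Reduce
```

The package developer would ship universalReducer(someSummary); the user passes the result to reduceBy* without knowing which iterate mode is in use.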
On Tue, Jun 17, 2014 at 1:39 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
Val's out today and I'm at least part of the problem so...
On 06/17/2014 10:13 AM, Michael Lawrence wrote:
On Tue, Jun 17, 2014 at 7:00 AM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
Hi Michael, Ryan,
Yes, it would be ideal to have a single signature for both cases of 'iterate'. We went over the pros/cons again and at the end of the day decided to keep things as they are. No perfect solution here.
These were the primary points:
- The disadvantage of defining REDUCER with only '...' is that '...' can represent variables other than just the output from MAPPER.
Do you mean that "..." will capture additional arguments? From where?
reduceBy* takes a '...' argument, and this is currently available to both MAPPER and REDUCER; see below.
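To illustrate the first point, a minimal R sketch with two hypothetical toy drivers (toyReduceBy and toyReduceByVariadic are illustrative names, not GenomicFiles functions). Assuming the extra arguments to reduceBy* are forwarded to both MAPPER and REDUCER, a variadic REDUCER receives them mixed in with the MAPPER output.

```r
# Hypothetical toy drivers (NOT the GenomicFiles implementation) that forward
# the caller's '...' to both MAPPER and REDUCER.
toyReduceBy <- function(X, MAPPER, REDUCER, ...) {
    mapped <- lapply(X, MAPPER, ...)
    # list-based REDUCER: mapped results in one list, extras stay separate
    REDUCER(mapped, ...)
}

toyReduceByVariadic <- function(X, MAPPER, REDUCER, ...) {
    mapped <- lapply(X, MAPPER, ...)
    # variadic REDUCER: mapped results and extras share a single '...'
    do.call(REDUCER, c(mapped, list(...)))
}

mapper <- function(x, scale) x * scale

# The list-based REDUCER sees 'scale' as a named argument and ignores it.
toyReduceBy(1:3, mapper, function(lst, scale) sum(unlist(lst)), scale = 10)
# 60

# The variadic REDUCER cannot distinguish 'scale' from the mapped values,
# so sum() adds the 10 as well.
toyReduceByVariadic(1:3, mapper, function(...) sum(...), scale = 10)
# 70
```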
- The unappealing aspect of the variadic approach is that it introduces a new check each time REDUCER is called.
What is this check?
- Going the other direction, considering a single arg for REDUCER instead of two requires coercing 'last' and 'current' to a list before pulling them apart again.
What is the problem with constructing this list? Isn't that one extremely fast line of code?
It's not the list construction but the lost convenience of named arguments, in addition to consistency with Reduce when the data are presented iteratively: REDUCER=`+` instead of REDUCER=function(lst) sum(unlist(lst, use.names=FALSE)).
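The trade-off shows up in a small base-R sketch (the data and function names are illustrative only). When the reduction is driven pairwise with Reduce, a two-argument REDUCER can use named arguments or pass `+` directly, while a single-list REDUCER needs each pair wrapped in list() and unpacked again.

```r
# Illustrative per-range results, as they might arrive from MAPPER.
results <- list(10, 20, 30)

# Two-argument REDUCER: named arguments, and `+` itself can be passed,
# exactly as with Reduce().
byName <- function(last, current) last + current
r1 <- Reduce(byName, results)
r2 <- Reduce(`+`, results)

# Single-list REDUCER: each pair must be packed into a list and
# unpacked again inside the function.
byList <- function(lst) sum(unlist(lst, use.names = FALSE))
r3 <- Reduce(function(last, current) byList(list(last, current)), results)

c(r1, r2, r3)
# 60 60 60
```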
It seems to me simpler to settle on one signature, and my preference would be for the single list argument, just because the call is smaller and simpler. Then have a convenient adaptor to handle the variadic case.
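A minimal sketch of the single-list signature plus an adaptor, under assumptions: reduceResults and fromVariadic are hypothetical names, and the driver is a simplified stand-in for reduceBy*. With one signature the same REDUCER serves both iterate modes; the pairwise case just wraps the two values in list().

```r
# Hypothetical driver: REDUCER always receives one list.
# iterate = TRUE passes list(last, current); iterate = FALSE passes everything.
reduceResults <- function(results, REDUCER, iterate) {
    if (iterate) {
        last <- results[[1]]
        for (current in results[-1])
            last <- REDUCER(list(last, current))
        last
    } else {
        REDUCER(results)
    }
}

# Hypothetical adaptor: turn a variadic function (rbind, sum, ...) into a
# single-list REDUCER by spreading the list into its '...'.
fromVariadic <- function(FUN) {
    FUN <- match.fun(FUN)
    function(lst) do.call(FUN, lst)
}

pieces <- list(1:2, 3:4, 5:6)
bind <- fromVariadic(rbind)
a <- reduceResults(pieces, bind, iterate = TRUE)
b <- reduceResults(pieces, bind, iterate = FALSE)
dim(a)        # 3 2
all(a == b)   # TRUE
```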
The variadic adapter concept is easy enough to understand in context, but would send me for a head scratch at some later time.
Martin
Valerie
On 06/15/14 16:36, Michael Lawrence wrote:
I kind of prefer the adaptor solution, just for the sake of API cleanliness (the MAPPER/REDUCER pair has some elegance), but I think we agree that the iterate switch introduces undesirable coupling.
On Sun, Jun 15, 2014 at 3:07 PM, Ryan <rct at thompsonclan.org> wrote:
What about having two separate reducer arguments, one for a reducer that takes two elements at a time and combines them, and the other for a reducer that takes a list and combines all the elements of the list? Specifying both at once would be an error. I think it makes more sense to say "these two arguments expect different things" than "this one argument expects a different thing depending on the value of another argument".
-Ryan
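A minimal sketch of this proposal in base R. The function reduceSketch and its arguments pairReducer and listReducer are hypothetical names used only for illustration; they are not GenomicFiles arguments.

```r
# Hypothetical signature for the two-argument proposal; exactly one of the
# two reducers must be supplied.
reduceSketch <- function(results, pairReducer = NULL, listReducer = NULL) {
    if (!is.null(pairReducer) && !is.null(listReducer))
        stop("specify only one of 'pairReducer' and 'listReducer'")
    if (!is.null(pairReducer)) {
        # combines two elements at a time, as Reduce() does
        Reduce(pairReducer, results)
    } else if (!is.null(listReducer)) {
        # combines all elements of the list in one call
        listReducer(results)
    } else {
        stop("one of 'pairReducer' or 'listReducer' is required")
    }
}

results <- list(1, 2, 3)
reduceSketch(results, pairReducer = `+`)                             # 6
reduceSketch(results, listReducer = function(lst) sum(unlist(lst)))  # 6
# Supplying both is an error:
try(reduceSketch(results, pairReducer = `+`, listReducer = length))
```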
On Sun Jun 15 11:17:59 2014, Michael Lawrence wrote:
I just thought there is some benefit for the callback to be the same, regardless of the iterate setting. This would allow generalization across different data scales. Perhaps all that is needed is a constructor for an adapter closure, one for each direction.
For example, the variadic adapter would look like:
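Variadic <- function(FUN) {
    force(FUN)  # capture FUN now rather than lazily
    function(x, y) {
        if (missing(y)) {
            do.call(FUN, x)  # one list argument: spread it into FUN's '...'
        } else {
            FUN(x, y)        # two arguments: pairwise call
        }
    }
}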
That would make it easy to, e.g., adapt rbind into the framework. I wonder if there is precedent and better terminology from the functional programming domain?
Michael
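A short usage sketch for the rbind case, assuming REDUCER is called with one list when iterate=FALSE and with two values when iterate=TRUE. The same Variadic(rbind) closure then serves both modes; the pieces data are illustrative.

```r
# Michael's Variadic adapter, repeated so this sketch is self-contained.
Variadic <- function(FUN) {
    force(FUN)
    function(x, y) {
        if (missing(y)) do.call(FUN, x) else FUN(x, y)
    }
}

bindAll <- Variadic(rbind)
pieces <- list(1:2, 3:4, 5:6)

m1 <- bindAll(pieces)          # one call with the whole list
m2 <- Reduce(bindAll, pieces)  # pairwise calls, as with iterate = TRUE
dim(m1)                        # 3 2
all(m1 == m2)                  # TRUE
```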
On Sun, Jun 15, 2014 at 8:38 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
On 06/15/2014 07:34 AM, Michael Lawrence wrote:
Hi guys,
Was just checking out GenomicFiles and was a little surprised that the arguments to the REDUCER are different depending on iterate=TRUE vs. iterate=FALSE. In my often flawed opinion, iteration should not be a concern of the REDUCER. It should be oblivious to the iteration mode. In other words, when iterate=TRUE, it is a special case of having two objects to combine, instead of multiple.
My 'rationale' was that one would choose iterate=FALSE when one required all elements to perform the reduction. I thought of the list (rather than ...) as the general R data structure for representing N elements, with a special case (consistent with Reduce) made for the pairwise reduction of iterate=TRUE. Either way, the two cases (x, y vs. list(), x, y vs. ...) seem to require some explaining to the user. Is there a clearly better choice? You're the second person to trip over this, so I guess there's a crack in the sidewalk...
Martin
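A minimal sketch of the coupling being discussed. reduceByMode is a hypothetical driver that only mimics the described behaviour (pairwise x, y calls when iterate=TRUE, one list when iterate=FALSE); it is not the actual GenomicFiles code.

```r
# Hypothetical driver: how REDUCER is called depends on 'iterate'.
reduceByMode <- function(results, REDUCER, iterate) {
    if (iterate) {
        Reduce(REDUCER, results)  # REDUCER(x, y), consistent with Reduce
    } else {
        REDUCER(results)          # REDUCER(list of all results)
    }
}

results <- list(1, 2, 3)
reduceByMode(results, `+`, iterate = TRUE)                              # 6
reduceByMode(results, function(lst) sum(unlist(lst)), iterate = FALSE)  # 6

# The coupling: `+` works only in pairwise mode; with iterate = FALSE
# it is called on a single list and fails.
try(reduceByMode(results, `+`, iterate = FALSE))
```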
What would be convenient (but unnecessary) is to detect from the formal arguments whether REDUCER is variadic or list-based. In other words, if REDUCER is defined like function(...) { }, it is called via do.call(); otherwise, it is passed the list.
Thoughts? Maybe I'm totally confused?
Michael
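A sketch of the formals-based detection suggested above, in base R. callReducer is a hypothetical name, and it assumes a slightly looser rule than "defined like function(...)": any REDUCER whose first formal argument is '...' is treated as variadic, which also lets rbind and sum be passed directly.

```r
# Hypothetical dispatcher: if REDUCER's first formal argument is '...',
# treat it as variadic and spread the list with do.call(); otherwise pass
# the list as a single argument. args() is used so that primitives such as
# sum(), whose formals() are NULL, still expose their argument list.
callReducer <- function(REDUCER, lst) {
    REDUCER <- match.fun(REDUCER)
    firstFormal <- names(formals(args(REDUCER)))[1]
    if (isTRUE(firstFormal == "...")) {
        do.call(REDUCER, lst)
    } else {
        REDUCER(lst)
    }
}

results <- list(1:2, 3:4)
callReducer(rbind, results)                      # variadic: a 2 x 2 matrix
callReducer(sum, results)                        # variadic: 10
callReducer(function(lst) length(lst), results)  # list-based: 2
```

The check runs on every call here; a real implementation could do it once, up front, which bears on the cost concern about checking each time REDUCER is called.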
_________________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793