Wish list

23 messages · A.J. Rossini, Peter Dalgaard, Thomas Lumley +6 more

#
Kevin,

I was glad to see your list.  Some of the items were reasons for creating
some of the functions in Hmisc.  summarize and mApply in conjunction with
llist handle labeling of output - this is actually quite tricky and the
Hmisc solution isn't perfect.  Dropping unused factor levels by default
(with easy override) is an old battle and I agree with you completely that
for everyday data analysis I almost always want to do this.  But I haven't
been able to convince anyone else about that, despite repeated attempts. 
[.factor in Hmisc drops unused levels by default.  To be honest, the one
place I've gotten into trouble with this default occasionally is in
multiple panels in lattice related to consistent assignment of line styles
and symbols across strata when the "groups" variable has missing cells in
some panels.
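
The default being debated can be seen in a few lines of base R. This is only a minimal sketch, assuming plain base R without Hmisc loaded; the variable names are illustrative and not taken from any package.

```r
# A factor with three levels, and a subset that only uses two of them.
f <- factor(c("a", "b", "c", "a"))

# Base R default: the unused level "c" is kept after subsetting.
kept <- f[f != "c"]

# Explicit override: drop unused levels at subsetting time.
dropped <- f[f != "c", drop = TRUE]

# Re-creating the factor from the subset also discards unused levels.
refactored <- factor(kept)

print(levels(kept))
print(levels(dropped))
print(levels(refactored))
```

The lattice caveat above follows from this: once a level is dropped within one panel's data, the remaining levels are renumbered, so a group can end up with a different line style or symbol than it has in other panels.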

I also share your views about namespaces.  These have caused numerous
problems for me.  It would be nice to have more of a mechanism to put
"feelers" out to the R user community when major changes are planned. 
Namespaces seemed to appear on the scene quite quickly.  I do see some
advantages for them though.  By contrast, I have been very relieved that
S4 classes have not posed a problem for my code that relies on "old"
classes (totally unlike my experience with S-Plus) but any time changes
are made that involve some incompatibilities with old code there should be
some pause.

In Hmisc and Design I reference several functions that were not exported
from packages that now use namespaces.  There is an elegant solution with
the package:::function notation, but I have been unable to use this
solution because I use one code base for all versions of R and S-Plus. 
This notation generates syntax errors in all but late versions of R.
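
As a rough illustration of the parsing problem, the sketch below contrasts the `:::` operator with a lookup written using ordinary function calls only. It assumes a current R in which `t.test.default` is an unexported object in the stats namespace; `get_hidden` is a hypothetical helper made up for this example, not something from Hmisc or Design.

```r
# Triple-colon access: concise, but the operator itself is a syntax error
# on interpreters that predate namespaces, so the whole file fails to parse.
f1 <- stats:::t.test.default

# Hypothetical helper using only ordinary calls, so the source parses
# everywhere; the namespace lookup runs only where asNamespace() exists.
get_hidden <- function(name, pkg) {
  if (exists("asNamespace")) {
    get(name, envir = asNamespace(pkg))
  } else {
    get(name)
  }
}

f2 <- get_hidden("t.test.default", "stats")

# Both routes should reach the same function object.
print(identical(f1, f2))
```

Either route still depends on an internal object that the package author may change without notice.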

Let me add to the wish list the creation of some mechanism to better track
improvements and bug fixes in packages, such as a change log link by each
package's area in CRAN, or easy access to CVS information from there. 
When I report bugs (e.g., in read.xport in foreign [due somewhat to
problems inherent with SAS's format] or ace or avas in acepack) it would
be nice to see some announcement when the bugs are resolved, or to easily
track this.  Even a checkbox that the package maintainer has seen the bug
report even if she/he currently does not have time to work on it would be
very helpful, as would a notation that the bug report was found to be
"buggy".

Frank

---
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
1 day later
#
Frank E Harrell Jr writes:
I highly agree with you on this. It would be very nice to have a fully
featured bug reporting system, where you could upload patches, discuss
improvements to existing packages or to the R core itself, request
features and so on. I think that Bugzilla (www.bugzilla.org) would suit
these expectations very well. It is the bug tracking system used by huge
projects like Mozilla (bugzilla.mozilla.org), Gentoo (bugs.gentoo.org),
and Red Hat (bugzilla.redhat.com), and based on my own experience I'd say
it addresses most of the things you pointed out.

	It works (at least in Gentoo, which is the one I'm more used to
working with) like this: someone files a bug report. In the bug report
itself one specifies the type of report (a bug, a feature improvement, a
request to the developer), the severity, and any other relevant information. It
is also possible to upload attachments (like proposed patches) or additional
information on the report.

	The bug report is then assigned to a given group or, in the case of
packages, to the person in charge of maintaining it. Anyone can then read
the bug report and make suggestions or propose fixes (see for example:
http://bugs.gentoo.org/show_bug.cgi?id=30784 ). [As opposed to the current
system, where the bug report can't even be linked to a website, and all the
discussion has to be done via the mailing lists]. The maintainer or any
other authorized developer can then accept or reject the proposed
suggestions, close the bug as a duplicate or as invalid, or at least indicate
that he is aware of the problem and will work on it some time later.

	Just my two cents,

--
[]'s
Fernando Henrique Ferraz P. da Rosa
#
Bugzilla is a pain-in-the-arse to maintain, unless they've improved it
in the last 9 months.  Just my two cents...

best,
-tony

Fernando Henrique Ferraz <feferraz@ime.usp.br> writes:

  
    
#
rossini@blindglobe.net (A.J. Rossini) writes:
It might have (and I wouldn't mind replacing Jitterbug with something
that is actually maintained itself!), but there's another rear-end problem... 

Who's going to do the actual work? 

This means both fixing bugs and keeping the bug repository current.
This is not easy, even for base R. We have the Jitterbug r-bugs site,
which at least helps us not to forget bugs that have been reported,
but we often forget to close bugs as they are fixed and it often takes
a while before someone gets around to sorting the incoming directory.
This work is not going away with a more advanced system, and getting
the - hmmm - "varied" group of package maintainers to participate
sounds like a can of worms to me.
#
On Sun, 18 Jan 2004, Fernando Henrique Ferraz wrote:

            
A lot of wishlist suggestions need at least cooperation from R-core, who
may not agree that a change is desirable even if someone else were to
write the code. A  bug-tracking system for contributed packages is one of
the exceptions. There's nothing to stop some package developer(s) creating
a bug tracking system and making accounts available to other people
(except the time, resources, security issues, etc).

Keeping track of changes is harder. The CVS commit logs for foreign and
survival are with the log for R itself on http://developer.r-project.org.
It's not even that hard to write R code to read the page and extract
entries relevant to that package.

For CRAN to list changes to other packages would require cooperation from
all the package developers. If the maintainer of acepack isn't
sufficiently together to reply to your messages, he probably won't be
keeping up with other aspects of change tracking.  Even trying to extract
a NEWS or Changelog file might not work -- eg for survival the Changelog
file is Terry Therneau's change log, not my log for changes to the R port.

	-thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley@u.washington.edu	University of Washington, Seattle
#
Peter Dalgaard <p.dalgaard@biostat.ku.dk> writes:
That's my point.  Maintenance is critical, and any time spent on
systems administration (systems in the generic sense, here it's
bug-tracking) is less time spent on other more useful, or interesting,
or high-payoff items.

I'm probably not the only one who wishes Peter had more time for other
things...

best,
-tony
#
A.J. Rossini writes:
I agree that it would probably take some time to set up the new system,
and maintenance at the beginning would take time too. On the other hand I believe
that the productivity gain fostered by the new system in the long term could
counterbalance the initial effort. The main effort would be setting up the new
system and handing out accounts to the package maintainers; after that, most of it
would be left to the users of the system. I don't have much experience with
administering the Bugzilla system yet, but I'm installing it on my box and will post
my experiences to the list.

 
--
[]'s
Fernando Henrique Ferraz P. da Rosa
#
Fernando Henrique Ferraz <feferraz@ime.usp.br> writes:
Excellent!  It would be great to have a place to track bugs for
packages.  Best of luck!

best,
-tony
#
On Sat, 17 Jan 2004 09:33:10 -0500, you wrote:

            
Changes always show up in r-devel (the main CVS branch, not the
mailing list) first.  Package developers should be keeping a
relatively up to date copy of it around if they're doing things that
are likely to break.
I'd recommend avoiding that as much as you can.  If things aren't
exported from a package, then the package writer is likely to feel
free to change them without warning.  It's much better to convince the
package writer that they missed something in their export list.
I think it's reasonable to restrict the availability of updates to
your packages to the currently released R version.  There are reasons
why people might not be up to date (e.g. only doing upgrades at a
specific time of year), but they'll still have access via CRAN to
older versions of your package.

Compatibility with S-PLUS is a lot harder, of course.  

Duncan Murdoch
#
On Sun, 18 Jan 2004 18:47:52 -0500
Duncan Murdoch <dmurdoch@pair.com> wrote:

            
I need to do that more often.  But sometimes it's hard to know what things
I do that are likely to break.  That's where there needs to be some other
mechanism for user communications.
That's a good solution in general, but I could see legitimate
disagreements about what should be exported, so this will not always solve
the problem.
Yes that's the real problem.

Thanks Duncan  -Frank
---
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
#
On Mon, 19 Jan 2004, Frank E Harrell Jr wrote:

            
Well, there is a NEWS file that is worth consulting, and we (Kurt in 
particular) run all the CRAN packages after every major change and daily.
See  http://cran.r-project.org/src/contrib/checkSummary.html.  We do also 
tend to tell package authors directly if their packages break, at least if 
they were previously warning-free.

It seems the sort of thing you do is to call methods directly where you 
could equally well call the generic, since that is what is currently 
failing in Design and Hmisc (if survfit.km is a survfit method).
That's not showing up in failures on the tests under R-patched.  Of the
listed dependencies only grid, lattice and survival have namespaces, and
only survival has been added since 1.8.1.  (I suspect the R-patched tests 
are against the 1.8.1 versions of the recommended packages, not the 
current versions.)
I think it does.  If the package writer wants a function to be private,
would-be users should respect that decision.  Most of the cases we have
encountered have been calling methods directly rather than coercing
objects to the right class and calling the generic.  In extremis, copy
(with permission) the function you want from the package sources and
rename it.

Unless I made a mistake there are no current uses of ::: in CRAN packages, 
and there are very few in base R (and quite a lot of the methods::: should 
probably better be methods::).

Brian
#
On Mon, 19 Jan 2004 07:51:34 -0500, Frank E Harrell Jr
<feh3k@spamcop.net> wrote :
Generally when people know something is likely to cause trouble,
there's a posting to this mailing list  ---  but it's easy to overlook
it among all the other traffic.

To make it a bit easier for Windows package developers to test against
r-devel, I'm going to keep a reasonably up-to-date Windows build
online.  For now it's on CRAN, but there are concerns about the impact
on the mirrors of the extra file size and download traffic.  However,
if I have to move it the links on CRAN will be updated, so it's safe
to say you should start looking at
<http://cran.r-project.org/bin/windows/base>.

Duncan Murdoch
#
Duncan Murdoch wrote:

            
Are you sure about that? I can't find old contributed packages, but it 
wouldn't be the first time I've missed something obvious. 
(src/contrib/Old is hardly a complete archive.)

Paul
#

        
    PaulG> Duncan Murdoch wrote:
    >> but they'll still have access via CRAN to
    >> older versions of your package.

    PaulG> Are you sure about that? I can't find old contributed packages, but it 
    PaulG> wouldn't be the first time I've missed something obvious. 

yes.  You've missed src/contrib/Archive/

    PaulG> (src/contrib/Old is hardly a complete archive.)

Martin
#
On Mon, 19 Jan 2004 14:09:58 +0000 (GMT)
Prof Brian Ripley <ripley@stats.ox.ac.uk> wrote:

            
I will start checking NEWS.  The kind of news I need though is more about
bugs that do not cause the package to break.
The point of calling methods directly is efficiency, otherwise I would not
use this dirty practice.  When bootstrapping or otherwise calling methods
repeatedly, I seek the lowest level functions for speed.  This conflicts
with the namespace idea.  I think this should have been taken into
consideration when designing namespaces.
Yes, those are the ones.
Right
That is a possibility.  None of the approaches we've named are without
maintenance problems.

Frank
---
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
#
On Mon, 19 Jan 2004 11:18:47 -0500, Paul Gilbert
<pgilbert@bank-banque-canada.ca> wrote :
Not sure about source versions.  Windows binaries are available from
1.6 on, in <http://www.cran.mirrors.pair.com/bin/windows/contrib>.   

We have talked about when to drop 1.6; I'd like to keep it online for
a year after the next version comes out (which would mean 1.6 goes
away this spring, 1.7 in the fall, etc.).  This means that someone has
at least a year to upgrade their R installation. 

Duncan Murdoch
#
On Mon, 19 Jan 2004, Prof Brian Ripley wrote:

            
Technically survfit.km() isn't a survival method, since it's called
directly by survfit() rather than by UseMethod, but it works like one.
It's an internal function that has never been documented, which is why it
wasn't exported.

If people want it, I can export it.  I hadn't heard from anyone wanting it
exported.


	-thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley@u.washington.edu	University of Washington, Seattle
#
Are you sure there is a measurable difference in calling methods directly?
The dispatch overhead on formula (one of your uses) appears to be about
10 microseconds.  (Note, negligible even for 10,000 bootstraps.)
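
One way to check such a figure on your own machine is to time a generic call against a direct call to its method. This is a minimal sketch using a made-up S3 generic (`area` and `area.square` are hypothetical names, not functions from survival, Hmisc or Design); the timings will vary by machine and R version, so none are quoted here.

```r
# Hypothetical S3 generic with one trivial method.
area <- function(x, ...) UseMethod("area")
area.square <- function(x, ...) x$side^2

sq <- structure(list(side = 2), class = "square")
n <- 10000

# Time n calls through the generic (includes S3 dispatch)...
t_generic <- system.time(for (i in seq_len(n)) area(sq))

# ...and n calls straight to the method (no dispatch).
t_direct <- system.time(for (i in seq_len(n)) area.square(sq))

# The per-call dispatch cost is roughly the difference divided by n.
print(t_generic[["elapsed"]])
print(t_direct[["elapsed"]])
```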

I believe we took the real performance penalties into account (and 
namespaces had performance pluses as well as minuses).
On Mon, 19 Jan 2004, Frank E Harrell Jr wrote:

            
What is the problem with coercing to the right class and calling the 
generic?
#
On Mon, 19 Jan 2004 17:17:39 +0000 (GMT)
Prof Brian Ripley <ripley@stats.ox.ac.uk> wrote:

            
Brian,

I don't worry about dispatch overhead.  I do worry about overhead of
assembling model matrices, removing rows with NAs, etc.  -Frank
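
The kind of overhead meant here can be sketched with `glm` versus `glm.fit` from the stats package. This is an illustrative example on simulated data, not code from Hmisc or Design: the high-level call re-processes the formula, handles NAs and builds the model matrix on every call, while the low-level fitter takes a matrix that was built once.

```r
set.seed(1)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, 1, plogis(d$x))

# High-level interface: formula handling, NA handling and model-matrix
# construction happen inside every call.
fit_full <- glm(y ~ x, family = binomial(), data = d)

# Low-level interface: build the design matrix once, then call the fitter.
X <- model.matrix(~ x, data = d)
fit_low <- glm.fit(X, d$y, family = binomial())

# The two routes should agree on the estimated coefficients.
print(all.equal(coef(fit_full), fit_low$coefficients))
```

In a bootstrap loop one would resample rows of `X` and `d$y` and call `glm.fit` directly, skipping the repeated setup work.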
#
On Mon, 19 Jan 2004 09:10:36 -0800 (PST)
Thomas Lumley <tlumley@u.washington.edu> wrote:

            
Thomas - I would appreciate getting survfit.km and survreg.fit exported.

Thanks,

Frank
#
On Mon, 19 Jan 2004, Frank E Harrell Jr wrote:

            
Here is what you said:
and your code is failing in R-devel because you are calling
formula.default. So, *why* are you calling formula.default?

Calling e.g. glm.fit not glm is not to do with methods, and apparently
survfit.km is not a method.
#
On Mon, 19 Jan 2004 18:13:42 +0000 (GMT)
Prof Brian Ripley <ripley@stats.ox.ac.uk> wrote:

            
That was fixed 16Dec03 for the next version to be submitted to CRAN.  I no
longer call formula.default.
You're right, it just has to do with survfit.km needing to be exported.

Frank
---
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
#
Peter Dalgaard wrote:

            
Being somewhat overwhelmed at the moment, I read this quickly as 
"maintains itself,"  and thought that would be pretty neat.

Back on earth, and speaking with no inside knowledge, I have the 
impression someone has to spend a lot of time removing R bug reports 
that really never should have been admitted into the repository.

Since this is a wish list, it would be nice if there were a system to 
allow reports to sit in a temporary "unconfirmed bug reports" area where 
all users could add comments like "confirmed in R-patched on zzz," 
"works in Linux," "fixed in R-devel," "here is a work around," "here is 
a patch," "this is a FAQ," or "read the documentation." Then some 
volunteer or the package maintainer would occasionally need to sort 
these, but a lot more of the work would be done by the larger R community.

Paul