
Proposal: 'global' package refactoring

6 messages · Warnes, Gregory R, Jan de Leeuw, Paul Murrell +3 more

#
Looking over the contents of various packages, including my own, it is clear
that lots of things end up 'hidden away' in packages where they don't
belong.  My gregmisc package is a particularly egregious example, containing
something from almost every functional category.  

I propose that from time to time the R community go through the complete set
of packages and 'refactor' the functions and data sets into packages that
have clearly defined goals.   This should make it easier to ensure that new
functions get placed into a location where users can easily find them,
reduce the amount of re-implementation/duplication of existing functionality,
and assist in ensuring interoperability.

It would be worthwhile, for instance, to pull all of the functions related
to contrasts for generalized linear models into a common location, instead
of having them spread between base, Hmisc, MASS, gregmisc, etc.   Similarly,
it would be helpful to pull together all of the genetics-computations into a
single location.

I recognize that not all package maintainers would be willing to participate
and that not all functions could be easily categorized, but I believe that
this effort would yield significant benefit and is compatible with the goal
of R-core to streamline the base packages. 

To put my money where my mouth is, I'll volunteer to organize a group effort
to do such a refactoring in conjunction with useR! 2004 or the next
DSC, whichever folks agree is better for this purpose.


Gregory R. Warnes, Ph.D.
Senior Coordinator
Groton Non-Clinical Statistics
Pfizer Global Research and Development
#
This is a good idea, and it would be great to have these
refactored meta packages. But it actually implies having
a group similar to R core that does code review of
existing packages. For example, what happens if
a function seems to work but is programmed horribly
inefficiently? What happens if something exists on both
the R and C levels? What happens with packages that
rely on private versions of BLAS? Suppose two packages
provide the same functionality, how does one choose?
And can this be done without coding conventions? Who is
in charge?
On Nov 24, 2003, at 14:12, Warnes, Gregory R wrote:

            
===
Jan de Leeuw; Professor and Chair, UCLA Department of Statistics;
Editor: Journal of Multivariate Analysis, Journal of Statistical  
Software
US mail: 8130 Math Sciences Bldg, Box 951554, Los Angeles, CA 90095-1554
phone (310)-825-9550;  fax (310)-206-5658;  email: deleeuw@stat.ucla.edu
homepage: http://gifi.stat.ucla.edu
   
-------------------------------------------------------------------------
           No matter where you go, there you are. --- Buckaroo Banzai
                    http://gifi.stat.ucla.edu/sounds/nomatter.au
#
Hi

I have wanted to figure out a way to do something along these lines for 
the many, widely-scattered plotting functions.  Something that would be 
less invasive (and less useful, but a valid step in the right 
direction) is simply a "directory" for different functional groups.  A 
list of function names, plus descriptions of what they do, plus a 
pointer to the package they are in would, I think, be really useful.  A 
lot of work to create and maintain, but really useful.  For example, the 
web pages focused on "spatial projects" 
(http://sal.agecon.uiuc.edu/csiss/Rgeo/index.html) have summaries of all 
spatially related packages.
The coordination of the DBMS stuff 
(http://developer.r-project.org/db/index.html) is another example of 
something similar.
Then of course there are the R GUIs pages (http://www.sciviews.org/_rgui/).
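
A "directory" of this kind could be sketched as a small table with one row
per function.  This is only an illustration: the names plot_directory and
find_functions are hypothetical, the three entries are a tiny sample, and
the package column assumes a current R installation where barplot and
dotchart live in 'graphics' and xyplot lives in 'lattice'.

```r
# Hypothetical directory of plotting functions: one row per function,
# with a short description and a pointer to the package that provides it.
plot_directory <- data.frame(
  func        = c("barplot", "dotchart", "xyplot"),
  package     = c("graphics", "graphics", "lattice"),
  description = c("Bar plots",
                  "Cleveland dot plots",
                  "Conditioned scatter plots"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the entries whose description mentions
# a keyword (case-insensitive).
find_functions <- function(directory, keyword) {
  directory[grepl(keyword, directory$description, ignore.case = TRUE), ]
}

hits <- find_functions(plot_directory, "dot")
print(hits)
```

A flat table like this keeps the maintenance burden visible: every row has
to be checked by hand whenever a function moves between packages.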

Paul
Jan de Leeuw wrote:

  
    
#
I am explicitly not prepared to `refactor' MASS.  Not only is it
explicitly support software for a book (which references it and those
references cannot be changed retrospectively), it also represents much
work over many years.  Not that we get much credit for it, but we do get
some and these days that does matter.

Parts of MASS have been incorporated into both R and S-PLUS -- perhaps we 
have already gone too far.  Indeed, I have floated the idea of migrating 
some functionality back, notably that of package lqs (which is part of 
MASS in the S version).
On Mon, 24 Nov 2003, Warnes, Gregory R wrote:

            

  
    
#
On Mon, 24 Nov 2003 17:12:24 -0500, you wrote:

            
Package 'foreign' is currently such a multi-author common purpose
package.  One of the problems is that there isn't a single maintainer:
users who have problems with any of the functions write to all of the
authors for help.  All of the authors are still active, so this gets a
response, but I can see problems in some other package where an author
moves on and doesn't want to maintain the code.  

If that happens to a package then the package will disappear from
CRAN, once it stops passing tests in new releases.  If it's just a
function or two, what happens when it needs maintenance, or when it
gets orphaned?

Duncan Murdoch
#
On Tue, 25 Nov 2003, Prof Brian Ripley wrote:

            
I think many package authors will find the idea of scattering things
they have written into many places unacceptable, both because, as
Brian says, it is hard enough to get credit for one's efforts when
there is an identifiable unit and because it makes maintenance more
difficult.

This suggests that there may be several organizations that make sense
for different purposes: one for code maintenance and one for use, or
maybe more than one for use: geostatistical users may prefer a
different organization than bioinformatics users or instructors in
elementary data analysis courses.

In principle it might be possible to use the name space mechanism to
provide different organizational structures: One can create a new
package that imports selected variables from a variety of packages and
then exports them as the variables of the new package.  At present
this does not import and re-export documentation, but that could be
addressed if this approach seems viable.  (This also only works if the
original package providing the variables has a name space.)
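
As a rough sketch of the import-and-re-export idea, the NAMESPACE file of
such a collection package might look like the following.  The package name
"contrastTools" is hypothetical, and the choice of functions is only an
example; it assumes a current R in which contr.helmert and contr.treatment
are exported from 'stats' and contr.sdif from 'MASS'.

```r
# NAMESPACE file of a hypothetical collection package "contrastTools".

# Import selected functions from the packages that continue to
# own and maintain them.
importFrom(stats, contr.helmert, contr.treatment)
importFrom(MASS, contr.sdif)

# Re-export them as variables of the new package.
export(contr.helmert, contr.treatment, contr.sdif)
```

The source packages would also have to be listed in the Imports field of
the collection package's DESCRIPTION file.  The original authors keep
ownership and credit, because the code itself never leaves their packages.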

Best,

luke