Proposal: 'global' package refactoring
On Tue, 25 Nov 2003, Prof Brian Ripley wrote:
I am explicitly not prepared to `refactor' MASS. Not only is it explicitly support software for a book (which references it and those references cannot be changed retrospectively), it also represents much work over many years. Not that we get much credit for it, but we do get some and these days that does matter. Parts of MASS have been incorporated into both R and S-PLUS -- perhaps we have already gone too far. Indeed, I have floated the idea of migrating some functionality back, notably that of package lqs (which is part of MASS in the S version).
I think many package authors will find the idea of scattering things they have written into many places unacceptable, both because, as Brian says, it is hard enough to get credit for one's efforts when there is an identifiable unit and because it makes maintenance more difficult. This suggests that there may be several organizations that make sense for different purposes: one for code maintenance and one for use, or maybe more than one for use: geostatistical users may prefer a different organization than bioinformatics users or instructors in elementary data analysis courses.

In principle it might be possible to use the name space mechanism to provide different organizational structures: one can create a new package that imports selected variables from a variety of packages and then exports them as the variables of the new package. At present this does not import and re-export documentation, but that could be addressed if this approach seems viable. (This also only works if the original package providing the variables has a name space.)

Best,
luke
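As a rough sketch of the name space approach described above: the NAMESPACE file of a new umbrella package could import selected variables from existing packages and re-export them. The package names `pkgA` and `pkgB` and the function names are hypothetical, chosen only for illustration, and both source packages are assumed to have name spaces.

```r
# NAMESPACE file of a hypothetical umbrella package.
# pkgA, pkgB, fitContrast and plotContrast are illustrative names only.

# Import just the selected variables from each source package.
importFrom(pkgA, fitContrast)
importFrom(pkgB, plotContrast)

# Re-export them so they appear as variables of the new package.
export(fitContrast, plotContrast)
```

Under this arrangement the original packages stay the unit of maintenance and credit, while the umbrella package only changes where users look for the functions. The help pages are not carried over by these directives.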
On Mon, 24 Nov 2003, Warnes, Gregory R wrote:
Looking over the contents of various packages, including my own, it is clear
that lots of things end up 'hidden away' in packages where they don't
belong. My gregmisc package is a particularly egregious example, containing
something from almost every functional category.
I propose that from time to time the R community go through the complete set
of packages and 'refactor' the functions and data sets into packages that
have clearly defined goals. This should make it easier to ensure that new
functions get placed into a location where users can easily find them,
reduce the amount of re-implementation/duplication of existing functionality,
and assist in ensuring interoperability.
It would be worthwhile, for instance, to pull all of the functions related
to contrasts for generalized linear models into a common location, instead
of having them spread between base, Hmisc, MASS, gregmisc, etc. Similarly,
it would be helpful to pull together all of the genetics-computations into a
single location.
I recognize that not all package maintainers would be willing to participate
and that not all functions could be easily categorized, but I believe that
this effort would yield significant benefit and is compatible with the goal
of R-core to streamline the base packages.
To put my money where my mouth is, I'll volunteer to organize a group effort
to do such a refactoring in conjunction with the useR! 2004 or the next
DSC, whichever folks agree is better for this purpose.
Gregory R. Warnes, Ph.D.
Senior Coordinator
Groton Non-Clinical Statistics
Pfizer Global Research and Development
Luke Tierney
University of Iowa                  Phone: 319-335-3386
Department of Statistics and        Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall                  email: luke@stat.uiowa.edu
Iowa City, IA 52242                 WWW: http://www.stat.uiowa.edu