Looking over the contents of various packages, including my own, it is clear
that lots of things end up 'hidden away' in packages where they don't
belong. My gregmisc package is a particularly egregious example, containing
something from almost every functional category.
I propose that from time to time the R community go through the complete set
of packages and 'refactor' the functions and data sets into packages that
have clearly defined goals. This should make it easier to ensure that new
functions get placed into a location where users can easily find them,
reduce the amount of re-implementation/duplication of existing functionality,
and assist in ensuring interoperability.
It would be worthwhile, for instance, to pull all of the functions related
to contrasts for generalized linear models into a common location, instead
of having them spread between base, Hmisc, MASS, gregmisc, etc. Similarly,
it would be helpful to pull together all of the genetics computations into a
single location.
I recognize that not all package maintainers would be willing to participate
and that not all functions could be easily categorized, but I believe that
this effort would yield significant benefit and is compatible with the goal
of R-core to streamline the base packages.
To put my money where my mouth is, I'll volunteer to organize a group effort
to do such a refactoring in conjunction with useR! 2004 or the next
DSC, whichever folks agree is better for this purpose.
Gregory R. Warnes, Ph.D.
Senior Coordinator
Groton Non-Clinical Statistics
Pfizer Global Research and Development
Proposal: 'global' package refactoring
This is a good idea, and it would be great to have these refactored meta packages. But it actually implies having a group similar to R core that does code review of existing packages. For example, what happens if a function seems to work but is programmed horribly inefficiently? What happens if something exists on both the R and C levels? What happens with packages that rely on private versions of BLAS? Suppose two packages provide the same functionality, how does one choose? And can this be done without coding conventions? Who is in charge?
______________________________________________ R-devel@stat.math.ethz.ch mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
Jan de Leeuw; Professor and Chair, UCLA Department of Statistics
Editor: Journal of Multivariate Analysis, Journal of Statistical Software
US mail: 8130 Math Sciences Bldg, Box 951554, Los Angeles, CA 90095-1554
phone (310)-825-9550; fax (310)-206-5658; email: deleeuw@stat.ucla.edu
homepage: http://gifi.stat.ucla.edu
No matter where you go, there you are. --- Buckaroo Banzai
http://gifi.stat.ucla.edu/sounds/nomatter.au
Hi
I have wanted to figure out a way to do something along these lines for the many, widely scattered plotting functions.
Something that would be less invasive (and less useful, but a valid step in the right direction) is simply a "directory" for different functional groups. A list of function names, plus descriptions of what they do, plus a pointer to the package they are in would, I think, be really useful. A lot of work to create and maintain, but really useful.
For example, the web pages focused on "spatial projects" (http://sal.agecon.uiuc.edu/csiss/Rgeo/index.html) have summaries of all spatially related packages. The coordination of the DBMS stuff (http://developer.r-project.org/db/index.html) is another example of something similar. Then of course there are the R GUIs pages (http://www.sciviews.org/_rgui/).
Paul
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
paul@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/
I am explicitly not prepared to `refactor' MASS. Not only is it explicitly support software for a book (which references it and those references cannot be changed retrospectively), it also represents much work over many years. Not that we get much credit for it, but we do get some and these days that does matter. Parts of MASS have been incorporated into both R and S-PLUS -- perhaps we have already gone too far. Indeed, I have floated the idea of migrating some functionality back, notably that of package lqs (which is part of MASS in the S version).
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
On Mon, 24 Nov 2003 17:12:24 -0500, you wrote:
I propose that from time to time the R community go through the complete set of packages and 'refactor' the functions and data sets into packages that have clearly defined goals.
Package 'foreign' is currently such a multi-author common purpose package. One of the problems is that there isn't a single maintainer: users who have problems with any of the functions write to all of the authors for help. All of the authors are still active, so this gets a response, but I can see problems in some other package where an author moves on and doesn't want to maintain the code.
If that happens to a package then the package will disappear from CRAN, once it stops passing tests in new releases. If it's just a function or two, what happens when it needs maintenance, or when it gets orphaned?
Duncan Murdoch
I think many package authors will find the idea of scattering things they have written into many places unacceptable, both because, as Brian says, it is hard enough to get credit for one's efforts when there is an identifiable unit and because it makes maintenance more difficult.
This suggests that there may be several organizations that make sense for different purposes: one for code maintenance and one for use, or maybe more than one for use: geostatistical users may prefer a different organization than bioinformatics users or instructors in elementary data analysis courses.
In principle it might be possible to use the name space mechanism to provide different organizational structures: one can create a new package that imports selected variables from a variety of packages and then exports them as the variables of the new package. At present this does not import and re-export documentation, but that could be addressed if this approach seems viable. (This also only works if the original package providing the variables has a name space.)
Best,
luke
Luke Tierney
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke@stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
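The name space approach described in the last message can be sketched as a NAMESPACE file for an umbrella package. This is only an illustration under stated assumptions: the package name "contrastsHub" is hypothetical, and contr.sdif from MASS is used simply as one example of a contrast function that lives outside base. In current versions of R the umbrella package's DESCRIPTION file would also need to list MASS under Imports, and, as noted in the message, the help pages are not carried over by these directives.

```r
# NAMESPACE file for a hypothetical umbrella package, here called
# "contrastsHub" (the name is an assumption made for illustration).

# Import a selected variable from a package that has a name space ...
importFrom(MASS, contr.sdif)

# ... and export it again as a variable of the umbrella package.
export(contr.sdif)
```

With such a file in place, a user who attaches the umbrella package could call contr.sdif() without attaching MASS itself, while the code continues to be maintained, and credited, in its original package.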