I am teaching a graduate course on Statistical Computing this semester. A major part of the grade is determined by a project in which a student or small group of students produce, test, and document some software for statistics. I will encourage those students who are developing in S to package their software as an R package.
I would welcome suggestions of possible projects, especially projects that come under the heading of "Useful facilities to be added to R". Please keep in mind that the project must be completed by mid-December and that not all the students have extensive experience programming in S and C.
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Projects
12 messages · Douglas Bates, Martin Maechler, Peter Dalgaard +6 more
1) Design of Experiments :
- S (even before "-plus" existed) has had functions like
fac.design
oa.design
fractionate etc ["White book", chapter 5].
[also a set of pre-stored useful fractional designs
for (> 2)-leveled factors.]
Some of these would look like ``student exercises'' to do.
If you could have them work in an environment where S-plus was *not*
installed, this would end up as "clean table" project
w/o problematic code copying...
- One could also think of code for SEQUENTIAL design
[do a 2^{m-k} (= 8) initially;
do a (fractionated) n1 x .. x n_N on the remaining N important factors,
given the data for the first 8 experiments].
- Or "Taguchi" [- using (at least two) different kinds of factors, some cheap,
some expensive to change
- multiple Y's, for some the "local" variance should be
minimized, etc etc
]
This looks tedious and could well be partitioned into separate student
projects. Maybe JMC, AEF, RMH (authors of ch.5) or other experts can say
much more here.
------
2) For Computer Scientists :
"Differentiation" / Symbolic derivatives,..
Improve the possibilities of D() and deriv(), and document them.
Make these user-extensible.
Think about the hessian in addition to the gradient.
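As a point of reference for this project idea, here is a minimal sketch of what D() and deriv() do, assuming a current R in which deriv() accepts `hessian = TRUE` (this may not match the R version discussed in this thread). The expression `sin(x) * y^2` and the names `f`, `g` and `res` are illustrative only.

```r
# Symbolic first derivative of an expression with respect to x
f <- expression(sin(x) * y^2)
dfdx <- D(f[[1]], "x")      # returns an unevaluated call
print(dfdx)

# deriv() can build a function that returns the value together with
# "gradient" and (if requested) "hessian" attributes
g <- deriv(~ sin(x) * y^2, c("x", "y"),
           function.arg = c("x", "y"), hessian = TRUE)
res <- g(x = 0, y = 2)

attr(res, "gradient")   # 1 x 2 matrix: d/dx and d/dy at (0, 2)
attr(res, "hessian")    # 1 x 2 x 2 array of second derivatives
```

Making the table of known derivatives user-extensible, as suggested above, is not something this sketch addresses.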
--------
Martin Maechler <maechler@stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO D10 Leonhardstr. 27
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1228 <><
"MM" == Martin Maechler <maechler@stat.math.ethz.ch> writes:
MM> 2) For Computer Scientists : "Differentiation" / Symbolic
MM> derivatives,..
MM> Improve the possibilities of D() and deriv(), and document
MM> them. Make these user-extensible.
MM> Think about the hessian in addition to the gradient.
Even automating/specifications for numerical derivatives would be
nice.
A.J. Rossini                          Rsrch. Asst. Prof. of Biostatistics
BlindGlobe Networks (home/default)    rossini@blindglobe.net
UW Biostat/Center for AIDS Research   rossini@u.washington.edu
FHCRC/SCHARP/HIV Vaccine Trials Net   rossini@scharp.org
FHCRC: M/Tu: 206-667-7025 (fax=4812) | Voicemail is pretty sketchy
CFAR:   W/F: 206-731-3647 (fax=3694) | Email is far better than phone
UW:    Th/F: 206-543-1044 (fax=3286) | Change last 4 digits of phone for fax
Douglas Bates <bates@stat.wisc.edu> writes:
I am teaching a graduate course on Statistical Computing this semester. A major part of the grade is determined by a project in which a student or small group of students produce, test, and document some software for statistics. I will encourage those students who are developing in S to package their software as an R package. I would welcome suggestions of possible projects, especially projects that come under the heading of "Useful facilities to be added to R". Please keep in mind that the project must be completed by mid-December and that not all the students have extensive experience programming in S and C.
These are probably too hard and too narrow, but now the topic is up:
- getting predictions to work on new data in cases where the model
  depends on the data set (notably regression splines with auto knot
  placement)
- in lme, we can predict at level K; it would be nice to get the SE of
  the prediction (this also takes levels, extending the distinction
  between confidence and tolerance intervals)
- conditional tolerance in lme (much too hard I suspect)
- in model.tables.aov, SE's for type="means" are sorely missed.
This is not very hard, but maybe too small (although one will have to
study issues of contrasts and internals of an lm object rather
carefully):
- extend pairwise.t.test to take a linear model and a factor in the
  model as argument.
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
"AJR" == A J Rossini <rossini@blindglobe.net> writes:
"MM" == Martin Maechler <maechler@stat.math.ethz.ch> writes:
MM> 2) For Computer Scientists : "Differentiation" / Symbolic
MM> derivatives,..
MM> Improve the possibilities of D() and deriv(), and document
MM> them. Make these user-extensible.
MM> Think about the hessian in addition to the gradient.
AJR> Even automating/specifications for numerical derivatives would
AJR> be nice.
The nls package has a function called numericDeriv - with a little
modification it could be used elsewhere. An earlier version of the
underlying C code is used as an example in the R Extensions manual.
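For readers who have not used it, a minimal sketch of calling numericDeriv follows. It assumes a current R, where the function is part of the stats package rather than a separate nls package; the environment `env` and the variables `a` and `x` are illustrative names, not taken from the thread.

```r
# numericDeriv() evaluates an expression in an environment and attaches
# a numerically computed "gradient" attribute with respect to the
# named variables
env <- new.env()
assign("a", 2, envir = env)           # parameter to differentiate by (must be double)
assign("x", c(1, 2, 3), envir = env)  # fixed data

nd <- numericDeriv(quote(a * x^2), "a", env)

as.numeric(nd)         # value of a * x^2
attr(nd, "gradient")   # 3 x 1 matrix, approximately x^2
```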
"SD" == Saikat DebRoy <saikat@stat.wisc.edu> writes:
"AJR" == A J Rossini <rossini@blindglobe.net> writes: "MM" == Martin Maechler <maechler@stat.math.ethz.ch> writes:
MM> 2) For Computer Scientists : "Differentiation" / Symbolic
MM> derivatives,..
MM> Improve the possibilities of D() and deriv(), and document
MM> them. Make these user-extensible.
MM> Think about the hessian in addition to the gradient.
AJR> Even automating/specifications for numerical derivatives
AJR> would be nice.
SD> The nls package has a function called numericDeriv - with a
SD> little modification it could be used elsewhere. An earlier
SD> version of the underlying C code is used as an example in the
SD> R Extensions manual.
Right, and it would be nice (necessary?) for testing and verification
of the "symbolic" side of the package as well as for "drop-in" or
"change a parameter" design functionality. As well as being a
starting point for the interface for the symbolicDeriv package.
best,
-tony
On 14 Sep 2000 10:43:05 -0700, you wrote:
> Right, and it would be nice (necessary?) for testing and verification
> of the "symbolic" side of the package as well as for "drop-in" or
> "change a parameter" design functionality. As well as being a
> starting point for the interface for the symbolicDeriv package.
I'm not sure if this is what Tony has in mind, but something I always
do when working with optimizers that take a supplied derivative is to
write a little numerical derivative routine to check that my calculus
and programming were done correctly. Putting that into a single
function which took arguments just like the ones you pass to the
optimizer would be very convenient.
Duncan Murdoch
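The kind of checking routine described here might look roughly like the sketch below. The helper name `check_gradient`, its argument order (modelled on optim's `par`, `fn`, `gr`, `...`) and the step size are all assumptions made for illustration, not an existing R function.

```r
# Hypothetical helper: compare a user-supplied gradient with a
# central-difference approximation at the point `par`.
check_gradient <- function(par, fn, gr, ..., eps = 1e-6) {
  numeric_gr <- vapply(seq_along(par), function(i) {
    h <- replace(numeric(length(par)), i, eps)   # step in coordinate i only
    (fn(par + h, ...) - fn(par - h, ...)) / (2 * eps)
  }, numeric(1))
  analytic_gr <- gr(par, ...)
  list(analytic = analytic_gr,
       numeric = numeric_gr,
       max_abs_diff = max(abs(analytic_gr - numeric_gr)))
}

# Example: the Rosenbrock function and its hand-coded gradient
fr <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2
grr <- function(p) c(-400 * p[1] * (p[2] - p[1]^2) - 2 * (1 - p[1]),
                     200 * (p[2] - p[1]^2))

chk <- check_gradient(c(-1.2, 1), fr, grr)
chk$max_abs_diff   # should be small if the calculus and coding agree
```

A large `max_abs_diff` would point to an error in either the derivation or the coding of `gr`.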
"DM" == Duncan Murdoch <dmurdoch@pair.com> writes:
DM> On 14 Sep 2000 10:43:05 -0700, you wrote:
>> Right, and it would be nice (necessary?) for testing and
>> verification of the "symbolic" side of the package as well as
>> for "drop-in" or "change a parameter" design functionality. As
>> well as being a starting point for the interface for the
>> symbolicDeriv package.
DM> I'm not sure if this is what Tony has in mind, but something I
DM> always do when working with optimizers that take a supplied
DM> derivative is to write a little numerical derivative routine
DM> to check that my calculus and programming were done correctly.
DM> Putting that into a single function which took arguments just
DM> like the ones you pass to the optimizer would be very
DM> convenient.
That's exactly it. (but perhaps have at least the 3 basic forms --
f(point + eps) - f(point), f(point + eps/2) - f(point - eps/2),
f(point) - f(point - eps)).
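The three forms can be sketched as a single function. The thread gives only the numerators; dividing each by eps is the usual convention and is assumed here. The function name `fd_deriv` and its arguments are illustrative.

```r
# Forward, central and backward finite-difference approximations to f'(x)
fd_deriv <- function(f, x, eps = 1e-5,
                     type = c("forward", "central", "backward")) {
  type <- match.arg(type)
  switch(type,
         forward  = (f(x + eps) - f(x)) / eps,
         central  = (f(x + eps / 2) - f(x - eps / 2)) / eps,
         backward = (f(x) - f(x - eps)) / eps)
}

# Errors relative to the exact derivative cos(1) of sin at x = 1
err_forward  <- fd_deriv(sin, 1, type = "forward")  - cos(1)
err_central  <- fd_deriv(sin, 1, type = "central")  - cos(1)
err_backward <- fd_deriv(sin, 1, type = "backward") - cos(1)
```

The one-sided forms have error of order eps, while the central form has error of order eps^2, which is why it is usually preferred when both sides of the point can be evaluated.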
best,
-tony
On Thu, Sep 14, 2000 at 06:46:41PM +0200, Martin Maechler wrote:
2) For Computer Scientists :
"Differentiation" / Symbolic derivatives,..
Improve the possibilities of D() and deriv(), and document them.
Make these user-extensible.
Think about the hessian in addition to the gradient.
Now that you mention it, I'd like this to work:
> X <- expression( x )
> X + 1
expression(x + 1)
> (X + 1) + 2
expression((x + 1) + 2)
> simplify(( X + 1 ) + 2 )
expression( x + 3 )
> simplify( expand(( X + 2 ) * ( X + 5 ) * ( X + 7 )))
expression(x^3 + 14 * x^2 + 59 * x + 70)
> factorise( X^3 + 14 * X^2 + 59 * X + 70 )
expression((x + 2) * (x + 5) * (x + 7))
If the roots have small imaginary components due to round off, that is
still OK. If there is a good reason for the operations to return
objects of mode ``call'' rather than mode ``expression'' (like what D()
does) that would be OK too, although I prefer to work with the
expression() wrappers on because then the printed form is identical to
what you type in. At any rate, the operations have to handle objects of
mode ``expression'', ``call'' and all the numericals as their inputs.
And as many others as you get time to do plus a system for
categorising, organising, documenting and extending the available
manipulations...
... I'm sure this is all very useful for statistics, it would be useful
to me at any rate.
- Tel
DM> Putting that into a single function which took arguments just
DM> like the ones you pass to the optimizer would be very
DM> convenient.
> That's exactly it. (but perhaps have at least the 3 basic forms --
> f(point + eps) - f(point), f(point + eps/2) - f(point - eps/2),
> f(point) - f(point - eps)).
Buried somewhere in one of my packages is code for doing Richardson
extrapolation, which usually gives a much better estimate than these 3
basic forms. If this project goes ahead I will dig it out.
Paul Gilbert
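To illustrate the idea, here is a minimal sketch of Richardson extrapolation applied to central differences. This is not the code from the package mentioned above; the function name `richardson_deriv`, the starting step `h` and the number of halvings `steps` are assumptions chosen for the example.

```r
# Richardson extrapolation of central differences for f'(x).
# Column 1 holds central differences with steps h, h/2, h/4, ...;
# each further column cancels the next even-order error term.
richardson_deriv <- function(f, x, h = 0.1, steps = 4) {
  A <- matrix(NA_real_, steps, steps)
  for (i in seq_len(steps)) {
    hi <- h / 2^(i - 1)
    A[i, 1] <- (f(x + hi) - f(x - hi)) / (2 * hi)
    for (j in seq_len(i - 1)) {
      A[i, j + 1] <- A[i, j] + (A[i, j] - A[i - 1, j]) / (4^j - 1)
    }
  }
  A[steps, steps]   # most refined estimate
}

est <- richardson_deriv(sin, 1)
est - cos(1)   # error against the exact derivative
```

Because the step sizes stay fairly large, rounding error is kept under control while the extrapolation removes the leading truncation terms.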
(not to the list)
"PG" == Paul Gilbert <pgilbert@bank-banque-canada.ca> writes:
DM> Putting that into a single function which took arguments just
DM> like the ones you pass to the optimizer would be very
DM> convenient.
>> That's exactly it. (but perhaps have at least the 3 basic
>> forms -- f(point + eps) - f(point), f(point + eps/2) - f(point
>> - eps/2), f(point) - f(point - eps)).
PG> Buried somewhere in one of my packages is code for doing
PG> Richardson extrapolation, which usually gives a much better
PG> estimate than these 3 basic forms. If this project goes ahead
PG> I will dig it out.
That would be cool, and I believe there are other, more precise,
approaches! BUT, I'm worried that this (_numerical_ derivatives)
might become a package unto itself (which gets amusing, especially if
you consider the many ways one might want to differentiate a
multi-array vector by a vector...).
All of a sudden, I'm starting to see 2 projects out of one, depending
on the scope.
2 days later
Douglas Bates writes:
I am teaching a graduate course on Statistical Computing this semester. A major part of the grade is determined by a project in which a student or small group of students produce, test, and document some software for statistics. I will encourage those students who are developing in S to package their software as an R package.
I would welcome suggestions of possible projects, especially projects that come under the heading of "Useful facilities to be added to R". Please keep in mind that the project must be completed by mid-December and that not all the students have extensive experience programming in S and C.
Hopefully not too late for me to join the wishlist ...
In addition to the useful suggestions re experimental design and by
Peter, I have the following ctest-related projects which I think would
fit very nicely. Ordering is according to decreasing priority.
* Improved support for exact inference (p-values and, where appropriate,
also confidence intervals) for some of the tests, in particular for
Kruskal-Wallis and (2-sample) Smirnov. In addition, we currently don't
have exact p-values in the rank-based tests in case of ties, and one
could deal with this using the Streitberg-Roehmel path suggested by
Torsten Hothorn (see add-on package ExactDistr). Also, permutation
tests might be useful in some cases ...
[I have a NEW implementation of the Mehta-Patel network algorithm for
dealing with the common odds ratio in 2 x 2 x k tables ready, hence will
take care of mantelhaen.test() myself.]
* Implement alternative definitions of the 2-sided p-value using
p = 2[f P(X=x) + min{P(X<x),P(X>x)}]
with 0 <= f <= 1 as definition.
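A minimal sketch of this definition for a binomial test statistic is given below. The function name `two_sided_p`, the choice of the binomial distribution and the capping of the result at 1 are assumptions made for illustration; the formula itself is the one stated above.

```r
# Two-sided p-value p = 2 [ f P(X = x) + min{ P(X < x), P(X > x) } ]
# for X ~ Binomial(n, prob), with 0 <= f <= 1.
two_sided_p <- function(x, n, prob = 0.5, f = 1) {
  stopifnot(f >= 0, f <= 1)
  p_eq <- dbinom(x, n, prob)                       # P(X = x)
  p_lt <- pbinom(x - 1, n, prob)                   # P(X < x)
  p_gt <- pbinom(x, n, prob, lower.tail = FALSE)   # P(X > x)
  min(1, 2 * (f * p_eq + min(p_lt, p_gt)))         # capped at 1 (assumption)
}

p_full <- two_sided_p(2, 10, f = 1)     # doubled one-sided p-value
p_mid  <- two_sided_p(2, 10, f = 0.5)   # mid-p variant
```

With f = 1 this reduces to twice the smaller one-sided tail probability, and f = 1/2 gives the mid-p version.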
* Improve the code for fisher.test(), maybe re-implement from scratch?
The memory management definitely needs to be rewritten for 1.2.
-k