Projects

12 messages · Douglas Bates, Martin Maechler, Peter Dalgaard +6 more

#
I am teaching a graduate course on Statistical Computing this
semester.  A major part of the grade is determined by a project in
which a student or small group of students produce, test, and document
some software for statistics.  I will encourage those students who are
developing in S to package their software as an R package.

I would welcome suggestions of possible projects, especially projects
that come under the heading of "Useful facilities to be added to R".
Please keep in mind that the project must be completed by mid-December
and that not all the students have extensive experience programming in
S and C.
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
1) Design of Experiments :

 - S (even before "-plus" existed) has had functions like

   fac.design
   oa.design
   fractionate   etc  ["White book", chapter 5].

		 [also a set of pre-stored useful fractional designs
		  for (> 2)-leveled factors.]

  Some of these would look like ``student exercises'' to do.
  If you could have them work in an environment where S-plus was *not*
  installed, this would end up as a "clean table" project
  w/o problematic code copying...

 - One could also think of  code for SEQUENTIAL design
  [do a  2^{m-k} (= 8) initially; 
   do a  (fractionated) n1 x .. x n_N on the remaining N important factors,
   given the data for the first 8 experiments].

 - Or "Taguchi" [- using (at least two) different kinds of factors, some cheap,
		   some expensive to change
		 - multiple Y's, for some the "local" variance should be
		   minimized, etc etc
		 ]
	      
  This looks tedious and could well be partitioned into different student
  projects.  Maybe JMC, AEF, RMH (authors of ch.5) or other experts can say
  much more here.
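
  As a rough illustration of the fac.design / fractionate idea in plain R
  (this is not the White Book code, and the object names are made up):
  a full 2^3 factorial built with expand.grid(), then extended to a
  2^(4-1) half fraction in 8 runs, assuming the generator D = ABC.

```r
# Full 2^3 factorial in coded units (-1 / +1): 8 runs.
full <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# Half fraction of a 2^4 design: alias the fourth factor with the
# three-way interaction (generator D = ABC), still 8 runs.
frac <- transform(full, D = A * B * C)

# The four main-effect columns are mutually orthogonal: X'X = 8 * I.
crossprod(as.matrix(frac))
```

  A sequential design would start from a table like frac and add runs
  for the factors that turn out to matter.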

------

2) For Computer Scientists :
    "Differentiation" / Symbolic derivatives,..

   Improve the possibilities of  D() and deriv(), and document them.
   Make these user-extensible.

   Think about the hessian in addition to the gradient.
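
   For reference, a small sketch of what D() and deriv() offer, assuming
   a present-day R in which deriv() accepts hessian = TRUE.  The
   expression and the evaluation point are arbitrary examples.

```r
f <- expression(sin(x) * exp(-a * x))

# Symbolic first derivative with respect to x (returns a call).
dfdx <- D(f[[1]], "x")

# deriv() builds a function returning the value with a "gradient"
# attribute; hessian = TRUE also attaches a "hessian" attribute.
g <- deriv(f, c("x", "a"), function.arg = c("x", "a"), hessian = TRUE)
val <- g(x = 1, a = 0.5)

attr(val, "gradient")   # 1 x 2 matrix: d/dx and d/da
attr(val, "hessian")    # 1 x 2 x 2 array of second derivatives
```

   The table of functions that D() knows how to differentiate is fixed,
   which is the part that "user-extensible" would change.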

  

--------

Martin Maechler <maechler@stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO D10	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><
#
MM> 2) For Computer Scientists : "Differentiation" / Symbolic
    MM> derivatives,..

    MM>    Improve the possibilities of D() and deriv(), and document
    MM> them.  Make these user-extensible.

    MM>    Think about the hessian in addition to the gradient.

Even automating/specifications for numerical derivatives would be
nice.
#
Douglas Bates <bates@stat.wisc.edu> writes:
These are probably too hard and too narrow, but now the topic is up:

- getting predictions to work on new data in cases where model depends
  on data set (notably regressions splines with auto knot placement)

- in lme, we can predict at level K; it would be nice to get the SE of
  the prediction (this also takes levels, extending the distinction
  between confidence and tolerance intervals)

- conditional tolerance in lme (much too hard I suspect)

- in model.tables.aov, SE's for type="means" are sorely missed.

This is not very hard, but maybe too small (although one will have to
study issues of contrasts and internals of an lm object rather
carefully):

- extend pairwise.t.test to take a linear model and a factor in the
  model as argument.
#
MM> 2) For Computer Scientists : "Differentiation" / Symbolic
  MM> derivatives,..

  MM> Improve the possibilities of D() and deriv(), and document them.
  MM> Make these user-extensible.

  MM> Think about the hessian in addition to the gradient.

  AJR> Even automating/specifications for numerical derivatives would
  AJR> be nice.

The nls package has a function called numericDeriv - with a little
modification it could be used elsewhere. An earlier version of the
underlying C code is used as an example in the R Extensions manual.
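
A minimal sketch of calling numericDeriv directly, assuming a current R
where the function lives in the stats package.  The expression and the
variable names a and b are arbitrary.

```r
# numericDeriv() evaluates an expression in an environment and
# differentiates it numerically with respect to the named variables.
env <- new.env()
assign("a", 2, envir = env)
assign("b", 3, envir = env)

res <- numericDeriv(quote(a^2 * b), c("a", "b"), env)

res                     # the value of a^2 * b
attr(res, "gradient")   # 1 x 2 matrix, close to c(2*a*b, a^2)
```

The interface is built around nls (an expression plus an environment),
which is presumably the "little modification" needed for general use.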

#
MM> 2) For Computer Scientists : "Differentiation" / Symbolic
    MM> derivatives,..

    MM> Improve the possibilities of D() and deriv(), and document
    MM> them.  Make these user-extensible.

    MM> Think about the hessian in addition to the gradient.

    AJR> Even automating/specifications for numerical derivatives
    AJR> would be nice.

    SD> The nls package has a function called numericDeriv - with a
    SD> little modification it could be used elsewhere. An earlier
    SD> version of the underlying C code is used as an example in the
    SD> R Extensions manual.

Right, and it would be nice (necessary?) for testing and verification
of the "symbolic" side of the package as well as for "drop-in" or
"change a parameter" design functionality.  As well as being a
starting point for the interface for the symbolicDeriv package.

best,
-tony
#
On 14 Sep 2000 10:43:05 -0700, you wrote:

            
I'm not sure if this is what Tony has in mind, but something I always
do when working with optimizers that take a supplied derivative is to
write a little numerical derivative routine to check that my calculus
and programming were done correctly.  Putting that into a single
function which took arguments just like the ones you pass to the
optimizer would be very convenient.

Duncan Murdoch
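
A sketch of what such a checker might look like.  check_gradient is a
hypothetical name, not an existing function; its argument order is
assumed to mirror optim(par, fn, gr, ...), and it uses a central
difference with a fixed step.

```r
# Hypothetical helper: compare an analytic gradient `gr` with a
# central-difference approximation of `fn` at `par`.
check_gradient <- function(par, fn, gr, ..., eps = 1e-6) {
  num <- vapply(seq_along(par), function(i) {
    h <- numeric(length(par))
    h[i] <- eps
    (fn(par + h, ...) - fn(par - h, ...)) / (2 * eps)
  }, numeric(1))
  ana <- gr(par, ...)
  list(analytic = ana, numeric = num,
       max_abs_diff = max(abs(ana - num)))
}

# Rosenbrock function and its gradient, as in the optim() examples.
fr  <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grr <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                     200 * (x[2] - x[1]^2))

chk <- check_gradient(c(-1.2, 1), fr, grr)
chk$max_abs_diff   # small if calculus and code agree
```

A sign error or a dropped term in grr shows up as a large max_abs_diff.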
#

        
    DM> On 14 Sep 2000 10:43:05 -0700, you wrote:

    >> Right, and it would be nice (necessary?) for testing and
    >> verification of the "symbolic" side of the package as well as
    >> for "drop-in" or "change a parameter" design functionality.  As
    >> well as being a starting point for the interface for the
    >> symbolicDeriv package.

    DM> I'm not sure if this is what Tony has in mind, but something I
    DM> always do when working with optimizers that take a supplied
    DM> derivative is to write a little numerical derivative routine
    DM> to check that my calculus and programming were done correctly.
    DM> Putting that into a single function which took arguments just
    DM> like the ones you pass to the optimizer would be very
    DM> convenient.

That's exactly it.  (but perhaps have at least the 3 basic forms --
f(point + eps) - f(point), f(point + eps/2) - f(point - eps/2),
f(point) - f(point - eps)).

best,
-tony
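
The three forms (forward, central, backward) can be sketched in one
small function.  fd is an illustrative name, not an existing routine,
and the differences are divided by eps here, which the shorthand above
leaves implicit.

```r
# Illustrative helper: finite-difference derivative of a scalar
# function f at x, using one of the three basic forms.
fd <- function(f, x, eps = 1e-5,
               type = c("forward", "central", "backward")) {
  type <- match.arg(type)
  switch(type,
         forward  = (f(x + eps) - f(x)) / eps,
         central  = (f(x + eps / 2) - f(x - eps / 2)) / eps,
         backward = (f(x) - f(x - eps)) / eps)
}

# Error of each form for d/dx sin(x) at x = 1 (true value cos(1)).
sapply(c("forward", "central", "backward"),
       function(ty) fd(sin, 1, type = ty) - cos(1))
```

The central form has error of order eps^2, the one-sided forms of
order eps, so the central error is typically much smaller.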
#
On Thu, Sep 14, 2000 at 06:46:41PM +0200, Martin Maechler wrote:
Now that you mention it, I'd like this to work:
expression(x + 1)
expression((x + 1) + 2)
expression( x + 3 )
expression(x^3 + 14 * x^2 + 59 * x + 70)
expression((x + 2) * (x + 5) * (x + 7))


If the roots have small imaginary components due to round off,
that is still OK. If there is a good reason for the operations
to return objects of mode ``call'' rather than mode ``expression''
(like what D() does) that would be OK too, although I prefer
to work with the expression() wrappers on because then the printed
form is identical to what you type in. At any rate, the operations
have to handle objects of mode ``expression'', ``call'' and all the
numericals as their inputs.

And as many others as you get time to do plus a system for
categorising, organising, documenting and extending the available
manipulations...

... I'm sure this is all very useful for statistics, it would
be useful to me at any rate. 

	- Tel

#
    DM> Putting that into a single function which took arguments just
    DM> like the ones you pass to the optimizer would be very
    DM> convenient.

Buried somewhere in one of my packages is code for doing Richardson
extrapolation, which usually gives a much better estimate than these 3 basic
forms. If this project goes ahead I will dig it out.

Paul Gilbert
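
For readers unfamiliar with the technique, a minimal sketch of a single
Richardson extrapolation step applied to the central difference.  This
is not the code referred to above; the function names and the default
step size are assumptions made for illustration.

```r
# Central difference with step h: error is O(h^2).
central <- function(f, x, h) (f(x + h) - f(x - h)) / (2 * h)

# One Richardson step: combine step sizes h and h/2 so that the h^2
# error terms cancel, leaving an error of order h^4.
richardson <- function(f, x, h = 1e-2) {
  (4 * central(f, x, h / 2) - central(f, x, h)) / 3
}

# Absolute errors for d/dx sin(x) at x = 1.
abs(central(sin, 1, 1e-2)    - cos(1))
abs(richardson(sin, 1, 1e-2) - cos(1))
```

Repeating the step with h/4, h/8, ... removes further error terms, at
the cost of more function evaluations.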


#
(not to the list)
DM> Putting that into a single function which took arguments just
    DM> like the ones you pass to the optimizer would be very
    DM> convenient.

    >> That's exactly it.  (but perhaps have at least the 3 basic
    >> forms -- f(point + eps) - f(point), f(point + eps/2) - f(point
    >> - eps/2), f(point) - f(point - eps)).

    PG> Buried somewhere in one of my packages is code for doing
    PG> Richardson extrapolation, which usually gives a much better
    PG> estimate than these 3 basic forms. If this project goes ahead
    PG> I will dig it out.

That would be cool, and I believe there are other, more precise,
approaches!  BUT, I'm worried that this (_numerical_ derivatives),
might become a package unto itself (which gets amusing, especially if
you consider the many ways one might want to differentiate a
multi-array vector by a vector...).

All of a sudden, I'm starting to see 2 projects out of one, depending
on the scope.
2 days later
#
Hopefully not too late for me to join the wishlist ...

In addition to the useful suggestions re experimental design and by
Peter, I have the following ctest-related projects which I think would
fit very nicely.  Ordering is according to decreasing priority.

* Improved support for exact inference (p-values and, where appropriate,
also confidence intervals) for some of the tests, in particular for
Kruskal-Wallis and (2-sample) Smirnov.  In addition, we currently don't
have exact p-values in the rank-based tests in case of ties, and one
could deal with this using the Streitberg-Roehmel path suggested by
Torsten Hothorn (see add-on package ExactDistr).  Also, permutation
tests might be useful in some cases ...

[I have a NEW implementation of the Mehta-Patel network algorithm for
dealing with the common odds ratio in 2 x 2 x k tables ready, hence will
take care of mantelhaen.test() myself.]

* Implement alternative definitions of the 2-sided p-value using

	p = 2[f P(X=x) + min{P(X<x),P(X>x)}]

with 0 <= f <= 1 as definition.
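
A sketch of this definition for a binomial null distribution.  pval2 is
a hypothetical name, and capping the result at 1 is an added assumption
that is not part of the formula.

```r
# Hypothetical helper: two-sided p-value
#   p = 2 * [f * P(X = x) + min{P(X < x), P(X > x)}]
# for X ~ Binomial(size, prob), capped at 1.
pval2 <- function(x, size, prob, f = 1) {
  p_eq <- dbinom(x, size, prob)                       # P(X = x)
  p_lt <- pbinom(x - 1, size, prob)                   # P(X < x)
  p_gt <- pbinom(x, size, prob, lower.tail = FALSE)   # P(X > x)
  min(1, 2 * (f * p_eq + min(p_lt, p_gt)))
}

pval2(2, 10, 0.5, f = 1)     # doubled one-sided tail
pval2(2, 10, 0.5, f = 0.5)   # the "mid-p" choice
```

With f = 1 this is twice the smaller one-sided tail probability; f = 0
drops the probability of the observed value altogether.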

* Improve the code for fisher.test(), maybe re-implement from scratch?
The memory management definitely needs to be rewritten for 1.2.

-k