[R--gR] SIN model selection - R-SIG-gR

Tue, Aug 17, 2004 3:46 PM
Dear gR-folk!

As Steffen previously announced, Michael Perlman and I have been doing
some work on model selection in Gaussian graphical models.

In a Gaussian graphical model, the conditional independences that define
the model hold iff the corresponding partial correlation coefficients are
zero. Thus, for model selection (i.e. determination of a graph from data)
we can simply test, for each possible edge, the hypothesis that the
associated partial correlation coefficient is zero. If the hypothesis is
not rejected, then we do not include the associated edge in the graph. If
it is rejected, then we include the edge.
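
To make the link between edges and partial correlations concrete, here is
a minimal Python sketch (not code from the SIN package). It assumes the
usual identity that the partial correlation of variables i and j given all
others is -k_ij / sqrt(k_ii * k_jj), where K is the inverse covariance
matrix. The helper names invert and partial_correlations are made up for
this illustration.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Partial correlation of each pair given all remaining variables."""
    k = invert(cov)  # concentration (inverse covariance) matrix
    p = len(k)
    return [
        [1.0 if i == j else -k[i][j] / math.sqrt(k[i][i] * k[j][j]) for j in range(p)]
        for i in range(p)
    ]


# Illustrative correlation matrix for a chain 0 - 1 - 2:
# variables 0 and 2 are related only through variable 1.
cov = [
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
]
rho = partial_correlations(cov)
```

Here rho[0][2] is zero up to rounding, so the edge between variables 0 and
2 would be absent, while rho[0][1] is clearly nonzero. With real data one
would plug in the sample covariance matrix, and the estimated partial
correlations would then have to be tested rather than compared with zero
directly.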

In a paper appearing in the September issue of Biometrika, see

http://www.stat.washington.edu/drton/Papers/2004sin.pdf

for a preprint, we show, in the setting of undirected graphs, how Fisher's
z-transform and Sidak's inequality can be used to test these multiple
hypotheses simultaneously. Carrying out the tests simultaneously, we can
control the overall error rate of incorrect edge inclusion in our model
selection.
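
As a rough illustration of the simultaneous testing idea, the following
Python sketch turns sample partial correlations into Sidak-adjusted
p-values via Fisher's z-transform. It is not taken from the paper or the
package. In particular, the effective sample size n_obs - (p - 2) - 3 is
the textbook approximation for a partial correlation given p - 2 other
variables, and the paper may use a slightly different scaling. The
function names and the example numbers are invented for this sketch.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sidak_pvalues(partial_corr, n_obs):
    """Sidak-adjusted p-values for all pairs (i, j) with i < j.

    partial_corr: p x p matrix of sample partial correlations.
    n_obs: number of observations (assumption: the variance of Fisher's z
    is approximated by 1 / (n_obs - (p - 2) - 3)).
    """
    p = len(partial_corr)
    m = p * (p - 1) // 2  # number of hypotheses, one per possible edge
    n_eff = n_obs - (p - 2) - 3
    if n_eff <= 0:
        raise ValueError("too few observations for this many variables")
    pvals = {}
    for i in range(p):
        for j in range(i + 1, p):
            z = math.atanh(partial_corr[i][j])  # Fisher's z-transform
            raw = 2.0 * (1.0 - normal_cdf(math.sqrt(n_eff) * abs(z)))
            pvals[(i, j)] = 1.0 - (1.0 - raw) ** m  # Sidak adjustment
    return pvals


# Made-up partial correlations for three variables and 50 observations.
example = [
    [1.00, 0.45, 0.00],
    [0.45, 1.00, 0.45],
    [0.00, 0.45, 1.00],
]
pvals = sidak_pvalues(example, n_obs=50)
alpha = 0.05
edges = sorted(pair for pair, pv in pvals.items() if pv < alpha)
```

With these numbers the pairs (0, 1) and (1, 2) fall below alpha and are
included as edges, while (0, 2) is not.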

In a follow-up paper, we show how to apply this model selection procedure
(which we named SIN) in the setting of DAGs and chain graphs. This paper
is available as Univ. of Washington Tech. Report 457 at

http://www.stat.washington.edu/www/research/reports/

In particular, we improve our simultaneous testing procedure using a
p-value adjustment due to Holm. The paper also provides a brief overview
of Gaussian graphical models, which might be interesting in its own right.
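
For readers unfamiliar with Holm's adjustment, here is a small Python
sketch of the textbook step-down version, which is based on Bonferroni
multipliers and takes unadjusted p-values as input. The variant used in
the tech report may differ in detail (for example by using Sidak-type
multipliers), so please read this only as an illustration of the
step-down idea. The function name holm_adjust is made up here.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity of the adjusted values along the ordering.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Made-up unadjusted p-values for three possible edges.
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

Because only the smallest p-value receives the full multiplier m, the
step-down adjustment is never more conservative than adjusting every
p-value by m.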

I have prepared version 0.1 of an R package that implements SIN model
selection. The package, named 'SIN', can be downloaded from

http://cran.r-project.org/

I am planning on revising the package over the next few weeks as I am not
happy with how I handled the labelling of the variables (be careful with
the chain graph routines).  Suggestions are welcome.

Best wishes,

Mathias (Drton)