Dear gR-folks,

The Danish gR-gang have been talking about describing a model language for graphical models that
1) could specify at least chain graph models, based on the most general hierarchical mixed models as described in Lauritzen (1996) [my book], section 6.4, pages 199-216 (more general than MIM-models);
2) did not confuse people who were accustomed to GLIM-type notation and formulae;
3) did not conflict too much with existing formula conventions (MIM, ggm);
4) was clear and unambiguous, and immediately understandable without too much explanation;
5) did not conflict too much with the whole idea and setup of graphical interaction models;
6) accommodated the idea of multiple response variables.

Here is a first attempt. It may well work, but I would appreciate a response if I have overlooked some nasty conflicts or drawbacks.

The whole issue is somewhat plagued by the "coincidental" fact that *intrinsically multivariate* log-linear models can, via "the Poisson trick", be described through univariate response models for the counts.

Below I will first describe the basic general setup, then some conventions which enable people to use alternative, more traditional approaches without ambiguity.

What do you all think of this? Please reply to the entire list...;-)

If it works, the suggestion would be for gRbase to adopt it and abandon MIM-notation altogether, as the latter is slightly different in style. Hopefully it can also be extended to cover BUGS-type models without too many direct conflicts.

Best regards
Steffen
--
Steffen L. Lauritzen
Department of Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, United Kingdom
Tel: +44 1865 272877; Fax: +44 1865 272595
email: steffen at stats.ox.ac.uk
URL: www.stats.ox.ac.uk/~steffen/
---------
The following signs are (at least) permissible: ~, +, *, :, ^, . and |
~ indicates the beginning of a formula. Implicitly think of log f ~ ....
| denotes parenthood in the graph, equivalent to normalising/conditioning
+ denotes multiplicative combination (log-additive). Chain components must be contained within parentheses.
* or : denotes the (tensor) product of interaction terms, decomposed into terms of lower order or not, i.e. A*B*C specifies all subsets of ABC, whereas A:B:C only uses ABC.
Strength of binding: (*, :) > + > |

Examples of legal formulae (the same model, with three chain components specified):
  m <- gm( log f ~ (A:B+C:D|D)+(B*E|E)+(D*E|E))
  m <- gm( ~ (A:B+C*D|D), ~(B*E|E)+(D*E|E))

Hierarchical models, as in CoCoCg and Lauritzen (1996), cf. p. 213:
  ~ A+B:X+B*Y+A*B*X^2+A*X:Y+Y^2      (not a MIM-model)
  ~ A+B:X+A*Y+A*(X+Y)^2              (= mim(A+B/AX+BY/AXY))

Some different models:
  m1 <- gm(~A*B+C*D|B*D)    equivalent to gm(~A*B+C*D+B*D|B*D)
  m2 <- gm(~((B+D)*E)|E)
  m <- b(m1,m2)
  m <- gm( ~ (A*B)+(C*D|D)+(B*E+D*E|E))
  m <- gm( ~ (A*B)+(C|D)+(B+D|E))

CONVENTION for compatibility with standard regression and ggm:
  Y~X+U:A is the same as ~(Y:X+Y:U:A |XUA) = ~(Y:(X+U:A) |XUA),
that is: *if* there is a variable on the left-hand side of ~, it is a response to the variables on the right-hand side, and the interaction structure is the product of the right- and left-hand sides.

Work still needs to be done to identify when models are legal or the same, and to parse them for proper and correct analysis.

Is this the way ahead?
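To make the difference between * and : and the response convention concrete, here is a minimal Python sketch (an illustration only, not part of gRbase or any other package) that expands a sum of products into the interaction terms it literally specifies. It assumes products without parentheses or powers, each using only one of * or :, and represents an interaction term as a set of variable names; the function names are hypothetical.

```python
from itertools import combinations


def expand_product(term: str) -> set[frozenset[str]]:
    """Interaction terms specified by one product such as 'A*B*C' or 'A:B:C'.

    Each interaction term is a frozenset of variable names. This sketch
    assumes a product uses only '*' or only ':' (mixed terms such as
    'A*X:Y', and powers such as 'X^2', are not handled).
    """
    if "*" in term and ":" in term:
        raise ValueError(f"mixed '*' and ':' not handled here: {term!r}")
    sep = "*" if "*" in term else ":"
    names = [name.strip() for name in term.split(sep)]
    if sep == ":":
        # A:B:C uses only the full interaction ABC (a lone name is a main effect)
        return {frozenset(names)}
    # A*B*C specifies every non-empty subset of ABC
    return {
        frozenset(subset)
        for size in range(1, len(names) + 1)
        for subset in combinations(names, size)
    }


def expand_sum(formula: str) -> set[frozenset[str]]:
    """Expand a '+'-separated sum of products ('+' binds looser than * and :)."""
    terms: set[frozenset[str]] = set()
    for product in formula.split("+"):
        terms |= expand_product(product)
    return terms


def response_convention(lhs: str, rhs: str):
    """Read 'lhs ~ rhs' as '~ (lhs-terms times rhs-terms | all rhs variables)'.

    Returns (interaction terms of the chain component, conditioning set).
    """
    head = {left | right for left in expand_sum(lhs) for right in expand_sum(rhs)}
    given = frozenset().union(*expand_sum(rhs))
    return head, given


def show(terms) -> list[str]:
    """Readable form: each term as its sorted variable names joined together."""
    return sorted("".join(sorted(t)) for t in terms)


if __name__ == "__main__":
    print(show(expand_sum("A*B*C")))  # ['A', 'AB', 'ABC', 'AC', 'B', 'BC', 'C']
    print(show(expand_sum("A:B:C")))  # ['ABC']
    head, given = response_convention("Y", "X+U:A")
    print(show(head), "|", "".join(sorted(given)))  # ['AUY', 'XY'] | AUX
```

The last line reproduces Y~X+U:A = ~(Y:X+Y:U:A |XUA). Whether a ':'-term should additionally imply its lower-order terms in a hierarchical model is left open here; the sketch only records what each operator literally specifies.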
[R--gR] Modelformulae
3 messages · Poul Svante Eriksen, Steffen Lauritzen
A few comments - just to clear my mind.
Steffen Lauritzen wrote:
Dear gR-folks The Danish gR-gang have been talking about describing a model language for graphical models that 1) could specify at least chain graph models, based on the most general hierarchical mixed models as described in Lauritzen (1996) [my book], section 6.4, pages 199-216. (More general than MIM-models).
Are we limiting the discussion to the Conditionally Gaussian (CG) case with independent and identically distributed observations? That would make life much easier and clearly rule out X*Y*Z between continuous variables. And I have no problem with interpreting "~" as "log f ~" as long as we are careful to include quadratic terms of continuous variables.

But it also rules out things like ~Z+log(X) and is far from having the same flexibility as the BUGS syntax. We might allow ~Z*(Z+log(X))|X, thereby leaving CG but sticking to Gaussian errors. If we want a more general BUGS/GLIM-like syntax, we need to consider how to include information on the link and error distribution. And it would be nice to facilitate correlation across units, e.g. by allowing ~X*B(X)|B(X), where B is the backshift operator.

Regards, svante
----------------------------------------------------
Poul Svante Eriksen, office G1-113
Department of Mathematical Sciences
Aalborg University
Fredrik Bajers Vej 7G, DK-9220 Aalborg East, Denmark
tel.: (+45) 96358868
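Svante's points about quadratic terms and about ~Z*(Z+log(X))|X can be made concrete with the usual form of the CG density (Lauritzen 1996), where i is the cell of the discrete variables and y the vector of continuous ones. The symbols a(x), beta_0, beta_1 and kappa below are introduced only for this worked sketch; it illustrates the argument and is not a statement of how any parser would treat the formula.

```latex
% CG density: log f is at most quadratic in the continuous variables y,
% with coefficients depending arbitrarily on the discrete cell i
\log f(i, y) = g(i) + h(i)^{\top} y - \tfrac{1}{2}\, y^{\top} K(i)\, y

% A term X*Y*Z between continuous variables would contribute x y z (cubic),
% which this form cannot express.

% ~ Z*(Z+log(X)) | X contributes z, z^2, log x and z log x. Conditional on
% X = x, the terms free of z (including the normaliser) are collected in a(x):
\log f(z \mid x) = a(x) + (\beta_0 + \beta_1 \log x)\, z - \tfrac{1}{2}\, \kappa z^2,
  \qquad \kappa > 0

% Completing the square gives a Gaussian response with mean linear in log x:
Z \mid X = x \sim \mathcal{N}\!\left( \frac{\beta_0 + \beta_1 \log x}{\kappa},\; \frac{1}{\kappa} \right)
```

The response is Gaussian for each x, but its conditional mean is linear in log x rather than in x, which is the sense in which the model keeps the Gaussian error while leaving CG.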
In a slightly longer perspective, I think the "CoCoCg"-models are far from sufficient, so the language should preferably be extendable to express many other models, and their combinations, to achieve a long-term goal of "Programming with models". This may not be easy to do in a single step, but there is no reason to exclude from the language anything which can be given a clear and unambiguous meaning. If terms like log(X) or X*Y*Z are included, the CoCoCg-parser can just say that this model is not available as a CoCoCg-model. But some crisp conventions should be formulated in any case.

Steffen

----- Original Message -----
From: "Poul Svante Eriksen" <svante at math.aau.dk>
To: "Steffen Lauritzen" <steffen at stats.ox.ac.uk>
Cc: "gRlist" <R-sig-gR at stat.math.ethz.ch>
Sent: Friday, August 20, 2004 8:33 AM
Subject: Re: [R--gR] Modelformulae
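Steffen's suggestion that the CoCoCg-parser simply report terms it cannot represent could look roughly like the Python sketch below. It assumes (a simplification, not CoCoCg's actual rule or code) that a ':'-product is available when it consists of declared variables and its total polynomial degree in the continuous variables is at most 2, matching the quadratic form of the CG log-density; terms such as log(X) or X:Y:Z are rejected. The variable sets DISCRETE and CONTINUOUS and the function names are hypothetical. For a '*'-product, checking its highest-order term is enough, since every lower-order term has no larger degree.

```python
import re

# Hypothetical variable declarations for illustration; a real parser would
# take them from the data (factors = discrete, numeric columns = continuous).
DISCRETE = {"A", "B"}
CONTINUOUS = {"X", "Y", "Z"}

# One factor of a ':'-product: a variable name, optionally raised to an
# integer power, e.g. "A", "X", "X^2".
_FACTOR = re.compile(r"([A-Za-z]\w*)(?:\^(\d+))?")


def continuous_degree(term: str) -> int:
    """Total polynomial degree of a ':'-product in the continuous variables.

    Raises ValueError for factors that are not plain variables or powers
    (such as log(X)) and for undeclared variable names.
    """
    degree = 0
    for factor in term.split(":"):
        match = _FACTOR.fullmatch(factor.strip())
        if match is None:
            raise ValueError(f"not a polynomial factor: {factor!r}")
        name, power = match.group(1), int(match.group(2) or 1)
        if name in CONTINUOUS:
            degree += power
        elif name not in DISCRETE:
            raise ValueError(f"undeclared variable: {name!r}")
    return degree


def cocog_available(term: str) -> bool:
    """True if the term fits the quadratic form of a CG log-density."""
    try:
        return continuous_degree(term) <= 2
    except ValueError:
        return False


if __name__ == "__main__":
    for t in ["A:B:X^2", "A:X:Y", "Y^2", "X:Y:Z", "log(X)"]:
        status = "available" if cocog_available(t) else "not available as a CoCoCg term"
        print(t, status)
```

Run as a script, the first three terms are reported as available and X:Y:Z and log(X) are not; a fuller parser would of course also have to check the chain-component structure and positive definiteness, which this sketch ignores.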