Multiple imputations : wicked dataset. Need advice for follow-up to a possible solution.

Emmanuel,

Friedman's MARS program (Annals of Statistics, 1991) implements recursive
partitioning in a regression context - a version of it written by Trevor
Hastie was available in R, but I don't know which package it's in now - I
only have base stuff available (long story).

MARS, like recursive partitioning, is a data exploration tool that builds up
an approximation to a nonlinear regression function using piecewise
regression splines. Each spline is split and replaced by a pair, and the
GCV (generalised cross-validation) score is computed - if the split reduces
the GCV then the split is accepted - in this way the method is adaptive.
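
To make the split-and-test step concrete, here is a minimal sketch in plain
Python of one such step on a single predictor. It is not Friedman's program
or any R package's code: the function names, the toy data, the penalty of 3
per knot and the simplified GCV formula are all assumptions made for
illustration.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef


def fit_rss(rows, y):
    """Least-squares fit via the normal equations; return the residual sum of squares."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def gcv(rss, n, n_basis, n_knots, penalty=3.0):
    """Simplified GCV: mean RSS inflated by an effective parameter count.

    Assumption: effective parameters = basis functions + penalty * knots.
    """
    effective = n_basis + penalty * n_knots
    return (rss / n) / (1.0 - effective / n) ** 2


def try_split(x, y):
    """Try replacing a constant fit by a hinge pair at each interior knot.

    Returns (best_knot, baseline_gcv, best_split_gcv, accepted).
    """
    n = len(x)
    base = gcv(fit_rss([[1.0] for _ in x], y), n, n_basis=1, n_knots=0)
    best_knot, best = None, float("inf")
    for knot in sorted(set(x))[1:-1]:  # interior knots keep both hinges non-zero
        rows = [[1.0, max(0.0, xi - knot), max(0.0, knot - xi)] for xi in x]
        score = gcv(fit_rss(rows, y), n, n_basis=3, n_knots=1)
        if score < best:
            best_knot, best = knot, score
    # The split is accepted only if it lowers the GCV score.
    return best_knot, base, best, best < base


# Hypothetical toy data: flat up to x = 5, then rising with slope 2.
x = [i * 0.5 for i in range(20)]
y = [1.0 + 2.0 * max(0.0, xi - 5.0) for xi in x]
best_knot, base_gcv, split_gcv, accepted = try_split(x, y)
```

On this toy data the knot at 5.0 is selected and the split is accepted,
because the hinge pair reproduces the kink exactly. A full MARS fit repeats
this search over all predictors and existing basis functions and then prunes
terms backwards, which this sketch leaves out.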

MARS is very powerful and was used for time series research by Lewis &
Bonnie Ray (JASA 1991) - Bonnie has later papers as well. The main flaw
with MARS, and I suppose a key reason why it doesn't feature more, is that
there is no underlying physical/biological model that the researcher is
trying to make sense of - MARS just finds the best curve. Interpretation of
the result can therefore be a problem. However, MARS does provide an
"anova" type decomposition of the curve and this can certainly help in
making sense of the underlying relationships.

Whether to use it (or related methods such as Generalised Additive Models,
GAMs) for imputation is then a question of taste. If you're happy that the regression
curve is sufficient explanation then MARS is worth looking at - if you want
to know more about the physical model, well ...
Finally, MARS will treat all missing data as missing at random, so if there
are specific conditional effects these have to be included as categorical
predictors a priori. As MARS is based on least squares it's only optimal
for Gaussian errors - it can be used on categorical data as well - another
variation called PolyMARS also implements MARS for categorical/multinomial
data.

Hope this is of interest!

Gerard

From: Emmanuel Charpentier <charpent at bacbuc.dyndns.org>
Sent by: r-help-bounces at r-project.org
Date: 27/04/2009 20:49
To: r-help at stat.math.ethz.ch
Subject: Re: [R] Multiple imputations : wicked dataset. Need advice for
follow-up to a possible solution.

Answering myself (for the sake of future archive users), more to come
(soon):

On Thursday 23 April 2009 at 00:31 +0200, Emmanuel Charpentier wrote:
