I thought about a couple of approaches (see details below), but none seems very
satisfactory. This issue keeps reminding me of the LASSO and other shrinkage
methods, but the twist here is that it is not the beta for a covariate that is
shrunk to zero, but different covariates in each subject.
Is there any obvious solution I am missing? Any suggestions?
Thanks,
************
Approach 1: the final statistic to judge predictive quality is Goodman &
Kruskal's tau (or concentration coefficient) for IxJ contingency tables.
Since every subject with m "present" covariates yields m possible contingency
tables, and there are many subjects with multiple present covariates, the
number of possible contingency tables is astronomical, and an exhaustive
search is not feasible (nor do I see an obvious way to simplify the problem
from tau's definition, because we have 12 categories to predict from the 8
covariates). I would use a genetic algorithm to try to find a decent
solution.
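For reference, the concentration coefficient used as the fitness statistic is straightforward to compute from a single contingency table. A minimal numpy sketch (the table values and the function name are made up for illustration):

```python
import numpy as np

def goodman_kruskal_tau(table):
    """Goodman & Kruskal's tau for predicting the column variable
    from the row variable of an I x J contingency table of counts."""
    p = table / table.sum()            # cell proportions p_ij
    p_row = p.sum(axis=1)              # row marginals p_i.
    p_col = p.sum(axis=0)              # column marginals p_.j
    # proportional reduction in prediction error for the column variable
    num = (p**2 / p_row[:, None]).sum() - (p_col**2).sum()
    den = 1.0 - (p_col**2).sum()
    return num / den

# a made-up 2x3 table for illustration
tab = np.array([[30.0, 10.0, 5.0],
                [5.0, 20.0, 30.0]])
print(goodman_kruskal_tau(tab))
```

tau is 0 under independence and 1 when the row category determines the column category, so a GA could use it directly as the fitness of a candidate configuration of covariates.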
Approach 2: set this up as a multinomial loglinear model and fit it (using
multinom) to the original data set. Do not treat the covariates as factors;
code "present" as 1 and "absent" as 0.
For each subject with several (say, k) "present" covariates, predict the class
membership (predict.multinom) for each of the k covariate vectors obtained by
subtracting, say, 0.1 from each of the non-zero covariates except one. Set as
the new covariate vector for that subject the one that gives the highest
predicted probability to the correct class.
Repeat the model fitting and modify the covariates as in the last step
(rescaling at the end, so that the maximum covariate value is always one for
each subject) until only one non-zero covariate remains (if that ever
happens!).
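One shrink-and-choose pass of this scheme can be sketched in numpy. This is only an illustration of the candidate-generation step: a toy fixed softmax model stands in for the refitted multinom model (the refit between passes is omitted), and the names `softmax_proba`, `shrink_step`, `W`, and the example data are all made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax_proba(X, W):
    """Toy stand-in for predict.multinom: class probabilities
    from a fixed linear model with weight matrix W."""
    z = X @ W
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def shrink_step(x, y, W, step=0.1):
    """One pass for a single subject with true class y: build one
    candidate per non-zero covariate (subtract `step` from all the
    other non-zero covariates, then rescale so the max is 1) and
    keep the candidate giving class y the highest probability."""
    nz = np.flatnonzero(x)
    if len(nz) <= 1:
        return x                              # already one covariate left
    best_x, best_p = x, -np.inf
    for keep in nz:
        cand = x.copy()
        others = nz[nz != keep]
        cand[others] = np.maximum(cand[others] - step, 0.0)
        cand = cand / cand.max()              # rescale: max covariate is 1
        p = softmax_proba(cand[None, :], W)[0, y]
        if p > best_p:
            best_x, best_p = cand, p
    return best_x

# made-up example: 8 covariates, 12 classes, one subject of class 3
W = rng.normal(size=(8, 12))
x = np.array([1.0, 0.7, 0.0, 0.4, 0.0, 0.9, 0.0, 0.2])
for _ in range(50):                           # iterate shrink passes
    x = shrink_step(x, 3, W)
print(x, np.count_nonzero(x))
```

In the real procedure the model would be refit on the updated data between passes, so the probabilities (and hence the chosen candidates) would change as the covariates move.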
This seems to me like a very clumsy approach, and I am not sure whether there
is any reason to expect it to arrive at a reasonable solution; I thought it
could be a way of smoothly moving each covariate (except one), within subject,
"along its path of least resistance" toward zero.
(Note: in both approaches further simplification can be achieved by applying
the same transformation or mutation (with the GA) to all subjects that belong
to the same class and have the same initial configuration of covariates. This
way I also forcefully prevent identical subjects from ending up with different
final configurations.)
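The grouping in that note amounts to keying subjects by their (class, initial covariate pattern) pair and searching once per key. A small sketch with made-up data:

```python
from collections import defaultdict

# made-up subjects: (class label, initial covariate pattern)
subjects = [
    (3, (1.0, 0.7, 0.0, 0.4)),
    (3, (1.0, 0.7, 0.0, 0.4)),   # identical to the first subject
    (5, (0.0, 1.0, 0.2, 0.0)),
]

# group subject indices by (class, pattern); each group is transformed
# as a unit, so identical subjects share one final configuration
groups = defaultdict(list)
for i, (cls, pattern) in enumerate(subjects):
    groups[(cls, pattern)].append(i)

print(dict(groups))
```

This also shrinks the search space: the GA (or the shrink-refit loop) only has to handle one representative per group.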