Prediction/classification & variable selection

Thu, May 14, 2020 1:02 PM

Dear Daniel,
To build on Cesko's comment: if your problem is indeed one of
"classic" model selection, the package MuMIn does this with its
dredge() function, which fits every combination of the predictor
variables:

https://cran.r-project.org/web/packages/MuMIn/MuMIn.pdf

It also works with mixed models.
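Here is a minimal sketch of what an all-subsets search does. It assumes simulated data with hypothetical predictors x1 to x3 and uses base-R glm() as a stand-in for lme4::glmer(). It is not MuMIn's own code. It fits every subset of the predictors and ranks the fits by AIC, which is conceptually what dredge() automates.

```r
# All-subsets search, sketched with base R (glm() stands in for glmer()).
set.seed(1)
n <- 500
# Simulated data (assumption): x1 and x2 matter, x3 is noise
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.8 * d$x1 - 0.5 * d$x2))

predictors <- c("x1", "x2", "x3")

# Every subset of the predictors; the empty subset is the intercept-only model
subsets <- c(list(character(0)),
             unlist(lapply(seq_along(predictors), function(k)
               combn(predictors, k, simplify = FALSE)), recursive = FALSE))

fit_subset <- function(vars) {
  rhs <- if (length(vars) == 0) "1" else vars
  glm(reformulate(rhs, response = "y"), family = binomial, data = d)
}

# Fit each candidate and rank by AIC (lower is better)
aics <- vapply(subsets, function(v) AIC(fit_subset(v)), numeric(1))
labels <- vapply(subsets, function(v)
  if (length(v) == 0) "(intercept only)" else paste(v, collapse = " + "),
  character(1))

ranking <- data.frame(model = labels, AIC = aics)[order(aics), ]
print(ranking, row.names = FALSE)
```

With MuMIn itself, its documentation asks for options(na.action = "na.fail") to be set before fitting the global model passed to dredge(). The number of candidate models doubles with every predictor added.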

Cheers, Tim


Quoting "Voeten, C.C." <c.c.voeten at hum.leidenuniv.nl>:

Dear Daniel,

Maybe my understanding of your situation is a bit too simplistic,
but it sounds like you have a classic case of model selection /
feature selection? There are many approaches for that. The easiest
would be likelihood-ratio tests (or AIC, or BIC, or some other
criterion). Start with a full model (or as full as you can get while
still achieving convergence) containing all combinations of
predictors, remove one term at a time, and check whether the model
improves according to your criterion; repeat until no more terms can
be eliminated. There are many packages that can automate this
procedure for you.
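The backward-elimination procedure described above can be sketched as follows. The sketch assumes simulated data, a hypothetical 0.05 threshold for the likelihood-ratio test, and base-R glm() in place of a mixed model. On each pass, drop1() tests the removal of every remaining term and the least significant term is dropped. The loop stops once every remaining term is significant.

```r
# Backward elimination by likelihood-ratio tests (glm() stands in for glmer()).
set.seed(2)
n <- 500
# Simulated data (assumption): only x1 and x2 have real effects
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- rbinom(n, 1, plogis(1.0 * d$x1 - 0.7 * d$x2))

p_threshold <- 0.05   # assumed significance level for keeping a term
fit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = d)

repeat {
  # LRT for dropping each term from the current model
  tests <- drop1(fit, test = "Chisq")
  # Row 1 of the table is "<none>" (the current model), so skip it
  p <- setNames(tests[["Pr(>Chi)"]][-1], rownames(tests)[-1])
  if (length(p) == 0 || max(p) < p_threshold) break
  worst <- names(which.max(p))   # least significant remaining term
  fit <- update(fit, as.formula(paste(". ~ . -", worst)))
}
formula(fit)
```

lme4 provides a drop1() method for glmer() fits, so the same loop should carry over to a mixed model. Each GLMM refit is slower, though, and may produce convergence warnings.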
Another option could be lasso or ridge regression, which are
commonly used for feature selection in the classification
literature. I don't know if the lasso has been implemented for mixed
models, but I know that the package mgcv allows you to specify ridge
penalties via its paraPen argument (see the documentation of that
argument).
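Here is a sketch of a ridge penalty on parametric terms using mgcv's paraPen argument, assuming simulated data. The candidate predictors are collected in one matrix term X. An identity penalty matrix diag(p) puts a ridge penalty on its coefficients, and s(subject, bs = "re") adds a random intercept per subject. The penalty strength is estimated by REML rather than chosen by hand. The data, effect sizes, and variable names are illustrative assumptions.

```r
library(mgcv)

set.seed(3)
n <- 600   # trials
p <- 8     # candidate predictors (assumption)
X <- matrix(rnorm(n * p), n, p)
beta <- c(1, -0.8, rep(0, p - 2))           # only the first two matter here
subject <- factor(rep(1:20, each = n / 20))
u <- rnorm(20, sd = 0.5)                    # simulated subject intercepts
y <- rbinom(n, 1, plogis(drop(X %*% beta) + u[subject]))
d <- list(y = y, X = X, subject = subject)

# Ridge: identity penalty matrix on the p coefficients of term X;
# random intercept per subject via a "re" smooth
fit <- gam(y ~ X + s(subject, bs = "re"),
           family = binomial, data = d, method = "REML",
           paraPen = list(X = list(diag(p))))
round(coef(fit)[2:(p + 1)], 2)   # shrunken coefficients of the p predictors
```

A ridge penalty shrinks coefficients toward zero but, unlike the lasso, does not set them exactly to zero. With this approach, candidates are compared by the size of their shrunken coefficients rather than being dropped automatically.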

HTH,
Cesko

-----Original Message-----
From: R-sig-mixed-models <r-sig-mixed-models-bounces at r-project.org> On
Behalf Of daniel.schlunegger at psy.unibe.ch
Sent: Thursday, May 14, 2020 6:42 PM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Prediction/classification & variable selection

Dear people of r-sig-mixed-models at r-project.org

My name is Daniel Schlunegger, PhD student in Psychology at the University
of Bern, Switzerland.

I'm new here and wondered whether somebody could help me.

My goal is to predict subjects' responses based on their previous
responses in a one-interval two-alternative forced-choice auditory
discrimination task (a "was it tone A or tone B?" sort of task). I've run
an experiment with 24 subjects; each subject performed 1200 trials
(28,800 trials in total). There are no missing values; all data are
"clean".

The main idea of my work is:
1) Take subjects' responses
2) Compute some statistics from those responses
3) Use these statistics to predict the next response (trial by trial)
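To make steps 2 and 3 concrete, here is a sketch of one possible trial-by-trial predictor. The learning models in the question are not specified, so the delta-rule (exponentially weighted) estimate below, its learning rates, and the 0/1 response coding are all hypothetical. The key property is that the value used to predict trial t depends only on responses from trials 1 to t-1 of the same subject. Varying the learning rate yields a set of closely related predictors, similar to the sets of predictors described in the question.

```r
# Hypothetical predictor: delta-rule estimate of P(response = 1),
# updated after each trial with learning rate alpha.
delta_rule_predictor <- function(resp, alpha = 0.2, init = 0.5) {
  v <- numeric(length(resp))
  current <- init
  for (t in seq_along(resp)) {
    v[t] <- current                                   # value used to predict trial t
    current <- current + alpha * (resp[t] - current)  # update after observing trial t
  }
  v
}

set.seed(4)
d <- data.frame(subject = factor(rep(1:3, each = 10)),
                resp = rbinom(30, 1, 0.5))  # 1 = "tone A" response (assumed coding)

# One hypothetical "set" of predictors: same rule, different learning rates,
# computed separately within each subject
alphas <- c(0.1, 0.3, 0.6)
for (i in seq_along(alphas)) {
  d[[paste0("predictor1_", i)]] <-
    ave(d$resp, d$subject,
        FUN = function(r) delta_rule_predictor(r, alpha = alphas[i]))
}
head(d)
```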

Goal: Prediction / Classification (binary outcome)

From three different learning models I derived three predictors, or
more precisely, three different sets of predictors. Within each set
there are n (normally distributed) predictors of very similar nature.
I need a model with three predictors, one from each set:

y ~ predictor1_n + predictor2_n + predictor3_n

Problem: theoretically it is possible (or rather probable) that for
each subject a different combination of predictors (e.g. predictor1_2 +
predictor2_1 + predictor3_3 vs. predictor1_1 + predictor2_2 +
predictor3_3) results in better classification accuracy. On the other
hand, I would like to keep the model as simple as possible, say, by
using the same three predictors for all subjects while accounting for
differences between subjects with a random intercept (1 | subject) or
with a random intercept and random slope.
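The two random-effects structures mentioned above can be written as lme4-style formulas. This sketch builds them with base R's reformulate() for one hypothetical combination of predictors. The names follow the question's predictorX_n pattern, and the choice of predictor1_2 for the random slope is arbitrary.

```r
# One hypothetical combination: one predictor from each set
fixed <- c("predictor1_2", "predictor2_1", "predictor3_3")

# Random intercept per subject
f_int <- reformulate(c(fixed, "(1 | subject)"), response = "y")

# Random intercept plus a by-subject random slope for predictor1_2
f_slope <- reformulate(c(fixed, "(1 + predictor1_2 | subject)"), response = "y")

print(f_int)
print(f_slope)
# Either formula could then be passed to
# lme4::glmer(f, data = d, family = binomial), with d holding these columns.
```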

I've seen a lot of work that performs subject-level and group-level
analyses, but I think that's actually not correct, is it?

Do you have any suggestions on how to do this properly? I assume
that just running n * n * n different GLMMs (lme4::glmer()) and then
checking which combination gives the best prediction is not the proper
way to do it, but that is what I have done so far.
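One way to make the n * n * n comparison less optimistic is to score each combination on subjects that were not used to fit it (leave-subjects-out cross-validation). This technique is swapped in here and was not proposed in the thread. The sketch assumes simulated data with n = 3 predictors per set, 24 subjects split into 6 folds, a 0.5 classification threshold, and base-R glm() in place of lme4::glmer(). With glmer(), predictions for unseen subjects would use the population-level part of the model, e.g. predict(fit, newdata, re.form = NA, type = "response").

```r
set.seed(5)
n_subj <- 24; n_trials <- 100; n_per_set <- 3
N <- n_subj * n_trials
d <- data.frame(subject = factor(rep(seq_len(n_subj), each = n_trials)))
for (s in 1:3) {
  for (k in seq_len(n_per_set)) {
    d[[paste0("predictor", s, "_", k)]] <- rnorm(N)
  }
}
# Simulated truth (assumption): predictor1_2, predictor2_1, predictor3_3 matter
d$y <- rbinom(N, 1, plogis(0.8 * d$predictor1_2 + 0.6 * d$predictor2_1 +
                             0.4 * d$predictor3_3))

# All n * n * n combinations with exactly one predictor from each set
combos <- expand.grid(set1 = paste0("predictor1_", seq_len(n_per_set)),
                      set2 = paste0("predictor2_", seq_len(n_per_set)),
                      set3 = paste0("predictor3_", seq_len(n_per_set)),
                      stringsAsFactors = FALSE)

# Whole subjects are held out together: 6 folds of 4 subjects each
folds <- split(levels(d$subject), rep(1:6, length.out = n_subj))

cv_accuracy <- function(vars) {
  f <- reformulate(vars, response = "y")
  correct <- 0
  for (held_out in folds) {
    test <- d$subject %in% held_out
    fit <- glm(f, family = binomial, data = d[!test, ])
    p_hat <- predict(fit, newdata = d[test, ], type = "response")
    correct <- correct + sum((p_hat > 0.5) == (d$y[test] == 1))
  }
  correct / nrow(d)   # every trial is predicted exactly once
}

combos$accuracy <- apply(combos[, c("set1", "set2", "set3")], 1,
                         function(v) cv_accuracy(unname(v)))
combos <- combos[order(-combos$accuracy), ]
head(combos, 3)
```

Because each combination is scored on held-out subjects, choosing the best of 27 this way is less prone to rewarding noise than ranking by in-sample accuracy. The winner's score is still somewhat optimistic, though, since it was selected among many candidates.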

(I have another dataset from a slightly different version of the
experiment, containing 91,200 trials from 76 subjects, in case the
number of observations is an issue here.)

Thanks for considering my request.

Kind regards,
Daniel Schlunegger

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

Tim Richter-Heitmann
Universität Bremen
