
Hierarchical Psychometric Function in BRMS

3 messages · René, Ades, James

#
Hey James,

I think the remaining questions are:
1) Should one use RW (continuous) instead of RESPONSE (binary), and why?
2) Is gamma (a guessing parameter) necessary for this?
(and a clarification below)

1)
If twenty kids run a mile, they will all have different times. Students
should be able to get 70% correct (the tasks are not inherently difficult);
it's a question of what amount of time (how slow) is necessary in order for
them to achieve that 70% correct.

The case you have is: your kids run a mile and somebody says "pass" or
"fail", because they either did it in time or not, respectively. If they did
it ("pass") you say "Well, that was obviously too easy for you; I want to
find out whether you can also do it if I raise the criterion by 10ms." And if
the kids "fail" you say "Well, this was obviously too difficult for you; here
is a little bit more time (40ms), let's see whether you pass now."

Now: if a fail raises the RW by 40ms, and a pass lowers it by 10ms, then one
has 4 remaining steps before reaching the level at which one fails again.
Meaning 4 right, 1 wrong; 4 right, 1 wrong; 4 right, 1 wrong; ... Just to be
most clear: if my "ability" lets me pass at a criterion of 500ms, but not
beyond, then once I reach 490ms I will start failing. Let's just play it
through: 500->pass; 490->fail; 530->pass; 520->pass; 510->pass; 500->pass;
490->fail; 530->pass (the cycle continues)... and so on. In the long run you
approach 4 passes, 1 fail, 4 passes, 1 fail, ..., which means the accuracy
will be -- for every participant -- 4/5 = 80%. Now, the average time window
over these 5-trial cycles will be (490+500+510+520+530) / 5 = 510ms; this
means 510ms corresponds to 80% accuracy for this participant, and thus
indicates his/her ability to reach 80% accuracy. Of course, the participants
will differ in their abilities; another participant can only pass with an RW
of at least 600ms; thus 600->pass; 590->fail; 630->pass; 620->pass; ... and
so 610ms (on average) corresponds to 80% accuracy for that participant. You
see where this is going? Due to the test procedure, every participant
meanders around 80%, probably hardly ever hitting it exactly (due to
behavioral noise). But the idea is: everybody is at 80%. And RW directly
tells you the ability of each participant, which is extremely nice (!)
because you do not need to infer the participant's ability statistically
anymore -- you measured it precisely. I think there is no point in plugging a
psychometric function in now. It adds no information.
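
For concreteness, here is a minimal R sketch of the staircase logic described
above, assuming the step sizes from the example (pass: RW - 10ms, fail:
RW + 40ms) and an idealised, noise-free participant whose true threshold is
500ms; the function and column names are made up for illustration:

simulate_staircase <- function(threshold = 500, start_rw = 800, n_trials = 200) {
  rw  <- numeric(n_trials)
  acc <- integer(n_trials)
  current <- start_rw
  for (t in seq_len(n_trials)) {
    rw[t]  <- current
    acc[t] <- as.integer(current >= threshold)  # pass iff the window is long enough
    current <- if (acc[t] == 1) current - 10 else current + 40
  }
  data.frame(trial = seq_len(n_trials), rw = rw, accurate = acc)
}

d    <- simulate_staircase()
late <- subset(d, trial > 50)  # discard the initial descent from the start value
mean(late$accurate)            # settles at 4/5 = 0.80
mean(late$rw)                  # settles at ~510ms for this participant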

So if you simply changed your question from "What is the RW threshold to
reach 70%?" to "What is the RW threshold to reach 80% accuracy?", then you
would already have your answer: it is the final average response window of
each participant (thanks to the staircase procedure). (One can still see
whether this RW varies between conditions.) So I would suggest changing the
question, unless there is something very specific about 70%. And as you noted
yourself, you initially started off with 80%... so you might just rely on
your test procedure (the staircase), whose validity I think nobody will
dispute.
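
In practice that answer can be read straight off the data. A minimal sketch,
assuming a hypothetical data frame "trials" with columns participant, trial,
and rw (the cut-off of 50 trials for "final" is an arbitrary choice):

library(dplyr)

thresholds <- trials %>%
  group_by(participant) %>%
  arrange(trial, .by_group = TRUE) %>%
  slice_tail(n = 50) %>%               # keep only the final, converged trials
  summarise(rw_threshold = mean(rw))   # each participant's RW threshold estimate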

2) Since the procedural design basically forbids guessing, there is no way of
"identifying" a guessing parameter in further analyses.

Remaining note on my previous point 4): I was referring to the four time
points (sessions), not RW, which might resolve the question.

Best
René





On Wed, 18 March 2020 at 04:59, Ades, James <
jades at health.ucsd.edu> wrote:

  
  
#
Hi Rene,

Yes, in an ideal world each participant would end up at the 80% threshold. The reason I lowered it to 70% was that it was clear many participants did not achieve that threshold. Why a good deal of students didn't achieve it is something for the methods section. Taking the final RW would be one way of doing it (as would the average RW, which we also look at), but I think that since a psychometric function takes into account the entire sampled RW distribution for each participant, it provides a more principled way of looking at a participant's responses.

I don't necessarily think gamma is essential (a participant would have a 50% chance of getting a trial correct by guessing), but from everything I've read, people generally include it as a parameter. How a hierarchical model might change that, I'm not sure.

I do look at other performance measures in the paper, but one of them is a psychometric function, so I'm really just trying to figure out how to change my psychometric model to be accurate within a hierarchical, Bayesian framework.
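
For reference, one common way to write such a model in brms is as a
non-linear formula with gamma fixed at the 0.5 chance level mentioned above.
This is only a sketch under those assumptions, not necessarily the exact
model in question; the data frame, column names (response, rw, participant)
and priors are hypothetical:

library(brms)

# P(correct) = 0.5 + 0.5 * inv_logit(eta): a logistic psychometric function
# with a fixed guessing floor of 0.5; eta gets participant-level intercepts
# and slopes over the response window (RW).
psy_formula <- bf(
  response ~ 0.5 + 0.5 * inv_logit(eta),
  eta ~ rw + (rw | participant),
  nl = TRUE
)

fit_psy <- brm(
  formula = psy_formula,
  data    = dat,  # hypothetical trial-level data: response (0/1), rw, participant
  family  = bernoulli(link = "identity"),  # the formula already returns a probability
  prior   = prior(normal(0, 5), nlpar = "eta"),
  chains  = 4, cores = 4
)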

Thanks,

James
#
Hey James,

please don't get me wrong. I am just saying: what you are trying to find out
psychologically does not require a psychometric function, and fitting one
adds unnecessary complexity, which is usually not preferred. But, of course,
if your general goal is to understand how to implement a logistic non-linear
model, you can just go ahead. But from everything you have said so far, I
would "predict" that the model you have in mind will definitely not converge
(because it is not identifiable), and even if it does, it will be
uninformative. (So, for practicing modeling it would be good to have a nicer
example. For instance, every participant gets the same stimuli and the same
RWs, and then they vary in their accuracy; this variation in accuracy between
items and participants allows you to estimate item difficulty and participant
ability on a latent scale.) -- In your paradigm, variation in item difficulty
and variation in participant accuracy are both simply eliminated by the
procedure (the staircase); i.e., no estimation of psychometric functions
based on those assumptions is possible.
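
A minimal brms sketch of that "nicer example" (a Rasch-style model in which
item and participant effects sit on a shared latent logit scale; the data
frame and column names accurate, item, and participant are hypothetical):

library(brms)

fit_irt <- brm(
  accurate ~ 1 + (1 | item) + (1 | participant),
  data   = dat_fixed_rw,  # hypothetical trial-level data from a fixed-RW design
  family = bernoulli(),
  chains = 4, cores = 4
)

# Participant random effects ~ latent ability;
# item random effects ~ (minus) item difficulty.
ranef(fit_irt)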

A simplified graphical illustration:

This is how the participants' behavior should be distributed over the
ongoing trials (simplified).

p(accurate)
100% ^
     |----
     |
     |      --              ------------------------------------  (about 80%)
     |             -  ----
     |  --  --   - --
     |
     | -       ..
   0 +------------------------------------------------------> ongoing trials
       (starting at the beginning, all with the same initial RW)

In words: due to the variation between the participants' abilities, given
that they all have the same initial RW, there is some variance in accuracy in
the first trials, but this variance disappears over time due to the staircase
procedure; eventually, all participants reach the same accuracy ceiling of
80% (4 pass, 1 fail, 4 pass, 1 fail, ...). From the moment that ceiling is
reached, there is basically only noise in the data (random errors), which
means -- for a psychometric function -- you can throw away these trials. Not
doing so would mean trying to fit noise based on "norms", which is
overfitting (by definition). The psychometric function, as you want to
implement it, would require a continuous relation between RW and accuracy.
This, however, holds only in the very first trials, because of the staircase
procedure. And without between-participant variance in accuracy, there is no
way to estimate differences in ability from (constant) accuracy.
But as already outlined, the method (the staircase) systematically gives you
the ability equivalent, because the variance in accuracy is systematically
eliminated via RW. This means you have "transferred" the variance you are
actually interested in from "response" to "RW" by methodological means.
Hence, your DV should be RW.
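
With RW as the DV, the corresponding brms model becomes very simple. A
minimal sketch, with hypothetical column names (rw_threshold = each
participant's converged mean RW, plus condition and participant):

library(brms)

fit_rw <- brm(
  rw_threshold ~ condition + (1 | participant),
  data   = thresholds_by_condition,  # hypothetical: one row per participant x condition
  family = gaussian(),
  chains = 4, cores = 4
)
summary(fit_rw)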


There is really nothing more I can say about this case :) except: using
functions just because others do might not be the best way to justify
analyses. One very common example of what A LOT of researchers do is to run
classic ANOVAs on dependent variables like "percent correct" (or "percent
response X"). Thus, although the outcome is binomial, a lot of researchers
use parametric tests on averaged accuracy (or choices). Even worse: what you
can see in a lot of studies on the Iowa gambling task is that not simply
p(response B) is taken as the DV, but p(B) minus p(A), with the (pseudo)
argument that this reflects some "action-direction effect" or similar (like:
"we predicted that B should be chosen more often, hence we expect p(B) -
p(A) to be positive"). Indeed, looking at the literature, one could say this
is the canonical way of doing it... However, it is also "very problematic"
because, in such studies, p(A) and p(B) are mutually exclusive, such that
p(B) + p(A) = 1; this means p(B) - p(A) = 2 * (p(B) - .5), so the "difference"
is just the deviation from chance counted twice. (But doing it correctly,
i.e., testing p(A) against p = .5, would unfortunately reduce the effect size
from a 14% difference to a 7% deviation from chance... and that is an
argument, I guess.) So the general message is: trust nobody but your own
sanity. :)

(Unfortunately, I will not be able to continue this thread. )

Best
René


On Wed, 18 March 2020 at 19:33, Ades, James <
jades at health.ucsd.edu> wrote: