
Modeling precision and recall with GLMMs

Hi Jake and Daniel,

I was extremely obtuse in my answer to your reply: what you and Daniel
suggest certainly makes a lot of sense.
On Thu, 01-01-1970, at 01:00, Jake Westfall <jake987722 at hotmail.com> wrote:
Yes, I see it now.
I agree.
This seems reasonable, in particular since the base rates of the two
classes can be very different between datasets. But datasets also affect
the "quality" of the signal, the d' in SDT.
I also do not find it intuitive to have different response biases by dataset.

However, placing truth as the dependent variable does not seem intuitive to
me. In the first model we have P(1|Signal) or P(1|Noise), but reversing that
is awkward to me; I am not sure what a "residual" would mean then, and I am
not sure whether the coefficients retain the same meaning (the intercept
capturing response bias by algorithm, an easy mapping to recall and
precision, or other features explained in, say, "Signal detection theory
and generalized linear models", by DeCarlo). I've googled a little bit
since Daniel's and your last email :-)
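
A minimal sketch of the mapping mentioned above, under stated assumptions: an
equal-variance Gaussian SDT model, a probit link, truth dummy-coded as 0
(Noise) and 1 (Signal), and made-up counts for a single algorithm on a single
dataset. The function name sdt_summary is hypothetical. With that coding, the
probit intercept equals z(false-alarm rate) and the slope on truth equals d';
recall is P(1|Signal), whereas precision conditions on the response rather
than on the truth.

```python
from statistics import NormalDist


def sdt_summary(hits, misses, false_alarms, correct_rejections):
    """Summarise one 2x2 confusion table in SDT and classification terms.

    Assumes every cell is non-zero, so no rate is exactly 0 or 1
    (the inverse normal CDF is undefined at those values).
    """
    z = NormalDist().inv_cdf  # probit transform (inverse standard normal CDF)

    hit_rate = hits / (hits + misses)                             # P(1|Signal), i.e. recall
    fa_rate = false_alarms / (false_alarms + correct_rejections)  # P(1|Noise)
    precision = hits / (hits + false_alarms)                      # P(Signal|1)

    d_prime = z(hit_rate) - z(fa_rate)            # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias c

    return {
        "recall": hit_rate,
        "false_alarm_rate": fa_rate,
        "precision": precision,
        "d_prime": d_prime,
        "criterion": criterion,
        # Probit model P(1 | truth) = Phi(intercept + slope * truth), truth in {0, 1}
        "probit_intercept": z(fa_rate),
        "probit_slope": d_prime,
    }


# Hypothetical counts, for illustration only.
summary = sdt_summary(hits=40, misses=10, false_alarms=20, correct_rejections=130)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In a GLMM the same intercept and slope would additionally carry random
effects (for example by dataset); the sketch only covers the fixed-effects
reading of a single table.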


Regardless, this is certainly a really nice way to approach the problem I
originally posted. Moreover, I could easily add edge-specific covariates
related to how hard it is to infer each edge correctly (i.e., to how small
d' is); this would be really neat.


I am intrigued because the literature I am familiar with that compares the
performance of these types of algorithms (or classification algorithms in
general) often ranks them on metrics such as recall, precision, area under
the ROC curve, etc., without directly attempting to model the original
responses. So I wonder whether I am missing something obvious.


A more general concern I have (which might explain the previous paragraph)
is that I am not sure whether SDT (or what I've been able to speed-read
about SDT in the last couple of hours :-) is a good model for the problem.
In particular, even if for each dataset and algorithm we have hits, misses,
false alarms, etc., the yes or no decisions are not made individually on
each single edge of the network, but rather jointly, e.g., by minimizing
some error function over all edges for a given dataset.


Thanks again for your detailed explanation.


Best,


R.