Modeling precision and recall with GLMMs
Hi Jake and Daniel, I was being rather obtuse in my earlier reply: what you both suggest certainly makes a lot of sense.
On Thu, 01-01-1970, at 01:00, Jake Westfall <jake987722 at hotmail.com> wrote:
Hi Ramon,
I am not sure that would work. For each data set, each method returns a bunch of "P"s and "N"s. But what I want to do is model not the relationship between truth and prediction, but rather how good or bad each method is (at trying to reconstruct the truth).
I'm not sure I see the problem. Surely the question of "how good or bad" each method is can be answered by examining which method yields the strongest correspondence between truth and prediction. That is the idea behind what I suggest.
Yes, I see it now.
As for the fact that each method returns many data points, again I do not see the problem. You are using a multilevel model after all, right? So it seems to me that within that framework, you have classification decisions (the data) nested in algorithms, which are crossed with datasets. You could in principle use a crossed random effects model, but I think it would make more sense to treat algorithms as fixed.
I agree.
Here's an example of what this might look like. The outcome variable is "decision" (numeric: 0 or 1), the predictors are "truth" (numeric: -1 or 1) and "algorithm" (factor denoting the algorithm). The model could look like: glmer(decision ~ 0 + algorithm/truth + (1|dataset))
In the fixed effects, this syntax estimates separate intercepts and slopes for each algorithm. The intercepts get at response bias, while the slopes get at accuracy. As noted previously, these two estimates can be transformed to precision and recall. You could also reverse decision and truth, so that we have: glmer(truth ~ 0 + algorithm/decision + (1|dataset))
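To make the first specification concrete, here is a minimal sketch of fitting it with lme4. The simulated data and column names simply follow the description above (decision 0/1, truth -1/1, a factor for algorithm, datasets as random intercepts); the effect sizes and the number of datasets/edges are arbitrary illustration values, not anything from the thread:

```r
## Sketch: fit decision ~ 0 + algorithm/truth + (1|dataset), assuming lme4.
library(lme4)

set.seed(1)
n_data <- 20  # number of datasets (arbitrary for the sketch)
d <- expand.grid(dataset   = factor(1:n_data),
                 algorithm = factor(c("A", "B")),
                 edge      = 1:50)
d$truth <- sample(c(-1, 1), nrow(d), replace = TRUE)

## dataset-specific intercepts plus a larger "accuracy" slope for algorithm B
u <- rnorm(n_data, 0, 0.5)
eta <- u[d$dataset] + ifelse(d$algorithm == "A", 0.8, 1.5) * d$truth
d$decision <- rbinom(nrow(d), 1, plogis(eta))

m <- glmer(decision ~ 0 + algorithm/truth + (1 | dataset),
           data = d, family = binomial)
fixef(m)  # per-algorithm intercepts (bias) and slopes (accuracy)
```

The `algorithm/truth` nesting expands to `algorithm + algorithm:truth`, so with the `0 +` each algorithm gets its own intercept and its own truth slope, which is what allows the bias/accuracy reading above.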
This might make more sense given the random effects for datasets, which in this second case allow for different datasets having different base rates of the two classes.
This seems reasonable, in particular since the base rates of the two classes can be very different between datasets. But datasets also affect the "quality" of the signal, the d' in SDT.
In the former case the random intercepts allowed for different datasets to lead to different rates of response bias, which is not crazy but isn't as intuitive to me as the second interpretation.
I also do not find having different response biases by datasets intuitive. However, placing truth as the dependent variable does not seem intuitive to me either. In the first model we have P(1|Signal) or P(1|Noise), but reversing that is awkward to me; then I am not sure what a "residual" would mean, and I am not sure whether the coefficients retain the same meaning (the intercept capturing response bias by algorithm, the easy mapping to recall and precision, or other features explained in, say, "Signal detection theory and generalized linear models" by DeCarlo ---I've googled a little since Daniel's and your last emails :-).

Regardless, this is certainly a really nice way to approach the problem I originally posted. Moreover, I could easily add edge-specific covariates related to how hard it is to correctly infer those edges (i.e., to how small d' is); this would be really neat. I am intrigued because the literature I am familiar with that compares the performance of these types of algorithms (or classification algorithms in general) often ranks them using metrics such as recall, precision, or area under the ROC curve, without directly attempting to model the original responses. So I wonder whether I am missing something obvious.

A more general concern I have (which might explain the previous paragraph) is that I am not sure whether SDT (or what I've been able to speed-read about SDT in the last couple of hours :-) is a good model for the problem. In particular, even though for each data set and algorithm we have hits, misses, false alarms, etc., the yes-or-no decisions are not individual decisions on each single edge of the network, but are based instead on, e.g., minimizing some error function over all edges for a given dataset.

Thanks again for your detailed explanation. Best, R.
Let me know if this makes some sense. Jake
From: Daniel.Wright at act.org
To: rdiaz02 at gmail.com; jake987722 at hotmail.com
CC: r-sig-mixed-models at r-project.org
Subject: RE: [R-sig-ME] Modeling precision and recall with GLMMs
Date: Wed, 12 Mar 2014 14:26:28 +0000

The "how good or bad" each method is, is what will come out of the method Jake is suggesting. Using multilevel models for this has been common in the memory recognition literature in psychology for the last decade or so, but it is also relevant in lots of other areas, like medical diagnostics. If the variable IS_ij is whether person i saw stimulus j (0 not seen, 1 seen), and SAY_ij is whether the person says she saw the stimulus, then a multilevel probit or logit regression, with careful coding of the variables, can mimic the standard SDT models. The critical variable for saying whether people are accurate is the coefficient in front of SAY. If you have different conditions, COND_j, then interactions between COND_j (or COND_ij if varied within subject) and SAY_ij examine whether accuracy varies among them. An important plus of the multilevel models is that the coefficients can vary by person and/or stimulus.
Hi Ramon,
I'm not sure that I fully understand the details of what you want to accomplish. But I do want to ask: you jump right into your email assuming that of course you want to model precision and recall, but what about modelling the data directly (i.e., individual classification decisions) rather than summaries of the data? Then you could work backward (forward?) from the model results to compute what the implied precision and recall would be.
Sorry, I did not provide enough details. I am comparing some methods for reconstructing networks; the true positives and false positives, for instance, refer to the number of correctly inferred edges and to the number of edges that a procedure recovers that are not in the original network, respectively. So the network reconstruction methods model the data directly, and what I want to model is how good or bad what they return is, as a function of several other variables (related to several dimensions of the toughness of the problem, etc.).
If you decided that modelling the data directly would work for your purposes, then one way of doing this would be to regress classification decisions ("P" or "N") on actual classifications ("P" or "N").
I am not sure that would work. For each data set, each method returns a bunch of "P"s and "N"s. But what I want to do is model not the relationship between truth and prediction, but rather how good or bad each method is (at trying to reconstruct the truth).
If this is done in a probit model, it is equivalent to the equal-variance signal detection model studied at length in psychology, with the intercept being the "criterion" in signal detection language (denoted c), and the slope being "sensitivity" (denoted d' or d-prime). It should definitely be possible to compute precision and recall from c and d'.
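To make that last point concrete, here is a small base-R sketch of how one might map c and d' to precision and recall under the equal-variance Gaussian model. The base rate of true signal cases (`p_signal` below, a name I am introducing) is a needed extra input, since precision depends on it; the formulas are the standard ones, z(H) = d'/2 - c and z(F) = -d'/2 - c:

```r
## Equal-variance Gaussian SDT: recover hit and false-alarm rates from
## (d', c), then precision and recall given the base rate of signal cases.
sdt_to_pr <- function(dprime, crit, p_signal) {
  H <- pnorm( dprime / 2 - crit)   # hit rate = recall
  F <- pnorm(-dprime / 2 - crit)   # false-alarm rate
  precision <- (p_signal * H) / (p_signal * H + (1 - p_signal) * F)
  c(precision = precision, recall = H)
}

sdt_to_pr(dprime = 2, crit = 0, p_signal = 0.5)
## with no bias (c = 0) and d' = 2, recall is pnorm(1), about 0.84
```

With c = 0 and equal base rates the hit and false-alarm rates are symmetric, so precision and recall coincide; shifting the criterion trades one off against the other in the usual way.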
I am not familiar with this approach in psychology. As I said above, I am not sure this addresses the problem I want to address, but do you have a pointer to the literature where I can read more about the approach? Best, R.
This might be simpler with a logit rather than a probit link function. Let me know if I have misunderstood what you are trying to accomplish.
Jake
From: rdiaz02 at gmail.com
To: r-sig-mixed-models at r-project.org
Date: Tue, 11 Mar 2014 11:48:57 +0100
CC: ramon.diaz at iib.uam.es
Subject: [R-sig-ME] Modeling precision and recall with GLMMs
Dear All,
I am examining the performance of a couple of classification-like methods under different scenarios. Two of the metrics I am using are precision and recall (TP/(TP + FP) and TP/(TP + FN), where TP, FP, and FN are "true positives", "false positives", and "false negatives" in a simple two-way confusion matrix). Some of the combinations of methods have been used on exactly the same data sets. So it is easy to set up a binomial model (or multinomial2 if using MCMCglmm) such as
cbind(TP, FP) ~ fixed effects + (1|dataset)
However, the left-hand side sounds questionable, especially with precision: the expression TP/(TP + FP) has, in the denominator, a (TP + FP) [the number of results returned, or retrieved instances, etc.] that can itself be highly method-dependent (i.e., affected by the fixed effects). So rather than a true proportion, this seems more like a ratio, where TP and FP each have their own variance, a covariance, etc., and thus the error distribution is a mess (not the tidy thing of a binomial).
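For reference, the binomial model above would be coded along these lines with lme4. This is only a sketch on simulated data: the data frame, column names, and effect sizes are made up for illustration, and the toy data fixes TP + FP at 100 per cell, which is exactly the constant-denominator assumption questioned in the previous paragraph:

```r
## Sketch: precision modelled as a binomial proportion, cbind(TP, FP),
## with method as the fixed effect and a random intercept per dataset.
library(lme4)

set.seed(1)
results <- expand.grid(method  = factor(c("A", "B")),
                       dataset = factor(1:15))
n_ret <- 100  # retrieved instances per cell, i.e. TP + FP (held fixed here)
p <- plogis(ifelse(results$method == "A", 0.5, 1.0) +
            rnorm(15, 0, 0.4)[results$dataset])
results$TP <- rbinom(nrow(results), n_ret, p)
results$FP <- n_ret - results$TP

m_prec <- glmer(cbind(TP, FP) ~ method + (1 | dataset),
                data = results, family = binomial)
fixef(m_prec)
```

In real runs the denominator would vary by method, which is what makes the binomial reading of precision doubtful.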
I've looked around in the literature and have not found much (maybe the problem is my searching skills :-). Most people use rankings of methods, without directly modeling precision or recall on the left-hand side of a (generalized) linear model. A couple of papers use a linear model on the log-transformed response (which I think is even worse than the above binomial model, especially with lots of 0s or 1s). Some other people use a single measure, such as the F-measure or Matthews correlation coefficient, and I am using something similar too, but I specifically wanted to also model precision and recall.
An option would be a multi-response model with MCMCglmm, but I am not sure if this is appropriate either (the dependence of the sum of FP and TP on the fixed effects).
Best,
--
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid
Arzobispo Morcillo, 4
28029 Madrid
Spain
Phone: +34-91-497-2412
Email: rdiaz02 at gmail.com
ramon.diaz at iib.uam.es
http://ligarto.org/rdiaz
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models