Modeling precision and recall with GLMMs
The "how good or bad" of each method is exactly what will come out of the approach Jake is suggesting. Multilevel models of this kind have been common in the memory-recognition literature in psychology for the last decade or so, but they are also relevant in many other areas, such as medical diagnostics. If the variable IS_ij indicates whether person i saw stimulus j (0 = not seen, 1 = seen), and SAY_ij indicates whether the person says she saw the stimulus, then a multilevel probit or logit regression of SAY_ij on IS_ij, with careful coding of the variables, can mimic the standard SDT models. The critical quantity for saying whether people are accurate is the coefficient in front of IS_ij. If you have different conditions, COND_j, then interactions between COND_j (or COND_ij if varied within subject) and IS_ij examine whether accuracy varies among them. An important plus of the multilevel models is that the coefficients can vary by person and/or stimulus.
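A minimal sketch of that setup with lme4, on simulated trial-level data (the column names subject, stimulus, IS, and SAY, and the simulation itself, are illustrative assumptions, not part of the original thread):

```r
library(lme4)

set.seed(1)
# Simulated trials: 20 subjects x 40 stimuli, true d' = 1.5
d <- expand.grid(subject = factor(1:20), stimulus = factor(1:40))
d$IS  <- rbinom(nrow(d), 1, 0.5)                        # 1 = actually seen
d$SAY <- rbinom(nrow(d), 1, pnorm(-0.5 + 1.5 * d$IS))   # 1 = says "seen"

# Equal-variance SDT as a probit GLMM: the coefficient on IS plays the
# role of d', and the random slopes let sensitivity vary by person and
# by stimulus (here there is no true person/stimulus variation, so
# lme4 may warn about a singular fit).
fit <- glmer(SAY ~ IS + (1 + IS | subject) + (1 + IS | stimulus),
             family = binomial(link = "probit"), data = d)
fixef(fit)  # intercept near -0.5, slope on IS near 1.5 (d')
```

With conditions, `SAY ~ IS * COND + ...` would put the accuracy-by-condition question into the IS:COND interaction.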
Hi Ramon,
I'm not sure that I fully understand the details of what you want to accomplish, but I do want to ask: your email assumes from the start that you want to model precision and recall, but what about modelling the data directly (i.e., individual classification decisions) rather than summaries of the data? You could then work backward (forward?) from the model results to compute the implied precision and recall.
Sorry I did not provide enough details. I am comparing some methods for reconstructing networks, so true positives and false positives, for instance, refer to the number of correctly inferred edges and to the number of edges that a procedure recovers that are not in the original network, respectively. The network reconstruction methods already model the data directly; what I want to model is how good or bad their output is as a function of several other variables (related to various dimensions of the difficulty of the problem, etc.).
If you decided that modelling the data directly would work for your
purposes, then one way of doing this would be to regress
classification decisions ("P" or "N") on actual classifications ("P" or "N").
I am not sure that would work. For each data set, each method returns a bunch of "P"s and "N"s. But what I want to do is model not the relationship between truth and prediction, but rather how good or bad each method is (at trying to reconstruct the truth).
If this is done in a probit model, it is equivalent to the equal-variance signal detection model studied at length in psychology, with the intercept being the "criterion" in signal detection language (denoted c), and the slope being "sensitivity" (denoted d' or d-prime). It should definitely be possible to compute precision and recall from c and d'.
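This mapping follows the GLM formulation of signal detection theory (e.g., DeCarlo, 1998, Psychological Methods). A sketch of the arithmetic in base R, assuming the probit intercept b0 and slope b1 from the regression of decisions on truth: recall is just the hit rate, but note that precision additionally needs the base rate of true positives, which c and d' alone do not determine:

```r
# From probit coefficients to precision and recall.
#   b0   : intercept (criterion-related; conventions for c differ in sign)
#   b1   : slope on the truth indicator, i.e. d'
#   prev : base rate of truly positive cases (must be supplied)
sdt_to_pr <- function(b0, b1, prev) {
  H <- pnorm(b0 + b1)   # hit rate = recall = P(say P | truly P)
  F <- pnorm(b0)        # false alarm rate  = P(say P | truly N)
  precision <- prev * H / (prev * H + (1 - prev) * F)
  c(precision = precision, recall = H)
}

# Example: d' = 2, intercept -1, 10% of cases truly positive
sdt_to_pr(b0 = -1, b1 = 2, prev = 0.1)
# precision ~ 0.371, recall ~ 0.841
```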
I am not familiar with this approach in psychology. As I say above, I am not sure this addresses the problem I want to address but do you have some pointer to the literature where I can read more about the approach? Best, R.
This might be simpler with a logit rather than probit link function. Let me know if I have misunderstood what you are trying to accomplish.
Jake
From: rdiaz02 at gmail.com
To: r-sig-mixed-models at r-project.org
Date: Tue, 11 Mar 2014 11:48:57 +0100
CC: ramon.diaz at iib.uam.es
Subject: [R-sig-ME] Modeling precision and recall with GLMMs
Dear All,
I am examining the performance of a couple of classification-like methods under different scenarios. Two of the metrics I am using are precision and recall (TP/(TP + FP) and TP/(TP + FN), where TP, FP, and FN are "true positives", "false positives", and "false negatives" in a simple two-way confusion matrix). Some of the combinations of methods have been used on exactly the same data sets. So it is easy to set up a binomial model (or multinomial2 if using MCMCglmm) such as
cbind(TP, FP) ~ fixed effects + (1|dataset)
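For concreteness, that binomial model could be fit with lme4 along the following lines (the data here are made up, and the method and dataset columns are assumed names for illustration only):

```r
library(lme4)

set.seed(2)
# Hypothetical results table: one row per method x dataset combination
results <- expand.grid(method = factor(1:3), dataset = factor(1:30))
results$TP <- rbinom(nrow(results), 50, 0.7)
results$FP <- rbinom(nrow(results), 50, 0.2)

# cbind(TP, FP) makes the modelled proportion TP/(TP + FP), i.e. precision,
# with TP + FP acting as the binomial denominator
fit <- glmer(cbind(TP, FP) ~ method + (1 | dataset),
             family = binomial, data = results)
summary(fit)
```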
However, the left hand side sounds questionable, especially with precision: the expression TP/(TP + FP) has, in the denominator, a (TP + FP) [the number of results returned, or retrieved instances, etc.] that can itself be highly method-dependent (i.e., affected by the fixed effects). So rather than a true proportion, this seems more like a ratio, where TP and FP each have their own variance, a covariance, etc., and thus the error distribution is a mess (not the tidy thing of a binomial).
I've looked around in the literature and have not found much (maybe the problem is my searching skills :-). Most people use rankings of methods, not directly modeling precision or recall on the left-hand side of a (generalized) linear model. A couple of papers use a linear model on the log-transformed response (which I think is even worse than the above binomial model, especially with lots of 0s or 1s). Some other people use a single measure, such as the F-measure or Matthews correlation coefficient, and I am using something similar too, but I specifically wanted to also model precision and recall.
An option would be a multi-response model with MCMCglmm, but I am not sure if this is appropriate either (dependence of the sum of TP and FP on the fixed effects).
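Along those lines, a sketch of a bivariate count model in MCMCglmm, which avoids forcing TP + FP into a binomial denominator by modelling the two counts jointly (the data, prior, and short chain length are illustrative assumptions, not a recommendation):

```r
library(MCMCglmm)

set.seed(3)
# Hypothetical results table: one row per method x dataset combination
results <- expand.grid(method = factor(1:3), dataset = factor(1:30))
results$TP <- rbinom(nrow(results), 50, 0.7)
results$FP <- rbinom(nrow(results), 50, 0.2)

# TP and FP as two Poisson "traits" allowed to covary both across
# datasets and within observations (us() fits full 2x2 covariances)
prior <- list(R = list(V = diag(2), nu = 2),
              G = list(G1 = list(V = diag(2), nu = 2)))
fit <- MCMCglmm(cbind(TP, FP) ~ trait * method - 1,
                random = ~ us(trait):dataset,
                rcov   = ~ us(trait):units,
                family = c("poisson", "poisson"),
                prior  = prior, data = results,
                nitt = 3000, burnin = 1000, thin = 2, verbose = FALSE)
summary(fit)
```

Derived quantities such as precision could then be computed from the posterior of the two counts' linear predictors, with full uncertainty propagation.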
Best,
--
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid
Arzobispo Morcillo, 4
28029 Madrid
Spain
Phone: +34-91-497-2412
Email: rdiaz02 at gmail.com
ramon.diaz at iib.uam.es
http://ligarto.org/rdiaz
_______________________________________________
R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models