Metrics of Binary Classifier Performance: Predictive vs. Retrodictive

I point out that the $\TPR$, the $\FNR$, the $\TNR$ and the $\FPR$, which are commonly used metrics to gauge the performance of binary classifiers, are retrodictive. I then contrast them with their predictive counterparts and name these according to a coherent, self-explanatory scheme. I also provide equations to convert from one set of rates to the other, and interactive visuals that enable exploring this interdependence.

Interactive Visuals

Without further ado, here are links to interactive visuals. Even though I knew that seemingly good retrodictive metrics can conceal unimpressive predictive metrics, I was surprised at some of what I saw, to the point that I still fear that mistakes might be lurking within my calculations.
• From retrodictive to predictive
• From predictive to retrodictive
For definitions and calculations, please carry on perusing.

Retrodictive Metrics of Binary Classifier Performance

For a binary classifier, the most self-explanatory metrics of performance might be
• the True Positive Rate ($\TPR$) and its complement, the False Negative Rate ($\FNR$); and
• the True Negative Rate ($\TNR$) and its complement, the False Positive Rate ($\FPR$).
They are defined as follows.
• $\TPR$ : Proportion of objects predicted to be in the positive class among objects observed to be in the positive class
• $\FNR$ : Proportion of objects predicted to be in the negative class among objects observed to be in the positive class
• $\TNR$ : Proportion of objects predicted to be in the negative class among objects observed to be in the negative class
• $\FPR$ : Proportion of objects predicted to be in the positive class among objects observed to be in the negative class

And they also go by other names:
• $\TPR$ : sensitivity, recall, hit rate
• $\FNR$ : miss rate
• $\TNR$ : specificity, selectivity
• $\FPR$ : fall-out, false alarm rate

These rates actually are conditional probabilities that a prediction is correct or incorrect given an observation. As such, they provide a retrodictive assessment of the classifier's performance. For explicitness, I denote prediction and observation events as follows.
• $A$ : An object is predicted to be in the positive class.
• $B$ : An object is observed to be in the positive class.

Using common notations of probability theory, the following holds. \begin{align*} \TPR &= p(A|B) \\ \FNR &= p(\bar{A}|B) \\ \TNR &= p(\bar{A}|\bar{B}) \\ \FPR &= p(A|\bar{B}) \end{align*} The following complement equations hold. \begin{equation*} \TPR + \FNR = 1 \qquad \TNR + \FPR = 1 \end{equation*}
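As a concrete sketch, these rates and their complement identities can be checked numerically in Python. The confusion-matrix cell names `TP`, `FN`, `FP`, `TN` and the counts are my own illustrative choices, not notation from the text above:

```python
# Confusion-matrix cells (hypothetical counts):
# TP: predicted positive, observed positive   FN: predicted negative, observed positive
# FP: predicted positive, observed negative   TN: predicted negative, observed negative
TP, FN, FP, TN = 80, 20, 60, 540

TPR = TP / (TP + FN)  # p(A|B): sensitivity, recall, hit rate
FNR = FN / (TP + FN)  # p(A-bar|B): miss rate
TNR = TN / (TN + FP)  # p(A-bar|B-bar): specificity, selectivity
FPR = FP / (TN + FP)  # p(A|B-bar): fall-out, false alarm rate

# Complement equations: TPR + FNR = 1 and TNR + FPR = 1.
assert abs(TPR + FNR - 1) < 1e-12 and abs(TNR + FPR - 1) < 1e-12
```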

Predictive Metrics of Binary Classifier Performance

I think that the reliability of a binary classifier is better evaluated by considering the conditional probabilities that observations match or mismatch given predictions. More explicitly, I think the following conditional probabilities are more pertinent.
• $p(B|A)$ : Proportion of objects observed to be in the positive class among objects predicted to be in the positive class
• $p(\bar{B}|A)$ : Proportion of objects observed to be in the negative class among objects predicted to be in the positive class
• $p(\bar{B}|\bar{A})$ : Proportion of objects observed to be in the negative class among objects predicted to be in the negative class
• $p(B|\bar{A})$ : Proportion of objects observed to be in the positive class among objects predicted to be in the negative class

I think of these probabilities as predictive: predictions are given, and the assessment is whether or not they match observations. For this reason, I find it convenient to name them as follows.
• $p(B|A) = \pTPR$ : Predictive True Positive Rate
• $p(\bar{B}|A) = \pFPR$ : Predictive False Positive Rate
• $p(\bar{B}|\bar{A}) = \pTNR$ : Predictive True Negative Rate
• $p(B|\bar{A}) = \pFNR$ : Predictive False Negative Rate

The following complement equations hold. \begin{equation*} \pTPR + \pFPR = 1 \qquad \pTNR + \pFNR = 1 \end{equation*} These predictive rates already have names:
• $\pTPR$ : Positive Predictive Value ($\PPV$); Precision
• $\pFPR$ : False Discovery Rate ($\FDR$)
• $\pTNR$ : Negative Predictive Value ($\NPV$)
• $\pFNR$ : False Omission Rate ($\FOR$)
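The predictive rates follow from the same kind of counts by conditioning on the prediction instead of the observation. A sketch, again with hypothetical counts of my choosing:

```python
# Confusion-matrix cells (hypothetical counts, as before):
TP, FN, FP, TN = 80, 20, 60, 540

pTPR = TP / (TP + FP)  # p(B|A): PPV, precision
pFPR = FP / (TP + FP)  # p(B-bar|A): FDR
pTNR = TN / (TN + FN)  # p(B-bar|A-bar): NPV
pFNR = FN / (TN + FN)  # p(B|A-bar): FOR

# Complement equations: pTPR + pFPR = 1 and pTNR + pFNR = 1.
assert abs(pTPR + pFPR - 1) < 1e-12 and abs(pTNR + pFNR - 1) < 1e-12
# For these counts TPR = 0.8 and TNR = 0.9, yet pTPR is only 4/7: seemingly
# good retrodictive metrics can conceal unimpressive predictive ones.
```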

I find the naming scheme introduced here more coherent and self-explanatory; the $\PPV$-$\NPV$ naming scheme is adequate but incomplete.

Setting up for Equations

The following notations will make it convenient to write down the equations for converting between the two sets of rates, and will help reveal a pattern among those equations.

I denote by $\alpha$ and $\beta$ the odds of the predicted negative class and of the observed negative class, respectively. \begin{equation*} \alpha = \dfrac{ 1 - p(A) }{ p(A) } \qquad \beta = \dfrac{ 1 - p(B) }{ p(B) } \end{equation*} I denote by $f$ and $g$ the bivariate and trivariate functions defined as follows. \begin{equation*} f(t,r) = \dfrac{ 1 }{ 1 + t r } \qquad g(t,u,v) = \dfrac{ 1 }{ 1 + t v/u } = f(t,v/u) \end{equation*}

Equations: From Retrodictive to Predictive

\begin{align} \label{eq:pTPR} \pTPR &= \dfrac{ 1 }{ 1 + \beta \times \dfrac{\FPR}{\TPR} } = g( \beta , \TPR , \FPR ) \\[2mm] \label{eq:pFNR} \pFNR &= \dfrac{ 1 }{ 1 + \beta \times \dfrac{\TNR}{\FNR} } = g( \beta , \FNR , \TNR ) \\[2mm] \label{eq:pFPR} \pFPR &= \dfrac{ 1 }{ 1 + \dfrac{1}{\beta} \times \dfrac{\TPR}{\FPR} } = g( 1/\beta , \FPR , \TPR ) \\[2mm] \label{eq:pTNR} \pTNR &= \dfrac{ 1 }{ 1 + \dfrac{1}{\beta} \times \dfrac{\FNR}{\TNR} } = g( 1/\beta , \TNR , \FNR ) \end{align}
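These four equations admit a numerical sanity check: under hypothetical counts (my own, for illustration), $\beta$ and both sets of rates can be computed directly, and the conversions compared against the direct values.

```python
def g(t, u, v):
    # g(t, u, v) = 1 / (1 + t * v / u), as defined above.
    return 1 / (1 + t * v / u)

# Hypothetical counts: 100 observed positives, 600 observed negatives.
TP, FN, FP, TN = 80, 20, 60, 540
TPR, FNR = TP / (TP + FN), FN / (TP + FN)
TNR, FPR = TN / (TN + FP), FP / (TN + FP)
beta = (TN + FP) / (TP + FN)  # odds of the observed negative class

# Each conversion agrees with the predictive rate computed directly from counts.
assert abs(g(beta, TPR, FPR) - TP / (TP + FP)) < 1e-12    # pTPR
assert abs(g(beta, FNR, TNR) - FN / (FN + TN)) < 1e-12    # pFNR
assert abs(g(1/beta, FPR, TPR) - FP / (TP + FP)) < 1e-12  # pFPR
assert abs(g(1/beta, TNR, FNR) - TN / (TN + FN)) < 1e-12  # pTNR
```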

Equations: From Predictive to Retrodictive

\begin{align} \label{eq:TPR} \TPR &= \dfrac{ 1 }{ 1 + \alpha \times \dfrac{\pFNR}{\pTPR} } = g( \alpha , \pTPR , \pFNR ) \\[2mm] \label{eq:FPR} \FPR &= \dfrac{ 1 }{ 1 + \alpha \times \dfrac{\pTNR}{\pFPR} } = g( \alpha , \pFPR , \pTNR ) \\[2mm] \label{eq:TNR} \TNR &= \dfrac{ 1 }{ 1 + \dfrac{1}{\alpha} \times \dfrac{\pFPR}{\pTNR} } = g( 1/\alpha , \pTNR , \pFPR ) \\[2mm] \label{eq:FNR} \FNR &= \dfrac{ 1 }{ 1 + \dfrac{1}{\alpha} \times \dfrac{\pTPR}{\pFNR} } = g( 1/\alpha , \pFNR , \pTPR ) \end{align}
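The reverse equations can be checked the same way; composing both directions recovers the original retrodictive rates. A round-trip sketch under the same hypothetical counts as before:

```python
def g(t, u, v):
    return 1 / (1 + t * v / u)

TP, FN, FP, TN = 80, 20, 60, 540
TPR, FNR = TP / (TP + FN), FN / (TP + FN)
TNR, FPR = TN / (TN + FP), FP / (TN + FP)
beta = (TN + FP) / (TP + FN)   # odds of the observed negative class
alpha = (FN + TN) / (TP + FP)  # odds of the predicted negative class

# Retrodictive -> predictive -> retrodictive round trip.
pTPR, pFNR = g(beta, TPR, FPR), g(beta, FNR, TNR)
pFPR, pTNR = g(1/beta, FPR, TPR), g(1/beta, TNR, FNR)
assert abs(g(alpha, pTPR, pFNR) - TPR) < 1e-12
assert abs(g(alpha, pFPR, pTNR) - FPR) < 1e-12
assert abs(g(1/alpha, pTNR, pFPR) - TNR) < 1e-12
assert abs(g(1/alpha, pFNR, pTPR) - FNR) < 1e-12
```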

Consolidating Conversion Equations: $\TPR$ and $\FPR$ vs. $\pTPR$ and $\pFNR$

Let $h = (h_{1},h_{2})$ be the function given as follows. \begin{equation*} h_{1} = g \qquad\qquad h_{2}(t,u,v) = g(t,1-u,1-v) \end{equation*} Equations \eqref{eq:pTPR} and \eqref{eq:pFNR} consolidate into equation \eqref{eq:TPR-FPR-to-pTPR-pFNR}.
Equations \eqref{eq:TPR} and \eqref{eq:FPR} consolidate into equation \eqref{eq:pTPR-pFNR-to-TPR-FPR}. \begin{align} \label{eq:TPR-FPR-to-pTPR-pFNR} ( \pTPR , \pFNR ) &= h( \beta , \TPR , \FPR ) \\[2mm] \label{eq:pTPR-pFNR-to-TPR-FPR} ( \TPR , \FPR ) &= h( \alpha , \pTPR , \pFNR ) \end{align} Equations \eqref{eq:TPR-FPR-to-pTPR-pFNR} and \eqref{eq:pTPR-pFNR-to-TPR-FPR} further consolidate into the following diagram.
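In code, this consolidation says that one and the same function $h$ maps back and forth between the pairs, with only the odds argument changing. A sketch, reusing the hypothetical counts from earlier:

```python
def g(t, u, v):
    return 1 / (1 + t * v / u)

def h(t, u, v):
    # h = (h1, h2) with h1 = g and h2(t, u, v) = g(t, 1 - u, 1 - v).
    return g(t, u, v), g(t, 1 - u, 1 - v)

TP, FN, FP, TN = 80, 20, 60, 540
TPR, FPR = TP / (TP + FN), FP / (TN + FP)
beta = (TN + FP) / (TP + FN)   # odds of the observed negative class
alpha = (FN + TN) / (TP + FP)  # odds of the predicted negative class

pTPR, pFNR = h(beta, TPR, FPR)     # forward: (TPR, FPR) -> (pTPR, pFNR)
TPR2, FPR2 = h(alpha, pTPR, pFNR)  # backward: (pTPR, pFNR) -> (TPR, FPR)
assert abs(TPR2 - TPR) < 1e-12 and abs(FPR2 - FPR) < 1e-12
```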

Consolidating Conversion Equations: $\TPR$ and $\TNR$ vs. $\PPV$ and $\NPV$

Let $k = (k_{1},k_{2})$ be the function given as follows. \begin{equation*} k_{1}(t,u,v) = g(t,u,1-v) \qquad\qquad k_{2}(t,u,v) = g(1/t,v,1-u) \end{equation*} Equations \eqref{eq:pTPR} and \eqref{eq:pTNR} consolidate into equation \eqref{eq:TPR-TNR-to-PPV-NPV}.
Equations \eqref{eq:TPR} and \eqref{eq:TNR} consolidate into equation \eqref{eq:PPV-NPV-to-TPR-TNR}. \begin{align} \label{eq:TPR-TNR-to-PPV-NPV} ( \PPV , \NPV ) &= k( \beta , \TPR , \TNR ) \\[2mm] \label{eq:PPV-NPV-to-TPR-TNR} ( \TPR , \TNR ) &= k( \alpha , \PPV , \NPV ) \end{align} Equations \eqref{eq:TPR-TNR-to-PPV-NPV} and \eqref{eq:PPV-NPV-to-TPR-TNR} further consolidate into the following diagram.
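The $k$ consolidation can be exercised the same way: a single function carries $(\TPR, \TNR)$ to $(\PPV, \NPV)$ with odds $\beta$, and back with odds $\alpha$. A sketch under the same hypothetical counts:

```python
def g(t, u, v):
    return 1 / (1 + t * v / u)

def k(t, u, v):
    # k = (k1, k2) with k1(t,u,v) = g(t, u, 1-v) and k2(t,u,v) = g(1/t, v, 1-u).
    return g(t, u, 1 - v), g(1 / t, v, 1 - u)

TP, FN, FP, TN = 80, 20, 60, 540
TPR, TNR = TP / (TP + FN), TN / (TN + FP)
beta = (TN + FP) / (TP + FN)   # odds of the observed negative class
alpha = (FN + TN) / (TP + FP)  # odds of the predicted negative class

PPV, NPV = k(beta, TPR, TNR)    # forward: (TPR, TNR) -> (PPV, NPV)
TPR2, TNR2 = k(alpha, PPV, NPV) # backward: (PPV, NPV) -> (TPR, TNR)
assert abs(TPR2 - TPR) < 1e-12 and abs(TNR2 - TNR) < 1e-12
```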

Epilogue

A binary classifier is typically realized by thresholding a regression-learned continuous function that produces a prediction score: an object is predicted to belong to the positive class if its prediction score exceeds the threshold. Under these circumstances, a $\TPR$-$\FPR$ (receiver operating characteristic, or ROC) curve can be produced by varying the threshold over the range of eligible values. It is worth noting that the $\TPR$-$\FPR$ ROC curve and the commonly derived area under the curve (ROC AUC) are attributes of the regressor, not of any particular thresholding-derived classifier. In particular, the area under the $\TPR$-$\FPR$ curve is the proportion of pairs of objects whose prediction scores are ordered in the same direction as the corresponding observations. I suspect that there might also be a similarly or more informative parameterized curve of predictive rates. If successful, this line of thought should at least render obsolete the debates over which of the $\TPR$-$\FPR$ and precision-recall curves is more pertinent depending on class balance. Something I might write about next, time permitting.
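The pairwise interpretation of the area under the $\TPR$-$\FPR$ curve can be verified numerically. A sketch with made-up scores and labels (a tie between a positive and a negative score counts as half a concordant pair):

```python
def roc_auc(scores, labels):
    # Trapezoidal area under the TPR-FPR curve, sweeping the threshold over
    # the eligible values (predict positive when score >= threshold).
    P = sum(labels)
    N = len(labels) - P
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tpr = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1) / P
        fpr = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0) / N
        points.append((fpr, tpr))
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def concordant_pair_proportion(scores, labels):
    # Proportion of (positive, negative) pairs whose scores are ordered in
    # the same direction as the observations; ties count one half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))

scores = [0.1, 0.4, 0.35, 0.8]  # made-up prediction scores
labels = [0, 0, 1, 1]           # made-up observed classes
assert abs(roc_auc(scores, labels) - concordant_pair_proportion(scores, labels)) < 1e-12
```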