# Metrics of Binary Classifier Performance: Predictive vs. Retrodictive

I point out that the $\TPR$, the $\FNR$, the $\TNR$ and the $\FPR$, which are commonly used metrics to gauge the performance of binary classifiers, are retrodictive. I then contrast them with their predictive counterparts and name these according to a coherent, self-explanatory scheme. I also provide equations to convert from one set of rates to the other, and interactive visuals that enable exploring this interdependence.

## Interactive Visuals

Without further ado, here are links to interactive visuals. Even though I knew that seemingly good retrodictive metrics can conceal unimpressive predictive metrics, I was surprised at some of what I saw, to the point that I still fear that mistakes might be lurking within my calculations.
• From retrodictive to predictive
• From predictive to retrodictive
For definitions and calculations, please carry on perusing.

## Retrodictive Metrics of Binary Classifier Performance

For a binary classifier, the most self-explanatory metrics of performance might be
• the True Positive Rate ($\TPR$) and its complement, the False Negative Rate ($\FNR$); and
• the True Negative Rate ($\TNR$) and its complement, the False Positive Rate ($\FPR$).
They are defined as follows.
• $\TPR$ : Proportion of objects predicted to be in the positive class among objects observed to be in the positive class
• $\FNR$ : Proportion of objects predicted to be in the negative class among objects observed to be in the positive class
• $\TNR$ : Proportion of objects predicted to be in the negative class among objects observed to be in the negative class
• $\FPR$ : Proportion of objects predicted to be in the positive class among objects observed to be in the negative class

And they also go by other names:
• $\TPR$ : sensitivity, recall, hit rate
• $\FNR$ : miss rate
• $\TNR$ : specificity, selectivity
• $\FPR$ : fall-out, false alarm rate

These rates are actually conditional probabilities that a prediction is correct or incorrect given an observation. As such, they provide a retrodictive assessment of the classifier's performance. For explicitness, I denote prediction and observation events as follows.
• $A$ : An object is predicted to be in the positive class.
• $B$ : An object is observed to be in the positive class.

In standard probability notation, the following holds. \begin{align*} \TPR &= p(A|B) \\ \FNR &= p(\bar{A}|B) \\ \TNR &= p(\bar{A}|\bar{B}) \\ \FPR &= p(A|\bar{B}) \end{align*} The following complement equations hold. \begin{equation*} \TPR + \FNR = 1 \qquad \TNR + \FPR = 1 \end{equation*}
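As a concrete illustration, the retrodictive rates fall out directly from confusion-matrix counts. This is a minimal sketch; the counts are made-up numbers chosen for illustration, not taken from any real classifier.

```python
# Hypothetical confusion-matrix counts (made-up for illustration):
# TP: predicted positive, observed positive    FN: predicted negative, observed positive
# TN: predicted negative, observed negative    FP: predicted positive, observed negative
TP, FN, TN, FP = 90, 10, 160, 40

TPR = TP / (TP + FN)  # p(A|B): sensitivity, recall, hit rate
FNR = FN / (TP + FN)  # p(A-bar|B): miss rate
TNR = TN / (TN + FP)  # p(A-bar|B-bar): specificity, selectivity
FPR = FP / (TN + FP)  # p(A|B-bar): fall-out, false alarm rate

# The complement equations hold by construction.
assert abs(TPR + FNR - 1) < 1e-12 and abs(TNR + FPR - 1) < 1e-12
```

Each rate conditions on the observation (the denominator is a row of observed counts), which is what makes these metrics retrodictive.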

## Predictive Metrics of Binary Classifier Performance

I think the reliability of a binary classifier is better evaluated by considering the conditional probabilities that observations match or mismatch given predictions; that is, I find the following conditional probabilities more pertinent.
• $p(B|A)$ : Proportion of objects observed to be in the positive class among objects predicted to be in the positive class
• $p(\bar{B}|A)$ : Proportion of objects observed to be in the negative class among objects predicted to be in the positive class
• $p(\bar{B}|\bar{A})$ : Proportion of objects observed to be in the negative class among objects predicted to be in the negative class
• $p(B|\bar{A})$ : Proportion of objects observed to be in the positive class among objects predicted to be in the negative class

I think of these probabilities as predictive: predictions are given, and the assessment is whether or not they match observations. For this reason, I find it convenient to name them as follows.
• $p(B|A) = \pTPR$ : Predictive True Positive Rate
• $p(\bar{B}|A) = \pFPR$ : Predictive False Positive Rate
• $p(\bar{B}|\bar{A}) = \pTNR$ : Predictive True Negative Rate
• $p(B|\bar{A}) = \pFNR$ : Predictive False Negative Rate

The following complement equations hold. \begin{equation*} \pTPR + \pFPR = 1 \qquad \pTNR + \pFNR = 1 \end{equation*} These predictive rates already have names:
• $\pTPR$ : Positive Predictive Value ($\PPV$); Precision
• $\pFPR$ : False Discovery Rate ($\FDR$)
• $\pTNR$ : Negative Predictive Value ($\NPV$)
• $\pFNR$ : False Omission Rate ($\FOR$)
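Continuing the sketch with the same made-up confusion-matrix counts, the predictive rates condition on the prediction rather than the observation, so the denominators switch from observed to predicted counts:

```python
# Same hypothetical counts as before (illustration only).
TP, FN, TN, FP = 90, 10, 160, 40

pTPR = TP / (TP + FP)  # p(B|A): PPV, precision
pFPR = FP / (TP + FP)  # p(B-bar|A): FDR
pTNR = TN / (TN + FN)  # p(B-bar|A-bar): NPV
pFNR = FN / (TN + FN)  # p(B|A-bar): FOR

# The predictive complement equations hold by construction.
assert abs(pTPR + pFPR - 1) < 1e-12 and abs(pTNR + pFNR - 1) < 1e-12
```

Note how these counts yield a $\TPR$ of $0.9$ but a $\pTPR$ of only $90/130 \approx 0.69$: a seemingly good retrodictive rate alongside a much less impressive predictive one.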

I think the naming scheme used here is more coherent and self-explanatory; the $\PPV$-$\NPV$ naming scheme is adequate but incomplete.

## Setting up for Equations

The following notations make it convenient to write down the equations for converting between the two sets of rates, and they help reveal a pattern among those equations.

I denote by $\alpha$ and $\beta$ the odds of the predicted negative class and of the observed negative class, respectively. \begin{equation*} \alpha = \dfrac{ 1 - p(A) }{ p(A) } \qquad \beta = \dfrac{ 1 - p(B) }{ p(B) } \end{equation*} I denote by $f$ and $g$ the bivariate and trivariate functions defined as follows. \begin{equation*} f(t,r) = \dfrac{ 1 }{ 1 + t r } \qquad g(t,u,v) = \dfrac{ 1 }{ 1 + t v/u } = f(t,v/u) \end{equation*}
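These definitions translate into code directly. A minimal sketch, again using made-up confusion-matrix counts to make $\alpha$ and $\beta$ concrete:

```python
def f(t, r):
    # f(t, r) = 1 / (1 + t*r)
    return 1 / (1 + t * r)

def g(t, u, v):
    # g(t, u, v) = 1 / (1 + t*v/u) = f(t, v/u)
    return f(t, v / u)

# Hypothetical confusion-matrix counts (illustration only).
TP, FN, TN, FP = 90, 10, 160, 40
N = TP + FN + TN + FP
pA = (TP + FP) / N  # p(A): proportion of predicted positives
pB = (TP + FN) / N  # p(B): proportion of observed positives

alpha = (1 - pA) / pA  # odds of the predicted negative class
beta = (1 - pB) / pB   # odds of the observed negative class
```

With these counts, $\beta = 2$ (twice as many observed negatives as positives) and $\alpha = 17/13$.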

## Equations: From Retrodictive to Predictive

\begin{align} \label{eq:pTPR} \pTPR &= \dfrac{ 1 }{ 1 + \beta \times \dfrac{\FPR}{\TPR} } = g( \beta , \TPR , \FPR ) \\[2mm] \label{eq:pFNR} \pFNR &= \dfrac{ 1 }{ 1 + \beta \times \dfrac{\TNR}{\FNR} } = g( \beta , \FNR , \TNR ) \\[2mm] \label{eq:pFPR} \pFPR &= \dfrac{ 1 }{ 1 + \dfrac{1}{\beta} \times \dfrac{\TPR}{\FPR} } = g( 1/\beta , \FPR , \TPR ) \\[2mm] \label{eq:pTNR} \pTNR &= \dfrac{ 1 }{ 1 + \dfrac{1}{\beta} \times \dfrac{\FNR}{\TNR} } = g( 1/\beta , \TNR , \FNR ) \end{align}
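These four equations can be checked numerically against predictive rates computed directly from counts. A sketch, with the same made-up confusion-matrix numbers as before:

```python
def g(t, u, v):
    # g(t, u, v) = 1 / (1 + t*v/u)
    return 1 / (1 + t * v / u)

# Hypothetical counts (illustration only) and the retrodictive rates.
TP, FN, TN, FP = 90, 10, 160, 40
TPR, FNR = TP / (TP + FN), FN / (TP + FN)
TNR, FPR = TN / (TN + FP), FP / (TN + FP)
beta = (TN + FP) / (TP + FN)  # odds of the observed negative class

# Converted predictive rates agree with rates computed directly from counts.
assert abs(g(beta, TPR, FPR) - TP / (TP + FP)) < 1e-9      # pTPR
assert abs(g(beta, FNR, TNR) - FN / (TN + FN)) < 1e-9      # pFNR
assert abs(g(1 / beta, FPR, TPR) - FP / (TP + FP)) < 1e-9  # pFPR
assert abs(g(1 / beta, TNR, FNR) - TN / (TN + FN)) < 1e-9  # pTNR
```

The pattern is visible in code as in the equations: complementary rates swap the last two arguments of $g$, and conditioning on the other prediction inverts the odds.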

## Equations: From Predictive to Retrodictive

\begin{align} \label{eq:TPR} \TPR &= \dfrac{ 1 }{ 1 + \alpha \times \dfrac{\pFNR}{\pTPR} } = g( \alpha , \pTPR , \pFNR ) \\[2mm] \label{eq:FPR} \FPR &= \dfrac{ 1 }{ 1 + \alpha \times \dfrac{\pTNR}{\pFPR} } = g( \alpha , \pFPR , \pTNR ) \\[2mm] \label{eq:TNR} \TNR &= \dfrac{ 1 }{ 1 + \dfrac{1}{\alpha} \times \dfrac{\pFPR}{\pTNR} } = g( 1/\alpha , \pTNR , \pFPR ) \\[2mm] \label{eq:FNR} \FNR &= \dfrac{ 1 }{ 1 + \dfrac{1}{\alpha} \times \dfrac{\pTPR}{\pFNR} } = g( 1/\alpha , \pFNR , \pTPR ) \end{align}
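The reverse direction can be checked the same way. A sketch with the same made-up counts, this time starting from the predictive rates and the odds $\alpha$ of the predicted negative class:

```python
def g(t, u, v):
    # g(t, u, v) = 1 / (1 + t*v/u)
    return 1 / (1 + t * v / u)

# Hypothetical counts (illustration only) and the predictive rates.
TP, FN, TN, FP = 90, 10, 160, 40
pTPR, pFPR = TP / (TP + FP), FP / (TP + FP)
pTNR, pFNR = TN / (TN + FN), FN / (TN + FN)
alpha = (TN + FN) / (TP + FP)  # odds of the predicted negative class

# Converted retrodictive rates agree with rates computed directly from counts.
assert abs(g(alpha, pTPR, pFNR) - TP / (TP + FN)) < 1e-9      # TPR
assert abs(g(alpha, pFPR, pTNR) - FP / (TN + FP)) < 1e-9      # FPR
assert abs(g(1 / alpha, pTNR, pFPR) - TN / (TN + FP)) < 1e-9  # TNR
assert abs(g(1 / alpha, pFNR, pTPR) - FN / (TP + FN)) < 1e-9  # FNR
```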

## Consolidating Conversion Equations: $\TPR$ and $\FPR$ vs. $\pTPR$ and $\pFNR$

Let $h = (h_{1},h_{2})$ be the function given as follows. \begin{equation*} h_{1} = g \qquad\qquad h_{2}(t,u,v) = g(t,1-u,1-v) \end{equation*} Equations \eqref{eq:pTPR} and \eqref{eq:pFNR} consolidate into equation \eqref{eq:TPR-FPR-to-pTPR-pFNR}.
Equations \eqref{eq:TPR} and \eqref{eq:FPR} consolidate into equation \eqref{eq:pTPR-pFNR-to-TPR-FPR}. \begin{align} \label{eq:TPR-FPR-to-pTPR-pFNR} ( \pTPR , \pFNR ) &= h( \beta , \TPR , \FPR ) \\[2mm] \label{eq:pTPR-pFNR-to-TPR-FPR} ( \TPR , \FPR ) &= h( \alpha , \pTPR , \pFNR ) \end{align} Equations \eqref{eq:TPR-FPR-to-pTPR-pFNR} and \eqref{eq:pTPR-pFNR-to-TPR-FPR} further consolidate into the following diagram.
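The consolidated function $h$ and the round trip between the two coordinate pairs can be verified numerically. A sketch with made-up rates and class odds; $\alpha$ is derived from $\beta$, $\TPR$, and $\FPR$ via the law of total probability:

```python
def g(t, u, v):
    return 1 / (1 + t * v / u)

def h(t, u, v):
    # h = (h1, h2) with h1 = g and h2(t, u, v) = g(t, 1-u, 1-v)
    return g(t, u, v), g(t, 1 - u, 1 - v)

# Hypothetical retrodictive rates and observed-negative odds (illustration only).
TPR, FPR = 0.9, 0.2
beta = 2.0

pTPR, pFNR = h(beta, TPR, FPR)

# p(A) = TPR*p(B) + FPR*(1 - p(B)), by the law of total probability.
pB = 1 / (1 + beta)
pA = TPR * pB + FPR * (1 - pB)
alpha = (1 - pA) / pA

# Round trip: the same h converts back.
TPR2, FPR2 = h(alpha, pTPR, pFNR)
assert abs(TPR2 - TPR) < 1e-9 and abs(FPR2 - FPR) < 1e-9
```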

## Consolidating Conversion Equations: $\TPR$ and $\TNR$ vs. $\PPV$ and $\NPV$

Let $k = (k_{1},k_{2})$ be the function given as follows. \begin{equation*} k_{1}(t,u,v) = g(t,u,1-v) \qquad\qquad k_{2}(t,u,v) = g(1/t,v,1-u) \end{equation*} Equations \eqref{eq:pTPR} and \eqref{eq:pTNR} consolidate into equation \eqref{eq:TPR-TNR-to-PPV-NPV}.
Equations \eqref{eq:TPR} and \eqref{eq:TNR} consolidate into equation \eqref{eq:PPV-NPV-to-TPR-TNR}. \begin{align} \label{eq:TPR-TNR-to-PPV-NPV} ( \PPV , \NPV ) &= k( \beta , \TPR , \TNR ) \\[2mm] \label{eq:PPV-NPV-to-TPR-TNR} ( \TPR , \TNR ) &= k( \alpha , \PPV , \NPV ) \end{align} Equations \eqref{eq:TPR-TNR-to-PPV-NPV} and \eqref{eq:PPV-NPV-to-TPR-TNR} further consolidate into the following diagram.
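The $k$ consolidation admits the same kind of numerical round-trip check. A sketch with made-up rates; as before, $\alpha$ follows from $\beta$ and the retrodictive rates by total probability:

```python
def g(t, u, v):
    return 1 / (1 + t * v / u)

def k(t, u, v):
    # k = (k1, k2) with k1(t, u, v) = g(t, u, 1-v) and k2(t, u, v) = g(1/t, v, 1-u)
    return g(t, u, 1 - v), g(1 / t, v, 1 - u)

# Hypothetical retrodictive rates and observed-negative odds (illustration only).
TPR, TNR = 0.9, 0.8
beta = 2.0

PPV, NPV = k(beta, TPR, TNR)

# p(A) = TPR*p(B) + FPR*(1 - p(B)), with FPR = 1 - TNR.
pB = 1 / (1 + beta)
pA = TPR * pB + (1 - TNR) * (1 - pB)
alpha = (1 - pA) / pA

# Round trip: the same k converts back.
TPR2, TNR2 = k(alpha, PPV, NPV)
assert abs(TPR2 - TPR) < 1e-9 and abs(TNR2 - TNR) < 1e-9
```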

## Epilogue

A binary classifier is typically realized by thresholding a regression-learned continuous function that produces a prediction score: an object is predicted to belong to the positive class if its prediction score exceeds the threshold. Under these circumstances, a $\TPR$-$\FPR$ (receiver operating characteristic, or ROC) curve can be produced by varying the threshold over the range of eligible values. It is worth noting that the $\TPR$-$\FPR$ ROC curve and the commonly derived area under the curve (AUC ROC) are attributes of the regressor, not of any particular thresholding-derived classifier. In particular, the area under the $\TPR$-$\FPR$ curve is the proportion of pairs of objects whose prediction scores are ordered in the same direction as the corresponding observations. I suspect there might also be a similarly or more informative parameterized curve of predictive rates. If that works out, it should at least render obsolete the debates over whether the $\TPR$-$\FPR$ curve or the precision-recall curve is more pertinent depending on class balance. Something I might write about next, time permitting.
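The pairwise reading of the area under the $\TPR$-$\FPR$ curve can be sketched directly: over all (observed positive, observed negative) pairs, count how often the positive object's score is the larger, with ties counted as half. The scores and labels below are made-up illustration data.

```python
# Made-up prediction scores and observed labels (1 = observed positive).
scores = [0.1, 0.4, 0.35, 0.8, 0.7]
labels = [0,   0,   1,    1,   1]

pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]

# Proportion of concordantly ordered (positive, negative) pairs.
auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))
```

Here one of the six pairs is discordant (the positive scored $0.35$ against the negative scored $0.4$), so the AUC is $5/6$.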