# Phi Coefficient

From Wikipedia:

The phi coefficient (or mean square contingency coefficient, denoted by \(\phi\) or \(r_\phi\)) is a measure of association for two binary variables.

Given two *binary* variables, their joint counts can be tabulated (for example, the top-left cell is the number of observations where \(x\) and \(y\) are both true):

| | \(x=1\) | \(x=0\) | total |
|---|---|---|---|
| \(y=1\) | \(n_{11}\) | \(n_{10}\) | \(n_{1\bullet}\) |
| \(y=0\) | \(n_{01}\) | \(n_{00}\) | \(n_{0\bullet}\) |
| total | \(n_{\bullet1}\) | \(n_{\bullet0}\) | \(n\) |

Now you calculate the phi coefficient like so:

$$
\phi = \frac{n_{11}n_{00}-n_{10}n_{01}}{\sqrt{n_{1\bullet}n_{0\bullet}n_{\bullet0}n_{\bullet1}}}
$$

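or, equivalently, using only \(n\), \(n_{11}\), and the marginal totals \(n_{1\bullet}\) and \(n_{\bullet1}\):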

$$
\phi = \frac{nn_{11}-n_{1\bullet}n_{\bullet1}}{\sqrt{n_{1\bullet}n_{\bullet1}(n-n_{1\bullet})(n-n_{\bullet1})}}
$$
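
The two forms agree, which can be checked by expanding \(n = n_{11}+n_{10}+n_{01}+n_{00}\), \(n_{1\bullet} = n_{11}+n_{10}\), and \(n_{\bullet1} = n_{11}+n_{01}\) in the numerator of the second form:

$$
\begin{aligned}
n\,n_{11} - n_{1\bullet}n_{\bullet1}
&= (n_{11}+n_{10}+n_{01}+n_{00})\,n_{11} - (n_{11}+n_{10})(n_{11}+n_{01}) \\
&= n_{11}n_{00} - n_{10}n_{01}
\end{aligned}
$$

The denominators match because \(n - n_{1\bullet} = n_{0\bullet}\) and \(n - n_{\bullet1} = n_{\bullet0}\).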

The result is a number in \([-1, 1]\) that indicates the degree to which \(x\) and \(y\) are associated/correlated. \(1\) indicates that the variables are identical, \(-1\) indicates that each is always the opposite of the other, and \(0\) indicates that they’re effectively independent.
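
As a minimal sketch, the first formula can be written as a small Python function. The name `phi_coefficient` and the choice to return NaN when a marginal total is zero are assumptions made for this illustration, not part of the definition above; the formula itself is simply undefined in that case.

```python
from math import sqrt


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 table of counts (first index is y, second is x)."""
    # Marginal totals, named after the table above.
    n1_dot = n11 + n10  # row total for y = 1
    n0_dot = n01 + n00  # row total for y = 0
    n_dot1 = n11 + n01  # column total for x = 1
    n_dot0 = n10 + n00  # column total for x = 0

    denominator = sqrt(n1_dot * n0_dot * n_dot1 * n_dot0)
    if denominator == 0:
        # Assumption: a variable that never varies has no defined association,
        # so this sketch returns NaN instead of raising ZeroDivisionError.
        return float("nan")
    return (n11 * n00 - n10 * n01) / denominator
```

With made-up counts, `phi_coefficient(40, 10, 20, 30)` gives roughly `0.41`, while `phi_coefficient(5, 0, 0, 5)` gives `1.0` (identical variables) and `phi_coefficient(0, 5, 5, 0)` gives `-1.0` (opposite variables).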

An example:

- \(x\): Event definition A
- \(y\): Event definition B
- \(n_{11}\): Number of events matching both event definitions