# Calculating the Standard Error and Confidence Interval for Cohen's Quadratic Kappa

### Bounty: 50

I need to evaluate the performance of a machine learning application. One of the evaluation metrics chosen is Cohen’s Quadratic Kappa. I found this Python tutorial on how to calculate Cohen’s Quadratic Kappa. What is missing, however, is how to calculate the confidence interval.

Let’s walk through my example (I use a smaller data set for the sake of simplicity). I use NumPy and Scipy Stats for this purpose:

```python
from math import sqrt
import numpy as np
from scipy.stats import norm
```

This is my confusion matrix:

```python
# x: actuals, y: predictions
confusion_matrix = np.array([
    [9, 5, 2, 0, 0, 0],
    [4, 7, 1, 0, 0, 0],
    [1, 2, 4, 0, 1, 0],
    [0, 1, 1, 5, 1, 0],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 0, 0, 1],
], dtype=int)  # np.int is deprecated; use the builtin int
rows, cols = confusion_matrix.shape
```

I calculate a weight matrix and histograms:

```python
weights = np.zeros((rows, cols))
for r in range(rows):
    for c in range(cols):
        weights[r, c] = ((r - c) ** 2) / (rows * cols)
hist_actual = np.sum(confusion_matrix, axis=0)
hist_prediction = np.sum(confusion_matrix, axis=1)
```
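(As a side note, the same weight matrix can be built without the explicit loops via NumPy broadcasting — a quick sketch with the same 6×6 shape as above:)

```python
import numpy as np

rows, cols = 6, 6
idx_r, idx_c = np.arange(rows), np.arange(cols)
# np.subtract.outer gives the full matrix of (r - c) differences;
# squaring and scaling reproduces the loop version above
weights_vec = np.subtract.outer(idx_r, idx_c) ** 2 / (rows * cols)
```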

The expected prediction quality by mere chance is calculated as follows:

```python
expected = np.outer(hist_actual, hist_prediction)
```

This matrix and the observed confusion matrix are then normalized so that their entries sum to 1:

```python
expected_norm = expected / expected.sum()
confusion_matrix_norm = confusion_matrix / confusion_matrix.sum()
```

Now I calculate the numerator (observed weighted disagreement) and the denominator (weighted disagreement expected by chance):

```python
numerator = 0.0
denominator = 0.0
for r in range(rows):
    for c in range(cols):
        numerator += weights[r, c] * confusion_matrix_norm[r, c]
        denominator += weights[r, c] * expected_norm[r, c]
```

Cohen’s Kappa can now be calculated as:

```python
weighted_kappa = 1 - (numerator / denominator)
```

Which gives me a result of 0.817.
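To make the result easy to reproduce, here are all of the steps above condensed into one self-contained snippet (same matrix, NumPy only):

```python
import numpy as np

cm = np.array([
    [9, 5, 2, 0, 0, 0],
    [4, 7, 1, 0, 0, 0],
    [1, 2, 4, 0, 1, 0],
    [0, 1, 1, 5, 1, 0],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 0, 0, 1],
], dtype=int)

rows, cols = cm.shape
# quadratic weights, zero on the diagonal
weights = np.subtract.outer(np.arange(rows), np.arange(cols)) ** 2 / (rows * cols)
observed = cm / cm.sum()
expected = np.outer(cm.sum(axis=0), cm.sum(axis=1))
expected = expected / expected.sum()
kappa = 1 - (weights * observed).sum() / (weights * expected).sum()
print(round(kappa, 3))  # 0.817
```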

Now to my question: I need to calculate the standard error, in order to calculate the confidence interval. Here’s my approach:

```python
#             p(1 - p)
# sek = sqrt ----------
#            n(1 - e)²
#
# p: numerator (observed weighted disagreement)
# e: denominator (weighted disagreement expected by chance)
# n: total number of predictions
total = hist_actual.sum()
sek = sqrt((numerator * (1 - numerator)) / (total * (1 - denominator) ** 2))
```

Can I use the total number of predictions, even though I calculate with a normalized numerator and denominator? This would result in a standard error of kappa of 0.023.

The 95% confidence interval then is just straightforward:

```python
alpha = 0.95
margin = (1 - alpha) / 2  # two-tailed test
x = norm.ppf(1 - margin)
lower = weighted_kappa - x * sek
upper = weighted_kappa + x * sek
```

Which gives an interval of [0.772;0.861].
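For completeness, here is the interval calculation end to end in one runnable snippet (same data and the same standard-error formula as above — whether that formula, usually stated for unweighted kappa, carries over to the weighted case is exactly what I am unsure about):

```python
from math import sqrt

import numpy as np
from scipy.stats import norm

cm = np.array([
    [9, 5, 2, 0, 0, 0],
    [4, 7, 1, 0, 0, 0],
    [1, 2, 4, 0, 1, 0],
    [0, 1, 1, 5, 1, 0],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 0, 0, 1],
], dtype=int)

rows, cols = cm.shape
weights = np.subtract.outer(np.arange(rows), np.arange(cols)) ** 2 / (rows * cols)
observed = cm / cm.sum()
expected = np.outer(cm.sum(axis=0), cm.sum(axis=1))
expected = expected / expected.sum()

num = (weights * observed).sum()    # p
den = (weights * expected).sum()    # e
kappa = 1 - num / den
n = cm.sum()                        # total number of predictions
sek = sqrt(num * (1 - num) / (n * (1 - den) ** 2))

z = norm.ppf(0.975)                 # two-tailed 95%
lower, upper = kappa - z * sek, kappa + z * sek
print(f"[{lower:.3f}; {upper:.3f}]")  # [0.772; 0.861]
```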
