#StackBounty: #confidence-interval #python #standard-error #cohens-kappa Calculating the Standard Error and Confidence Interval for Coh…

Bounty: 50

I need to evaluate the performance of a machine learning application. One of the evaluation metrics chosen is Cohen’s Quadratic Kappa. I found a Python tutorial on how to calculate Cohen’s Quadratic Kappa; what it does not cover, however, is how to calculate the confidence interval.

Let’s walk through my example (I use a smaller data set for the sake of simplicity). I use NumPy and SciPy for this purpose:

from math import sqrt
import numpy as np
from scipy.stats import norm

This is my confusion matrix:

# x: actuals, y: predictions
confusion_matrix = np.array([
    [9, 5, 2, 0, 0, 0],
    [4, 7, 1, 0, 0, 0],
    [1, 2, 4, 0, 1, 0],
    [0, 1, 1, 5, 1, 0],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 0, 0, 1],
], dtype=int)  # note: np.int was removed in recent NumPy versions; plain int works
rows = confusion_matrix.shape[0]
cols = confusion_matrix.shape[1]

I calculate a quadratic weight matrix and the marginal histograms:

weights = np.zeros((rows, cols))
for r in range(rows):
    for c in range(cols):
        weights[r, c] = (r - c) ** 2 / (rows * cols)  # quadratic penalty
hist_actual = np.sum(confusion_matrix, axis=0)      # column totals
hist_prediction = np.sum(confusion_matrix, axis=1)  # row totals
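
As an aside (my addition, not part of the original post), the same weight matrix can be built without loops. The 1/(rows*cols) scaling is just a constant that cancels in the final kappa ratio, so any positive constant would do:

# Vectorized equivalent of the weight loop above:
# weights[r, c] = (r - c)**2 / (rows * cols)
idx = np.arange(rows)
weights_vec = np.subtract.outer(idx, idx) ** 2 / (rows * cols)
assert np.allclose(weights_vec, weights)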

The agreement expected by mere chance is calculated from the outer product of the two marginal histograms:

expected = np.outer(hist_actual, hist_prediction)

This matrix, and the actual confusion matrix, are normalized so that each sums to 1:

expected_norm = expected / expected.sum()
confusion_matrix_norm = confusion_matrix / confusion_matrix.sum()

Now I calculate the numerator (observed weighted disagreement) and the denominator (weighted disagreement expected by chance):

numerator = 0.0
denominator = 0.0
for r in range(rows):
    for c in range(cols):
        numerator += weights[r, c] * confusion_matrix_norm[r, c]
        denominator += weights[r, c] * expected_norm[r, c]
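
For reference (not in the original post), the double loop reduces to two elementwise sums over the arrays defined above:

numerator = float(np.sum(weights * confusion_matrix_norm))
denominator = float(np.sum(weights * expected_norm))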

Cohen’s Kappa can now be calculated as:

weighted_kappa = 1 - numerator / denominator

Which gives me a result of 0.817.
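
As a cross-check (my addition, not from the tutorial), scikit-learn’s cohen_kappa_score should reproduce the same value if the confusion matrix is expanded back into individual label pairs. This sketch assumes rows are actuals and columns are predictions, although quadratic kappa is symmetric, so the orientation does not change the result:

from sklearn.metrics import cohen_kappa_score

# Expand each cell (r, c) into that many (actual, predicted) pairs
actuals, predictions = [], []
for r in range(rows):
    for c in range(cols):
        actuals.extend([r] * int(confusion_matrix[r, c]))
        predictions.extend([c] * int(confusion_matrix[r, c]))

print(cohen_kappa_score(actuals, predictions, weights="quadratic"))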

Now to my question: I need to calculate the standard error in order to get a confidence interval. Here’s my approach, adapting the standard-error formula for simple (unweighted) Cohen’s Kappa:

#             p * (1 - p)
# sek = sqrt( ----------- )
#             n * (1 - e)²
#
# p: numerator (observed weighted disagreement)
# e: denominator (weighted disagreement expected by chance)
# n: total number of predictions
total = hist_actual.sum()
sek = sqrt((numerator * (1 - numerator)) / (total * (1 - denominator) ** 2))

Can I use the total number of predictions, even though I calculate with a normalized numerator and denominator? This would result in a standard error of kappa of 0.023.

The 95% confidence interval is then straightforward:

alpha = 0.95
margin = (1 - alpha) / 2  # two-tailed test
x = norm.ppf(1 - margin)
lower = weighted_kappa - x * sek
upper = weighted_kappa + x * sek

Which gives an interval of [0.772, 0.861].
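
To sanity-check both the standard error and the interval without relying on that analytic formula, one could bootstrap over the individual rating pairs. This is my own sketch, not part of the question: it reuses the actuals/predictions lists built above, treats the pairs as i.i.d., and the replicate count and seed are arbitrary choices:

rng = np.random.default_rng(42)  # fixed seed for reproducibility
pairs = np.column_stack([actuals, predictions])
n = len(pairs)

boot_kappas = []
for _ in range(2000):
    sample = pairs[rng.integers(0, n, size=n)]  # resample pairs with replacement
    boot_kappas.append(cohen_kappa_score(
        sample[:, 0], sample[:, 1],
        weights="quadratic", labels=np.arange(rows),  # keep label set fixed per replicate
    ))

boot_kappas = np.array(boot_kappas)
print("bootstrap SE:", boot_kappas.std(ddof=1))
print("95% percentile CI:", np.percentile(boot_kappas, [2.5, 97.5]))

If the percentile interval roughly matches [0.772, 0.861], that would support using the total number of predictions as n in the analytic formula above.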

