# #StackBounty: #statistical-significance #t-test #proportion #pivot-table How to test for difference between table of weighted proportions

### Bounty: 50

I’m wondering how one would go about determining whether there was a significant difference between the column/row values of a table of proportions.

For example, given the following table:

    A     0     1     2
    B
    0  35.3  27.2  43.2
    1  18.0  22.9  19.5
    2  26.4  23.1  15.6
    3  20.3  26.8  21.7

The cell at row 1, col 1 contains the value 22.9 (a percentage). How would I determine whether this percentage is significantly different from the values in columns 0 and 2 of the same row (18.0 and 19.5)?

I’m assuming that it’s some sort of t-test, but I can’t seem to find something that covers this particular case.
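To make the comparison concrete, one candidate is a two-proportion z-test applied to a pair of cells from the same row, with Kish's effective sample size standing in for the raw counts because the proportions are weighted. This is only a sketch under that assumption, not a claim that it is the right test for this case. The helper names and the effective sample sizes of 150 are made up for illustration.

```python
import math


def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test of p1 == p2 using a pooled standard error.

    p1 and p2 are proportions in [0, 1]; n1 and n2 are (effective) sample sizes.
    Returns (z statistic, two-sided p-value).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical effective sample sizes; real ones would come from applying
# kish_effective_n to the weights W of the rows in each column of A.
n_eff_col0, n_eff_col1 = 150.0, 150.0

# Row 1: col 1 (22.9%) against col 0 (18.0%)
z, p = two_proportion_z_test(0.229, n_eff_col1, 0.180, n_eff_col0)
print(f"z = {z:.3f}, p = {p:.3f}")
```

Treating the weights through an effective sample size is a simplification; a design-based variance estimate would be the more careful route if the weights come from a survey design.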

I would also be interested in how to compare values between columns. It seems that the question is comparing proportions within groups and between groups?

# Edit

I would like to be able to determine which columns are significantly different, not just whether there is a significant difference. So, for row 1 col 1 the result might be col 0 is significantly different but col 2 is not.

# Edit 2

The expected output would be something along the lines of:

    A     0     1     2
    B
    0  35.3  27.2  43.2
          2     2   0,1

    1  18.0  22.9  19.5
                0

    2  26.4  23.1  15.6
                    0,1

    3  20.3  26.8  21.7
          1   0,2     1

I’ve just made the above up, but it is meant to indicate that, for each element in a row,
there would be a test between that element and all of the others in that row.

It shows that the cell row 1, col 2 is significantly different from and row 2, col 1
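As an illustration of how such an annotated table could be produced, the sketch below runs a pooled two-proportion z-test for every pair of columns within a row and records, for each cell, the columns it differs from. The function names, the Bonferroni correction within each row, and the effective sample sizes in `n_eff` are all assumptions made for the example, not part of the original data.

```python
import math
from itertools import combinations


def z_test_p_value(p1, n1, p2, n2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # both proportions are 0 or both are 1: no evidence of a difference
    return math.erfc(abs(p1 - p2) / se / math.sqrt(2))


def flag_row_differences(props, n_eff, alpha=0.05):
    """For each cell, list the columns in the same row that differ significantly.

    props: {row: {col: proportion in [0, 1]}}
    n_eff: {col: (effective) sample size of that column}
    """
    flags = {}
    for row, cells in props.items():
        cols = sorted(cells)
        flags[row] = {c: [] for c in cols}
        pairs = list(combinations(cols, 2))
        if not pairs:
            continue
        # Bonferroni: split alpha across the pairwise tests within this row
        threshold = alpha / len(pairs)
        for a, b in pairs:
            p = z_test_p_value(cells[a], n_eff[a], cells[b], n_eff[b])
            if p < threshold:
                flags[row][a].append(b)
                flags[row][b].append(a)
    return flags


# Proportions taken from the table above; n_eff values are hypothetical.
props = {
    0: {0: 0.353, 1: 0.272, 2: 0.432},
    1: {0: 0.180, 1: 0.229, 2: 0.195},
    2: {0: 0.264, 1: 0.231, 2: 0.156},
    3: {0: 0.203, 1: 0.268, 2: 0.217},
}
n_eff = {0: 200.0, 1: 200.0, 2: 600.0}

for row, cells in flag_row_differences(props, n_eff).items():
    print(row, {col: ",".join(map(str, others)) for col, others in cells.items()})
```

Because each test is symmetric, a pair of columns is always flagged in both cells, which the made-up table above does not necessarily respect.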

# Data

Not strictly necessary to the question – just putting the (sloppy) code that generated the above table in case it’s of use to anyone in future.

```python
import numpy as np
import pandas as pd

np.random.seed(3)

N = 500

# First sample: A and B drawn from fairly even distributions, W is a positive weight
dt_1 = pd.DataFrame({
    'A': np.random.choice(range(3), size=N, p=[0.3, 0.3, 0.4]),
    'B': np.random.choice(range(4), size=N, p=[0.25, 0.25, 0.25, 0.25]),
    'W': np.abs(np.random.normal(loc=1, scale=10, size=N)),
})

# Second sample: A and B drawn from skewed distributions
dt_2 = pd.DataFrame({
    'A': np.random.choice(range(3), size=N, p=[0.1, 0.1, 0.8]),
    'B': np.random.choice(range(4), size=N, p=[0.5, 0.2, 0.1, 0.2]),
    'W': np.abs(np.random.normal(loc=1, scale=10, size=N)),
})

dt = pd.concat([dt_1, dt_2], axis=0)

# Rescale the weights so that they sum to the number of rows
dt['W'] = dt['W'].div(dt['W'].sum()).mul(len(dt))

# Weighted percentage of each B level within each A group
crosstab = dt.groupby("A").apply(
    lambda g: g.groupby("B").apply(
        lambda sg: round(100 * (sg['W'].sum() / g['W'].sum()), 1)
    )
).reset_index(drop=True)

# Transpose so that B indexes the rows and A the columns
crosstab = crosstab.T
crosstab.columns.name = "A"
```
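A shorter way to build the same kind of table is `pd.crosstab` with `values`, `aggfunc="sum"` and `normalize="columns"`, which sums the weights per cell and then divides each column by its total. The sketch below uses a small made-up data frame (`demo`) with the same column names, so its numbers will not match the table above.

```python
import numpy as np
import pandas as pd

# Stand-in data with the same column names as above; the values are made up.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "A": rng.integers(0, 3, size=200),
    "B": rng.integers(0, 4, size=200),
    "W": np.abs(rng.normal(loc=1, scale=10, size=200)),
})

# Sum the weights in each (B, A) cell, normalise within each column of A,
# then express the result as a percentage rounded to one decimal place.
weighted_pct = (
    pd.crosstab(
        index=demo["B"],
        columns=demo["A"],
        values=demo["W"],
        aggfunc="sum",
        normalize="columns",
    )
    .mul(100)
    .round(1)
)
print(weighted_pct)
```

Each column of `weighted_pct` sums to roughly 100, up to rounding, which is a quick sanity check that the normalisation runs over the intended axis.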
