#StackBounty: #statistical-significance #t-test #proportion #pivot-table How to test for difference between table of weighted proportions

Bounty: 50

I’m wondering how one would go about determining whether there was a significant difference between the column/row values of a table of proportions.

For example, given the following table:

A     0     1     2
B                  
0  35.3  27.2  43.2
1  18.0  22.9  19.5
2  26.4  23.1  15.6
3  20.3  26.8  21.7

The cell at row 1, col 1 contains the value 22.9 (a percentage). How would I determine whether this percentage is significantly different from the values in columns 0 and 2 of the same row (18.0 and 19.5)?

I’m assuming that it’s some sort of t-test, but I can’t seem to find something that covers this particular case.

I would also be interested in how to compare values between columns. It seems that the question is comparing proportions within groups and between groups?
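
Each column of the table sums to 100, so two cells in the same row come from different groups. For a single pair like this, a two-proportion z-test is a common candidate rather than a t-test. The following is only a minimal sketch under stated assumptions: the function name `two_proportion_ztest` is made up for illustration, the sample sizes 300 and 250 are hypothetical (the table does not show column totals), and the weighting is ignored apart from whatever sample size is passed in.

```python
import math


def two_proportion_ztest(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two independent proportions.

    p1, p2 are proportions in [0, 1]; n1, n2 are the (effective) sample sizes.
    Returns (z, p_value).
    """
    # Pooled proportion under the null hypothesis p1 == p2.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        # Both proportions are 0 or both are 1: no evidence of a difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Row 1 of the table: column 1 (22.9%) against column 0 (18.0%).
# The sample sizes are hypothetical placeholders, not values from the question.
z, p = two_proportion_ztest(0.229, 300, 0.180, 250)
print(f"z = {z:.3f}, p = {p:.3f}")
```

This treats the two columns as independent samples. Comparing two cells within the same column is a different situation, because those proportions are computed from the same respondents.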

Edit

I would like to be able to determine which columns are significantly different, not just whether there is a significant difference. So, for row 1 col 1 the result might be col 0 is significantly different but col 2 is not.

Edit 2

If there’s anything that is unclear about this question please let me know.

The expected output would be something along the lines of:

A     0     1     2
B                  
0  35.3  27.2  43.2
    2     2     0,1

1  18.0  22.9  19.5
           0

2  26.4  23.1  15.6
                0,1
                
3  20.3  26.8  21.7
    1    0,2      1

I’ve just made the above up, but it is meant to indicate that, for each element in a row, there would be a test between that element and all of the other elements in that row.

For example, it shows that the cell at row 1, col 1 is significantly different from the cell at row 1, col 0.
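
The layout described here, where each cell lists the other columns it differs from, could be produced roughly as follows. This is a sketch under assumptions, not an established method for this table: the helper names `two_sided_p` and `differing_columns` are hypothetical, the per-column sample sizes are placeholders, each pair is tested with a pooled two-proportion z-test, and a Bonferroni correction is assumed as the way to handle the multiple comparisons within a row.

```python
import math
from itertools import combinations


def two_sided_p(p1, n1, p2, n2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        return 1.0
    return math.erfc(abs(p1 - p2) / se / math.sqrt(2.0))


def differing_columns(row_percentages, sample_sizes, alpha=0.05):
    """For each column, list the other columns whose value differs significantly."""
    k = len(row_percentages)
    result = {col: [] for col in range(k)}
    pairs = list(combinations(range(k), 2))
    if not pairs:
        return result
    # Bonferroni: split alpha across all pairwise tests in this row.
    threshold = alpha / len(pairs)
    for i, j in pairs:
        p = two_sided_p(row_percentages[i] / 100.0, sample_sizes[i],
                        row_percentages[j] / 100.0, sample_sizes[j])
        if p < threshold:
            result[i].append(j)
            result[j].append(i)
    return result


# Row 0 of the table; the per-column sample sizes are hypothetical.
flags = differing_columns([35.3, 27.2, 43.2], [250, 200, 550])
for col, others in flags.items():
    print(col, ",".join(map(str, others)))
```

Running this once per row gives one annotation line per row of the table. With weighted data, the sample sizes passed in would probably need to be effective sample sizes rather than raw counts.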

Data

Not strictly necessary to the question – just putting the (sloppy) code that generated the above table in case it’s of use to anyone in future.

```python
import numpy as np
import pandas as pd

np.random.seed(3)

N = 500
dt_1 = pd.DataFrame({
    'A' : np.random.choice(range(3), size = N, p = [0.3, 0.3, 0.4]),
    'B' : np.random.choice(range(4), size = N, p = [0.25, .25, .25, .25]),
    'W' : np.abs(np.random.normal(loc = 1, scale = 10, size = N))
    
})

dt_2 = pd.DataFrame({
    'A' : np.random.choice(range(3), size = N, p = [0.1, 0.1, 0.8]),
    'B' : np.random.choice(range(4), size = N, p = [0.5, .2, .1, .2]),
    'W' : np.abs(np.random.normal(loc = 1, scale = 10, size = N))
    
})

dt = pd.concat([dt_1, dt_2], axis = 0)

dt['W'] = dt['W'].div(dt['W'].sum()).mul(len(dt))

crosstab = dt.groupby("A").apply(lambda g: 
                      g.groupby("B").apply(lambda sg:
                                           round(100 * (sg['W'].sum() / g['W'].sum()), 1)
                                          )
                     ).reset_index(drop=True)

crosstab = crosstab.T
crosstab.columns.name = "A"
```
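
Because the proportions are weighted by `W`, the raw row counts overstate how much information each column carries. One common approximation, assumed here and not something the question specifies, is Kish's effective sample size, (Σw)² / Σw². A minimal sketch with a hypothetical helper name and made-up weights:

```python
def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / total_sq


# Hypothetical weights for the respondents in one column of the table.
weights = [0.5, 1.0, 1.5, 2.0, 3.0]
print(kish_effective_n(weights))
```

With the `dt` frame above, the same quantity per column could be obtained with something like `dt.groupby("A")["W"].apply(lambda w: w.sum() ** 2 / (w ** 2).sum())`. Equal weights give back the raw count, and unequal weights give a smaller number.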

