*Bounty: 50*

I’m wondering how to determine whether there is a significant difference between the column/row values of a table of proportions.

For example, given the following table:

```
A     0     1     2
B
0  35.3  27.2  43.2
1  18.0  22.9  19.5
2  26.4  23.1  15.6
3  20.3  26.8  21.7
```

Cell `row 1, col 1` contains the value `22.9` (a percentage). How would I determine whether this percentage is significantly different from those in columns `0,2` (with values `18.0, 19.5`)?

I’m assuming that it’s some sort of t-test, but I can’t seem to find something that covers this particular case.
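
For a single pair of cells, something like a two-proportion z-test might be closer to what is needed than a t-test. The following is only a minimal sketch under assumptions: the helper name `two_proportion_ztest` is made up here, the column sample size of 200 is hypothetical (the table only reports percentages), and the weighting of the underlying data is ignored.

```python
import math

def two_proportion_ztest(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two independent proportions.

    p1, p2 are proportions in [0, 1]; n1, n2 are the (assumed) group sizes.
    Returns (z, p_value).
    """
    # Pooled proportion under the null hypothesis that p1 == p2
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Row 1 of the table: col 1 (22.9%) versus col 0 (18.0%).
# n = 200 per column is a hypothetical sample size, not taken from the data.
z, p = two_proportion_ztest(0.229, 200, 0.180, 200)
print(f"z = {z:.3f}, p = {p:.3f}")
```

Whether this is appropriate for weighted percentages is exactly the part I am unsure about.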

I would also be interested in how to compare values between columns. In other words, I think the question is about comparing proportions both within groups and between groups.

# Edit

I would like to be able to determine which columns are significantly different, not just whether there *is* a significant difference. So, for `row 1, col 1`, the result might be that `col 0` is significantly different but `col 2` is not.

# Edit 2

If there’s anything that is unclear about this question please let me know.

The expected output would be something along the lines of:

```
A     0     1     2
B
0  35.3  27.2  43.2
      2     2   0,1
1  18.0  22.9  19.5
            0
2  26.4  23.1  15.6
                0,1
3  20.3  26.8  21.7
      1   0,2     1
```

I’ve just made the above up, but it is meant to indicate that, for each element in a row, there would be a test between that element and all of the other elements in that row. For example, the `0,1` beneath row `2` would indicate that the cell `row 2, col 2` is significantly different from `row 2, col 0` and `row 2, col 1`.
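
The per-row comparison described above could be sketched as pairwise two-proportion z-tests with a Bonferroni correction within each row. This is only an illustration under assumptions, not a claim that it is the right method: the function name `pairwise_column_tests` and the sample sizes in `col_sizes` are hypothetical, and the weighting used to build the table is ignored.

```python
import math
from itertools import combinations

def pairwise_column_tests(row_pcts, col_sizes, alpha=0.05):
    """For one table row, list the columns each cell differs from.

    row_pcts: percentages for the row, one per column.
    col_sizes: assumed (hypothetical) number of observations per column.
    Uses two-proportion z-tests, Bonferroni-corrected for the number of pairs.
    """
    k = len(row_pcts)
    n_tests = k * (k - 1) // 2
    differs = {j: [] for j in range(k)}
    for i, j in combinations(range(k), 2):
        p_i, p_j = row_pcts[i] / 100, row_pcts[j] / 100
        n_i, n_j = col_sizes[i], col_sizes[j]
        pooled = (p_i * n_i + p_j * n_j) / (n_i + n_j)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_i + 1 / n_j))
        if se == 0:
            # Both proportions are 0 or both are 1: nothing to test
            continue
        z = (p_i - p_j) / se
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
        if p_value < alpha / n_tests:
            differs[i].append(j)
            differs[j].append(i)
    return differs

# Percentages from the table above, keyed by the value of B
table = {
    0: [35.3, 27.2, 43.2],
    1: [18.0, 22.9, 19.5],
    2: [26.4, 23.1, 15.6],
    3: [20.3, 26.8, 21.7],
}
col_sizes = [200, 200, 600]  # hypothetical sample sizes for A = 0, 1, 2

for b, pcts in table.items():
    result = pairwise_column_tests(pcts, col_sizes)
    print(b, {col: ",".join(map(str, others)) for col, others in result.items()})
```

Each printed dictionary maps a column to the comma-separated columns it differs from, which mirrors the annotation rows in the expected output.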

# Data

Not strictly necessary to the question – just putting the (sloppy) code that generated the above table in case it’s of use to anyone in future.

```
import numpy as np
import pandas as pd

np.random.seed(3)
N = 500

# Two samples with different category probabilities and random weights W
dt_1 = pd.DataFrame({
    'A': np.random.choice(range(3), size=N, p=[0.3, 0.3, 0.4]),
    'B': np.random.choice(range(4), size=N, p=[0.25, 0.25, 0.25, 0.25]),
    'W': np.abs(np.random.normal(loc=1, scale=10, size=N)),
})
dt_2 = pd.DataFrame({
    'A': np.random.choice(range(3), size=N, p=[0.1, 0.1, 0.8]),
    'B': np.random.choice(range(4), size=N, p=[0.5, 0.2, 0.1, 0.2]),
    'W': np.abs(np.random.normal(loc=1, scale=10, size=N)),
})
dt = pd.concat([dt_1, dt_2], axis=0)

# Rescale the weights so that they sum to the number of rows
dt['W'] = dt['W'].div(dt['W'].sum()).mul(len(dt))

# Weighted percentage of each B category within each A group
crosstab = dt.groupby("A").apply(
    lambda g: g.groupby("B").apply(
        lambda sg: round(100 * (sg['W'].sum() / g['W'].sum()), 1)
    )
).reset_index(drop=True)

# Transpose so that B indexes the rows and A the columns
crosstab = crosstab.T
crosstab.columns.name = "A"
```