*Bounty: 150*

I’ve tried to find an article that explains the procedure for permutation tests with exhaustive enumeration of all permutations (not the Monte Carlo method), but couldn’t find a resource specific enough to resolve the ambiguity outlined below. For example, the Wikipedia article (https://en.wikipedia.org/wiki/Resampling_(statistics)#Permutation_tests) describes the test in terms of “all possible ways to divide it” (the combined dataset).

Given a combined dataset (1, 2, 3), where group A has length 2 and group B has length 1 for simplicity, it is not clear to me whether “all possible ways to divide it” means {(1, 2), (3)}, {(2, 1), (3)}, … as separate divisions, or whether {(1, 2), (3)} and {(2, 1), (3)} count as the same division.

Looking at various code examples, such as the Python, R, and Julia implementations on https://rosettacode.org/wiki/Permutation_test, I see that the permutation test is usually implemented as follows:

Given two samples A and B:

1. Record the test statistic (e.g., $|\bar{A} - \bar{B}|$)
2. Combine samples A and B into one large sample AB
3. For each combination of length(A) from AB:

   3a) compute the permutation statistic (e.g., $|\bar{A'} - \bar{B'}|$, where A’ is the combination from 3. and B’ consists of all the samples in AB that are not in A’)

   3b) record the permutation statistic
4. Compute the p-value as the number of times the permutation statistic from 3a) was more extreme than the test statistic from 1., divided by the number of combinations sampled
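The combination-based procedure above can be sketched in Python roughly as follows. This is only a minimal sketch and not copied from the Rosetta Code examples: the names `combination_test` and `abs_mean_diff` are made up for illustration, and I am assuming that "more extreme" includes ties (`>=`), as in the numeric example further down.

```python
from fractions import Fraction
from itertools import combinations


def abs_mean_diff(x, y):
    # Exact arithmetic avoids floating-point trouble when checking ties with >=.
    return abs(Fraction(sum(x)) / len(x) - Fraction(sum(y)) / len(y))


def combination_test(a, b, stat=abs_mean_diff):
    """Return (n_extreme, n_total) over all combinations of len(a) items."""
    pooled = list(a) + list(b)          # combine A and B into AB
    observed = stat(a, b)               # test statistic of the original split
    positions = range(len(pooled))      # work on positions so duplicates are safe
    n_extreme = 0
    n_total = 0
    for chosen in combinations(positions, len(a)):
        new_a = [pooled[i] for i in chosen]
        new_b = [pooled[i] for i in positions if i not in chosen]
        # Assumption: "more extreme" is counted as "equal or more extreme".
        if stat(new_a, new_b) >= observed:
            n_extreme += 1
        n_total += 1
    return n_extreme, n_total           # p-value = n_extreme / n_total


n_extreme, n_total = combination_test([1, 2, 3], [10, 11])
print(f"p = {n_extreme}/{n_total}")
```

Returning the two counts instead of a single float makes it easy to compare against hand enumeration of small examples.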

However, shouldn’t we be sampling the permutations instead of combinations of length A? For example, as outlined below (I highlighted the difference to the previous procedure in bold):

Given two samples A and B:

1. Record the test statistic (e.g., $|\bar{A} - \bar{B}|$)
2. Combine samples A and B into one large sample AB
3. **For each permutation of length(AB) from AB:**

   3a) compute the permutation statistic (e.g., $|\bar{A'} - \bar{B'}|$, where A’ consists of the first len(A) samples in the permuted AB sequence and B’ consists of the remaining samples in AB)

   3b) record the permutation statistic
4. Compute the p-value as the number of times the permutation statistic from 3a) was more extreme than the test statistic from 1., divided by the number of permutations sampled
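For comparison, the permutation-based variant could be sketched like this. Again, this is just an illustrative sketch: `permutation_test` and `abs_mean_diff` are hypothetical names, ties are counted as extreme (`>=`), and the loop visits all len(AB)! orderings, so it is only practical for very small samples.

```python
from fractions import Fraction
from itertools import permutations


def abs_mean_diff(x, y):
    # Exact arithmetic avoids floating-point trouble when checking ties with >=.
    return abs(Fraction(sum(x)) / len(x) - Fraction(sum(y)) / len(y))


def permutation_test(a, b, stat=abs_mean_diff):
    """Return (n_extreme, n_total) over all orderings of the pooled sample."""
    pooled = list(a) + list(b)          # combine A and B into AB
    observed = stat(a, b)               # test statistic of the original split
    k = len(a)
    n_extreme = 0
    n_total = 0
    # itertools.permutations treats elements as distinct by position,
    # so repeated values in the data are handled as separate items.
    for order in permutations(pooled):
        new_a = order[:k]               # first len(A) items of the ordering
        new_b = order[k:]               # remaining items
        # Assumption: "more extreme" is counted as "equal or more extreme".
        if stat(new_a, new_b) >= observed:
            n_extreme += 1
        n_total += 1
    return n_extreme, n_total           # p-value = n_extreme / n_total


n_extreme, n_total = permutation_test([1, 2, 3], [10, 11])
print(f"p = {n_extreme}/{n_total}")
```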

Or, to provide a simple numeric example, consider the following 2 samples

a = [1, 3]

b = [2]

with the observed difference (using group sums rather than means to keep the arithmetic simple):

obs = |sum(a) – sum(b)| = 2

Using the “combinations” procedure, we would be sampling the following:

(1, 2), (3) => diff 0

(1, 3), (2) => diff 2

(2, 3), (1) => diff 4

where in 2 out of 3 cases, we would observe a difference equal to or more extreme than the observed statistic (i.e., p = 2/3).

Now, using permutations, we would get the following:

(1, 2), (3) => diff 0

(1, 3), (2) => diff 2

(2, 1), (3) => diff 0

(2, 3), (1) => diff 4

(3, 1), (2) => diff 2

(3, 2), (1) => diff 4

Here, we observe a difference that is equal to or more extreme than the observed statistic in 4 out of 6 cases (i.e., p = 4/6).
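To double-check the two enumerations above, here is a small sketch that lists the splits for this toy example. Note that the differences listed (0, 2, 4) correspond to differences of group sums, so the sketch uses a sum-based statistic; the helper names (`abs_sum_diff`, `splits_by_combination`, `splits_by_permutation`, `tabulate`) are made up for illustration.

```python
from itertools import combinations, permutations


def abs_sum_diff(x, y):
    # The differences listed above (0, 2, 4) are differences of group sums.
    return abs(sum(x) - sum(y))


a, b = [1, 3], [2]
pooled = a + b
observed = abs_sum_diff(a, b)


def splits_by_combination(pooled, k):
    # Each unordered choice of k positions defines one split.
    idx = range(len(pooled))
    for chosen in combinations(idx, k):
        rest = [i for i in idx if i not in chosen]
        yield [pooled[i] for i in chosen], [pooled[i] for i in rest]


def splits_by_permutation(pooled, k):
    # Each ordering of the pooled sample defines one split (first k vs. rest).
    for order in permutations(pooled):
        yield list(order[:k]), list(order[k:])


def tabulate(splits):
    rows = [(g1, g2, abs_sum_diff(g1, g2)) for g1, g2 in splits]
    n_extreme = sum(d >= observed for _, _, d in rows)
    return rows, n_extreme


comb_rows, comb_extreme = tabulate(splits_by_combination(pooled, len(a)))
perm_rows, perm_extreme = tabulate(splits_by_permutation(pooled, len(a)))

for label, rows, n in [("combinations", comb_rows, comb_extreme),
                       ("permutations", perm_rows, perm_extreme)]:
    print(label)
    for g1, g2, d in rows:
        print(f"  {g1}, {g2} => diff {d}")
    print(f"  equal or more extreme: {n} of {len(rows)}")
```

Running this gives 2 of 3 splits for the combinations and 4 of 6 for the permutations, matching the counts worked out by hand above (the order in which the splits are listed may differ).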

Does anyone know more about the exact procedure and have a reliable resource at hand? Thanks!