## #StackBounty: #hypothesis-testing #p-value #entropy Relation between P-value in a randomness test, number of samples, and entropy

### Bounty: 100

Consider tests of randomness of bit sequences of fixed size \$n\$ bits, as in cryptography (e.g. NIST Special Publication 800-22, page 1-4). Define such a test as any deterministic function \$T\$ that accepts a vector \$V\$ of \$n\$ bits and outputs a P-value \$P\$ in \$]0\dots1]\$, obeying the defining property
\$\$\forall\alpha\in[0\dots1],\;\;\Pr\big(T(V)\le\alpha\big)\,\le\,\alpha\$\$
where the probability is computed with \$V\$ a vector of random independent unbiased bits (or equivalently, is computed as the proportion of vectors \$V\$ such that \$T(V)\le\alpha\$ among the \$2^n\$ vectors \$V\$).

Example tests matching this definition are

• True, which always outputs \$P=1\$.
• Non-zero, which outputs \$1/2^n\$ if all bits in \$V\$ are zero, and outputs \$1\$ otherwise.
• Non-stuck, which outputs \$1/2^{n-1}\$ if all bits in \$V\$ are identical, and outputs \$1\$ otherwise.
• Balanced, which computes the number \$s\$ of bits set in \$V\$, and outputs the probability that, for random \$V\$, \$|2s-n|\$ is at least as large as observed.
  • For \$n\le3\$, Balanced is the same as Non-stuck.
  • For \$n=4\$, \$P=\begin{cases}1/8&\text{if } s\in\{0,4\}\\5/8&\text{if } s\in\{1,3\}\\1&\text{otherwise}\end{cases}\$
  • For \$n=5\$, \$P=\begin{cases}1/16&\text{if } s\in\{0,5\}\\3/8&\text{if } s\in\{1,4\}\\1&\text{otherwise}\end{cases}\$

There’s a natural partial order relation among tests: \$T\$ implies \$T'\$ when \$\forall V,\ T(V)\le T'(V)\$. Any test implies True. Balanced implies Non-stuck, but does not imply Non-zero. Some tests, including Balanced and Non-zero, are optimal in the sense that no other test implies them.
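
The stated implications can be confirmed by exhaustive comparison for a small \$n\$. This is a sketch under the definitions above; the helper `implies` and the compact test definitions are illustrative names, not part of the reference.

```python
from fractions import Fraction
from itertools import product
from math import comb


def non_zero(v):
    return Fraction(1, 2 ** len(v)) if not any(v) else Fraction(1)


def non_stuck(v):
    return Fraction(1, 2 ** (len(v) - 1)) if len(set(v)) == 1 else Fraction(1)


def balanced(v):
    n, d = len(v), abs(2 * sum(v) - len(v))
    hits = sum(comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= d)
    return Fraction(hits, 2 ** n)


def implies(t1, t2, n):
    # t1 implies t2 when t1(V) <= t2(V) for every n-bit vector V.
    return all(t1(v) <= t2(v) for v in product((0, 1), repeat=n))


n = 4
print("Balanced implies Non-stuck:", implies(balanced, non_stuck, n))
print("Balanced implies Non-zero: ", implies(balanced, non_zero, n))
```

The second implication fails on the all-zero vector, where Balanced outputs \$1/2^{n-1}\$ while Non-zero outputs the smaller value \$1/2^n\$.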

Section 2 of the above reference describes 15 tests for large \$n\$ (thousands of bits), which are intended to catch some defects relevant to actual random number generators and to be near-optimal (in the above sense). For example, section 2.1 is an approximation of Balanced for large \$n\$ using the complementary error function, designated The Frequency (Monobit) Test.
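
To see how close the approximation is, the sketch below compares the exact Balanced P-value with the complementary-error-function form \$P=\operatorname{erfc}\big(|2s-n|/\sqrt{2n}\big)\$, which is how the Frequency (Monobit) statistic is usually written. The function names and the sample values \$n=1000\$, \$s=540\$ are illustrative assumptions.

```python
from math import comb, erfc, sqrt


def balanced_exact(n, s):
    # Exact probability that |2S - n| >= |2s - n| for S ~ Binomial(n, 1/2).
    d = abs(2 * s - n)
    hits = sum(comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= d)
    return hits / 2 ** n


def monobit_approx(n, s):
    # Normal approximation: s_obs = |2s - n| / sqrt(n), P = erfc(s_obs / sqrt(2)).
    s_obs = abs(2 * s - n) / sqrt(n)
    return erfc(s_obs / sqrt(2))


n, s = 1000, 540
print("exact  :", balanced_exact(n, s))
print("approx :", monobit_approx(n, s))
```

The two values agree only approximately, so the approximate test is not guaranteed to satisfy the defining inequality exactly.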

Q1: Assume that all bits tested are random independent bits having the same probability \$q={1\over2}+\epsilon\$ of being set, with \$\epsilon\$ unknown (besides being smallish), corresponding to Shannon entropy per bit \$\$H=-q\log_2(q)-(1-q)\log_2(1-q)=1-{2\over\log2}\epsilon^2+\mathcal O(\epsilon^4)\$\$
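
The quality of the quadratic approximation can be checked numerically. In this short sketch, `entropy` and `entropy_quadratic` are illustrative names, and \$\log 2\$ is taken to be the natural logarithm.

```python
from math import log, log2


def entropy(q):
    # Shannon entropy per bit, in bits, for a bit set with probability q.
    return -q * log2(q) - (1 - q) * log2(1 - q)


def entropy_quadratic(eps):
    # Leading terms of the expansion around q = 1/2: 1 - (2 / ln 2) * eps^2.
    return 1 - 2 * eps ** 2 / log(2)


for eps in (0.001, 0.01, 0.1):
    exact = entropy(0.5 + eps)
    approx = entropy_quadratic(eps)
    print(eps, exact, approx, exact - approx)
```

The difference shrinks roughly like \$\epsilon^4\$ as \$\epsilon\$ decreases, consistent with the stated remainder term.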

The Balanced test for some (large) number \$n\$ of such bits is applied once, and outputs a small P-value (say \$P\le0.001\$). That allows us to reject the null hypothesis \$H=1\$ with high confidence (corresponding to the P-value \$P\$).

What is a tight function \$H(P,n)\$ such that we can reject \$H\ge H(P,n)\$ with some good confidence (corresponding to some known P-value higher than \$P\$, perhaps \$2P\$ or something along those lines)? By “tight function” I mean that the lower the \$H(P,n)\$ we manage to prove for a given confidence, the better.

Q2: Things are as in Q1, except that the test is unspecified beyond the defining property of P-values. Can we reject the hypothesis \$H\ge H(P,n)\$ with good confidence, for whatever \$H(P,n)\$ and confidence level were established in Q1? If that conjecture is false, what is a counterexample, and/or can it be repaired?

Q3: Things are as in Q2 (or as in Q1, if the property conjectured in Q2 does not hold), except that the bits in the input \$V\$ might be dependent, but still with Shannon entropy per bit \$H\$; that is, the distribution of the inputs \$V\$ is such that \$\$nH\;=\;-\sum_{V\text{ with }\Pr(V)\ne0}\Pr(V)\log_2(\Pr(V))\$\$
Can we reject the hypothesis \$H\ge H(P,n)\$ with good confidence, for whatever \$H(P,n)\$ and confidence level were established in Q1? If that conjecture is false, what is a counterexample, and/or can it be repaired?


## #StackBounty: #hypothesis-testing #t-test #p-value Estimating "population p-value" \$\Pi\$ using an observed p-value

### Bounty: 100

I asked a similar question last month, but from the responses, I see how the question can be asked more precisely.

Let’s suppose a population of the form

\$\$X \sim \mathcal{N}(100 + t_{n-1} \times \sigma / \sqrt{n}, \sigma)\$\$

in which \$t_{n-1}\$ is the Student \$t\$ quantile based on a specific value of a parameter \$\Pi\$ (\$0<\Pi<1\$). For the sake of illustration, we could suppose that \$\Pi\$ is 0.025.

When performing a one-sided \$t\$ test of the null hypothesis \$H_0: \mu = 100\$ on a sample taken from that population, the expected \$p\$ value is \$\Pi\$, irrespective of sample size (as long as simple randomized sampling is used).

I have 4 questions:

1. Is the \$p\$ value a maximum likelihood estimator (MLE) of \$\Pi\$? (Conjecture: yes, because it is based on a \$t\$ statistic which is based on a likelihood ratio test);

2. Is the \$p\$ value a biased estimator of \$\Pi\$? (Conjecture: yes, because (i) MLEs tend to be biased, and (ii) based on simulations, I noted that the median value of many \$p\$s is close to \$\Pi\$ but the mean value of many \$p\$s is much larger);

3. Is the \$p\$ value a minimum variance estimate of \$\Pi\$? (Conjecture: yes in the asymptotic case but no guarantee for a given sample size)

4. Can we get a confidence interval around a given \$p\$ value by using the confidence interval of the observed \$t\$ value (this is done using the non-central Student \$t\$ distribution with \$n-1\$ degrees of freedom and non-centrality parameter \$t\$) and computing the \$p\$ values of the lower and upper bound \$t\$ values? (Conjecture: yes because both the non-central Student \$t\$ quantiles and the \$p\$ values of a one-sided test are continuous increasing functions)
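
The simulation observation mentioned in question 2 can be reproduced in a simplified setting. The sketch below is not the \$t\$ test of the question: it swaps in a known-\$\sigma\$ \$z\$ test so that only the Python standard library is needed, and it assumes an upper-tailed test with the quantile taken at \$1-\Pi\$. The function name, \$\sigma=15\$, \$n=10\$, the number of replications and the seed are all illustrative assumptions.

```python
import random
from statistics import NormalDist, median


def simulate_p_values(pi=0.025, n=10, sigma=15.0, reps=20000, seed=1):
    """One-sided z-test p-values (known sigma) for samples whose true mean
    sits at the upper-tail normal quantile corresponding to pi."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = sigma / n ** 0.5
    mu = 100 + std_normal.inv_cdf(1 - pi) * se  # shifted population mean
    p_values = []
    for _ in range(reps):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        z = (sample_mean - 100) / se
        p_values.append(1 - std_normal.cdf(z))  # upper-tail p-value
    return p_values


ps = simulate_p_values()
print("median p:", median(ps))
print("mean p  :", sum(ps) / len(ps))
```

In this known-\$\sigma\$ setting the median of the simulated \$p\$ values should land near \$\Pi=0.025\$, while the mean is noticeably larger: it equals \$1-\Phi(z_{1-\Pi}/\sqrt2)\$, which is about 0.083 for \$\Pi=0.025\$. Whether the same holds exactly for the \$t\$ test is part of what the question asks.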

