## Context

This is somewhat similar to this question, but I do not think it is an exact duplicate.

When you look for instructions on how to perform a bootstrap hypothesis test, it is usually stated that it is fine to use the empirical distribution for confidence intervals, but that you need to bootstrap from the distribution under the null hypothesis to get a correct p-value. As an example, see the accepted answer to this question. A general search on the internet mostly turns up similar answers.

The reason for not using a p-value based on the empirical distribution is that most of the time we do not have translation invariance.

## Example

Let me give a short example. We have a coin and we want to do a one-sided test of whether the frequency of heads is larger than 0.5.

We perform $$n = 20$$ trials and get $$k = 14$$ heads. The true p-value for this test would be $$p = 0.058$$.

On the other hand, if we bootstrap our 14 out of 20 heads, we effectively sample from the binomial distribution with $$n = 20$$ and $$p = \frac{14}{20} = 0.7$$. Shifting this distribution by subtracting 0.2, we get a barely significant result when testing our observed value of 0.7 against the resulting empirical distribution.

In this case the discrepancy is very small, but it grows as the success rate we test against gets closer to 1.
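The numbers above are easy to reproduce; here is a minimal sketch (plain NumPy with a fixed seed; the variable names are my own):

```python
import math
import numpy as np

n, k = 20, 14

# Exact one-sided p-value under H0: p = 0.5, i.e. P(X >= 14) for X ~ Binomial(20, 0.5)
p_exact = sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n   # ≈ 0.058

# Naive bootstrap: resample from the empirical distribution (14 heads out of 20),
# shift the bootstrap means by 0.7 - 0.5 = 0.2 to centre them on the null,
# and count how often the shifted means reach the observed proportion 0.7.
rng = np.random.default_rng(0)
data = np.array([1] * k + [0] * (n - k))
boot_means = rng.choice(data, size=(100_000, n)).mean(axis=1)
p_boot = np.mean(boot_means - (k / n - 0.5) >= k / n)            # ≈ 0.035
```

So the naive bootstrap p-value dips below 0.05 while the exact p-value does not, which is the "barely significant" discrepancy described above.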

## Question

Now let me come to the real point of my question: the very same defect also holds for confidence intervals. In fact, if a confidence interval has the stated confidence level $$\alpha$$, then the confidence interval not containing the parameter under the null hypothesis is equivalent to rejecting the null hypothesis at a significance level of $$1 - \alpha$$.

Why is it that confidence intervals based on the empirical distribution are widely accepted, while the p-values are not?

Is there a deeper reason or are people just not as conservative with confidence intervals?

In this answer, Peter Dalgaard seems to agree with my argument. He says:

> least not (much) worse than the calculation of CI.

Where does the "(much)" come from? It implies that generating p-values that way is slightly worse than generating confidence intervals, but he does not elaborate on the point.

## Final thoughts

Likewise, in *An Introduction to the Bootstrap*, Efron and Tibshirani dedicate a lot of space to confidence intervals but discuss p-values only when they are generated under a proper null-hypothesis distribution, with the exception of one throwaway line about the general equivalence of confidence intervals and p-values in the chapter on permutation testing.

Let us also come back to the first question I linked. I agree with the answer by Michael Chernick, but he argues that both confidence intervals and p-values based on the empirical bootstrap distribution are equally unreliable in some scenarios. That still does not explain why so many people tell you that the intervals are fine but the p-values are not.

Get this bounty!!!

## #StackBounty: #hypothesis-testing #bootstrap When can we transform our data to conform to the null hypothesis when bootstrapping?

### Bounty: 50

Let $$\mathbf{X}$$ denote a dataset generated by the joint cdf $$F$$, and let $$\theta = T(F)$$ denote a parameter of interest obtained by the functional $$T$$. Let $$\hat{\theta}$$ denote an estimator of $$\theta$$, and consider the null hypothesis $$\theta = \theta_0$$.

My understanding of applying the bootstrap to this null hypothesis is that we should bootstrap statistics $$\hat{\theta}^* - \hat{\theta}$$, form a confidence interval using the bootstrapped statistics, centre it on $$\theta_0$$, and then check whether the interval spans $$\hat{\theta}$$. Importantly (for my question), this approach resamples the original underlying data $$\mathbf{X}$$.
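For a simple mean test, the recipe just described might look like the following sketch (the helper name and the use of a percentile interval are my own choices, not from any specific reference):

```python
import numpy as np

def shift_bootstrap_test(x, theta0, n_boot=10_000, alpha=0.05, seed=0):
    """Bootstrap theta_hat* - theta_hat, centre the percentile interval on
    theta0, and reject H0: theta = theta0 if the interval misses theta_hat."""
    rng = np.random.default_rng(seed)
    theta_hat = x.mean()
    boot = rng.choice(x, size=(n_boot, len(x))).mean(axis=1)
    deltas = boot - theta_hat                                  # theta_hat* - theta_hat
    lo, hi = np.quantile(deltas, [alpha / 2, 1 - alpha / 2])
    return not (theta0 + lo <= theta_hat <= theta0 + hi)      # True means "reject"
```

Note that this resamples the original data throughout; only the centre of the interval is moved to $$\theta_0$$.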

However, I have seen several applied papers that take a somewhat different approach. These papers apply an appropriate transformation to the dataset $$\mathbf{X}$$ to obtain a dataset $$\mathbf{X}_0$$ that conforms to the null hypothesis but otherwise (hopefully) preserves the statistical properties of $$\mathbf{X}$$. The authors then bootstrap $$\mathbf{X}_0$$ to obtain $$\hat{\theta}_0^*$$, construct a confidence interval using $$\hat{\theta}_0^*$$, and check whether it spans $$\hat{\theta}$$. Implicitly, I guess the idea is that the bootstrapped confidence interval converges to the true coverage probabilities under the null.

For something simple like a null hypothesis on the mean of univariate data, it is pretty obvious that the two approaches discussed above are equivalent (since the relevant transformation will just be a shift to the left or right). However, I've seen more complicated examples in the literature where things aren't so obvious. For example, in one case for a multivariate dataset, the author orthogonalized the matrix $$\mathbf{X}$$ to obtain $$\mathbf{X}_0$$ and then bootstrapped a range of different (and quite complicated) statistics on $$\mathbf{X}_0$$ to conduct hypothesis tests on (the absence of) relationships in the original dataset (obviously there is a hidden multivariate Gaussian assumption here, given the use of orthogonalization). While it seems intuitive, it isn't immediately obvious (perhaps just to me) that the bootstrapped confidence intervals will converge to appropriate limits.

Another (perhaps silly) example would be a null hypothesis that the variance of univariate data equals 1. In this case, following the data-transform method, we would use $$\mathbf{X}_0 = \hat{\sigma}^{-1} \mathbf{X}$$ and then resample $$\mathbf{X}_0$$. I think this approach would fail because the statistic is not pivotal under the alternative. Perhaps it would work if we bootstrapped the usual chi-square statistic using $$\mathbf{X}_0$$?
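As a concrete sketch of the data-transform recipe for this variance null (purely illustrative, since the question is exactly whether this is valid):

```python
import numpy as np

def transform_bootstrap_var_pvalue(x, n_boot=10_000, seed=0):
    """Data-transform bootstrap for H0: Var(X) = 1. Rescale so the sample
    variance is exactly 1, resample the rescaled data, and compare the
    resulting null distribution with the observed variance (one-sided)."""
    rng = np.random.default_rng(seed)
    s2_hat = x.var(ddof=1)
    x0 = x / np.sqrt(s2_hat)                   # X0 = sigma_hat^{-1} X
    boot = rng.choice(x0, size=(n_boot, len(x)))
    s2_null = boot.var(axis=1, ddof=1)         # variances bootstrapped from X0
    return np.mean(s2_null >= s2_hat)
```

The worry voiced above is that this raw variance statistic is not pivotal, so such p-values need not be well calibrated; bootstrapping a studentized (chi-square-like) statistic on the transformed data might behave better.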

In summary, when is it valid (or invalid) to transform your data to conform to the null hypothesis? Obviously things break down if your statistic is not asymptotically pivotal under the null and alternative, but are there further conditions needed for this approach to work?


## #StackBounty: #correlation #bootstrap #ordinal-data Bootstrapping ordinal correlations? How to deal with the effect of duplicated obser…

### Bounty: 50

When dealing with ordinal correlations (e.g. Spearman's rho, Kendall's tau), one can non-parametrically test the null hypothesis of no correlation by a random permutation test (shuffling one of the two variables).
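For concreteness, such a permutation test might be sketched as follows (hand-rolled Spearman's rho, valid here because the data are continuous and tie-free; all names are illustrative):

```python
import numpy as np

def spearman_rho(x, y):
    # ranks via double argsort (fine for continuous, tie-free data)
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation test of H0: no association, shuffling y only."""
    rng = np.random.default_rng(seed)
    obs = spearman_rho(x, y)
    perm = np.array([spearman_rho(x, rng.permutation(y)) for _ in range(n_perm)])
    return np.mean(np.abs(perm) >= abs(obs))
```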

However, if we have a significantly non-zero correlation and we’d like to obtain a confidence interval for the correlation coefficient we need an estimate of its standard error that doesn’t rely on the null hypothesis.

Bootstrapping seems to be the way to go, but then comes the problem of duplicated observations. The bootstrapped samples contain duplicated observations due to resampling with replacement, and these duplicates generate ties that affect ordinal correlation measures. For example, if one uses Kendall's tau-a, these spurious ties will decrease the correlation coefficient (i.e., introduce a negative bias).
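The bias is easy to demonstrate numerically. The sketch below uses a hand-rolled tau-a (so tied pairs count as neither concordant nor discordant) on synthetic correlated data; everything here is illustrative:

```python
import numpy as np

def kendall_tau_a(x, y):
    """Tau-a: (concordant - discordant) / C(n, 2); tied pairs contribute 0."""
    n = len(x)
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    return np.triu(dx * dy, k=1).sum() / (n * (n - 1) / 2)

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)                 # continuous, strongly correlated

tau_obs = kendall_tau_a(x, y)

# Resample pairs with replacement: duplicated observations produce tied pairs,
# each of which pulls tau-a toward zero (roughly a factor (n - 1)/n on average).
idx = rng.integers(0, n, size=(1000, n))
tau_boot = np.array([kendall_tau_a(x[i], y[i]) for i in idx])
# tau_boot.mean() sits systematically below tau_obs
```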

How should we deal with this problem? Will a BCa (bias-corrected and accelerated) confidence interval be appropriate here, or should we use a different resampling approach (e.g., jackknifing) instead?


## #StackBounty: #confidence-interval #references #bootstrap #multiple-comparisons #bonferroni Multiple comparisons correction for depende…

### Bounty: 300

In this blog post the authors discuss simultaneously estimating quantiles and constructing a simultaneous confidence envelope that covers the whole quantile function. They do this by bootstrapping, computing pointwise bootstrap confidence intervals, and applying a Bonferroni-type correction for multiple comparisons. Since the comparisons are not independent, they compute something like an effective number of independent trials according to the formula

$$N_{eq} = \frac{N^2}{\sum_{i,j} r(b_i, b_j)}$$

where $$N$$ is the number of points to be estimated and $$r(b_i, b_j)$$ is the sample correlation between the $$i$$th and $$j$$th bootstrap vectors.
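Whatever its provenance, the quoted quantity is straightforward to compute from the matrix of bootstrap replicates. A sketch (the function name and demo data are my own, not from the blog post): with independent columns the denominator is about $$N$$, giving $$N_{eq} \approx N$$, while perfectly correlated columns give $$N_{eq} \approx 1$$.

```python
import numpy as np

def effective_n(boot):
    """N_eq = N^2 / sum_{i,j} r(b_i, b_j) for the N columns of `boot`
    (rows are bootstrap replicates, columns the points being estimated)."""
    r = np.corrcoef(boot, rowvar=False)      # N x N sample correlation matrix
    return boot.shape[1] ** 2 / r.sum()

# Demo: bootstrap nine quantiles of one sample; neighbouring quantile
# estimates are positively correlated, so N_eq falls between 1 and 9.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
probs = np.linspace(0.1, 0.9, 9)
boot_q = np.array([np.quantile(rng.choice(x, size=x.size), probs)
                   for _ in range(1000)])
n_eq = effective_n(boot_q)
alpha_adjusted = 0.05 / n_eq                 # Bonferroni with the effective count
```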

My question is where this formula comes from. They provide a link to a source, but I don’t see this formula in the source. Is anyone aware of this particular correction being used in the literature?
