*Bounty: 50*

Let $\mathbf{X}$ denote a dataset generated by the joint cdf $F$, and let $\theta = T(F)$ denote a parameter of interest obtained by the functional $T$. Let $\hat{\theta}$ denote an estimator of $\theta$, and consider the null hypothesis $\theta = \theta_0$.

My understanding of applying the bootstrap to this null hypothesis is that we should bootstrap statistics $\hat{\theta}^* - \hat{\theta}$, form a confidence interval using the bootstrapped statistics, centre it on $\theta_0$, and then check whether the interval spans $\hat{\theta}$. Importantly (for my question), this approach resamples the original underlying data $\mathbf{X}$.
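To fix ideas, here is a minimal sketch of this first approach in Python, using the mean as the statistic (the function name and the choices of `n_boot` and `alpha` are mine, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_test(x, statistic, theta_0, n_boot=5000, alpha=0.05):
    # Resample-the-original-data approach: bootstrap theta_hat* - theta_hat,
    # centre the resulting interval on theta_0, and check whether it
    # spans theta_hat.
    theta_hat = statistic(x)
    n = len(x)
    deltas = np.array([
        statistic(rng.choice(x, size=n, replace=True)) - theta_hat
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(deltas, [alpha / 2, 1 - alpha / 2])
    # Reject if theta_hat falls outside [theta_0 + lo, theta_0 + hi].
    return not (theta_0 + lo <= theta_hat <= theta_0 + hi)

# H0: mean = 0, tested on data whose true mean is 1
x = rng.normal(1.0, 1.0, size=200)
print(bootstrap_test(x, np.mean, 0.0))  # rejects
```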

However, I have seen several applied papers that take a somewhat different approach. These papers apply an appropriate transformation to the dataset $\mathbf{X}$ to obtain a dataset $\mathbf{X}_0$ that conforms to the null hypothesis but otherwise (hopefully) preserves the statistical properties of $\mathbf{X}$. The authors then bootstrap $\mathbf{X}_0$ to obtain $\hat{\theta}_0^*$, construct a confidence interval using $\hat{\theta}_0^*$, and then check whether it spans $\hat{\theta}$. Implicitly, I guess the idea is that the bootstrapped confidence interval converges to the true coverage probabilities *under the null*.
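Here is the same mean example done this second way, i.e. with the data shifted to satisfy the null before resampling (again just a sketch; names and settings are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def null_transform_test(x, theta_0, n_boot=5000, alpha=0.05):
    # Transform-to-the-null approach for a mean hypothesis: shift X so its
    # sample mean is exactly theta_0, bootstrap the mean of the shifted
    # data X_0, and check whether the resulting interval spans theta_hat.
    theta_hat = x.mean()
    x0 = x - theta_hat + theta_0          # X_0 satisfies the null by construction
    boot = np.array([
        rng.choice(x0, size=len(x0), replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return not (lo <= theta_hat <= hi)

x = rng.normal(1.0, 1.0, size=200)
print(null_transform_test(x, 0.0))  # rejects, since theta_hat is near 1
```

For the mean, the shift leaves the spread of the resampled data unchanged, which is why the two sketches agree here.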

For something simple like a null hypothesis on the mean of univariate data, it is pretty obvious that the two approaches discussed above are equivalent (since the relevant transformation is just a shift to the left or right). However, I've seen more complicated examples in the literature where things aren't so obvious. For example, in one case for a multivariate dataset, the author orthogonalized the matrix $\mathbf{X}$ to obtain $\mathbf{X}_0$ and then bootstrapped a range of different (and quite complicated) statistics on $\mathbf{X}_0$ to conduct hypothesis tests on (the absence of) relationships in the original dataset (obviously there is a hidden multivariate Gaussian assumption here given the use of orthogonalization). While it seems intuitive, it isn't immediately obvious (perhaps just to me) that the bootstrapped confidence intervals converge to appropriate limits.

Another (perhaps silly) example would be a null hypothesis that the variance of univariate data equals 1. In this case, following the data-transform method, we would use $\mathbf{X}_0 = \hat{\sigma}^{-1} \mathbf{X}$ and then resample $\mathbf{X}_0$. I think this approach would fail because the statistic is not pivotal under the alternative. Perhaps it would work if we bootstrapped the usual chi-square statistic using $\mathbf{X}_0$?
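To be concrete, this is the procedure whose validity I am questioning (not one I am endorsing), sketched the same way as above:

```python
import numpy as np

rng = np.random.default_rng(2)

def variance_null_test(x, n_boot=5000, alpha=0.05):
    # Data-transform approach for H0: Var(X) = 1. Rescale so the sample
    # variance of X_0 is exactly 1, bootstrap the sample variance of X_0,
    # and check whether the interval spans the original sample variance.
    s2_hat = x.var(ddof=1)
    x0 = x / np.sqrt(s2_hat)              # X_0 has sample variance 1
    boot = np.array([
        rng.choice(x0, size=len(x0), replace=True).var(ddof=1)
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return not (lo <= s2_hat <= hi)

print(variance_null_test(rng.normal(0.0, 2.0, size=200)))  # rejects
```

My worry is visible in the sketch: the rescaling shrinks the spread of the bootstrap distribution along with the data, so the interval's width is calibrated to the null scale rather than to the sampling variability of $\hat{\sigma}^2$ itself.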

In summary, when is it valid (or invalid) to transform your data to conform to the null hypothesis? Obviously things break down if your statistic is not asymptotically pivotal under the null and alternative, but are there further conditions needed for this approach to work?