#StackBounty: #hypothesis-testing #bootstrap When can we transform our data to conform to the null hypothesis when bootstrapping?

Bounty: 50

Let $$\mathbf{X}$$ denote a dataset generated by the joint cdf $$F$$, and let $$\theta = T(F)$$ denote a parameter of interest obtained by the functional $$T$$. Let $$\hat{\theta}$$ denote an estimator of $$\theta$$, and consider the null hypothesis $$\theta = \theta_0$$.

My understanding of applying the bootstrap to this null hypothesis is that we should bootstrap the statistics $$\hat{\theta}^* - \hat{\theta}$$, form a confidence interval from the bootstrapped statistics, centre it on $$\theta_0$$, and then check whether the interval spans $$\hat{\theta}$$. Importantly (for my question), this approach resamples the original underlying data $$\mathbf{X}$$.
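A minimal sketch of this recipe for a test on the mean (the function and variable names here are mine, purely for illustration; it uses a percentile interval of the centred bootstrap statistics):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_null_test(x, stat, theta_0, n_boot=5000, alpha=0.05):
    """Bootstrap test of H0: theta = theta_0 by centring a percentile
    interval of the statistics (theta*_b - theta_hat) on theta_0."""
    theta_hat = stat(x)
    # resample the ORIGINAL data and collect theta*_b - theta_hat
    deltas = np.array([stat(rng.choice(x, size=len(x), replace=True)) - theta_hat
                       for _ in range(n_boot)])
    lo, hi = np.quantile(deltas, [alpha / 2, 1 - alpha / 2])
    # centre the interval on theta_0; reject if it does not span theta_hat
    return not (theta_0 + lo <= theta_hat <= theta_0 + hi)

x = rng.normal(loc=0.5, scale=1.0, size=200)
print(bootstrap_null_test(x, np.mean, theta_0=0.0))  # expected True: the mean is far from 0
```

The key point the question turns on is visible in the resampling line: the bootstrap distribution is built from $$\mathbf{X}$$ itself, not from data forced to satisfy the null.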

However, I have seen several applied papers that take a somewhat different approach. These papers apply an appropriate transformation to the dataset $$\mathbf{X}$$ to obtain a dataset $$\mathbf{X}_0$$ that conforms to the null hypothesis but otherwise (hopefully) preserves the statistical properties of $$\mathbf{X}$$. The author then bootstraps $$\mathbf{X}_0$$ to obtain $$\hat{\theta}_0^*$$, constructs a confidence interval using $$\hat{\theta}_0^*$$, and then checks whether it spans $$\hat{\theta}$$. Implicitly, I guess the idea is that the bootstrapped confidence interval converges to the true coverage probabilities under the null.

For something simple like a null hypothesis on the mean of univariate data, it is pretty obvious that the two approaches discussed above are equivalent (since the relevant transformation will just be a shift to the left or right). However, I’ve seen more complicated examples in the literature where things aren’t so obvious. For example, in one case for a multivariate dataset, the author orthogonalized the matrix $$\mathbf{X}$$ to obtain $$\mathbf{X}_0$$ and then bootstrapped a range of different (and quite complicated) statistics on $$\mathbf{X}_0$$ to conduct hypothesis tests on (the absence of) relationships in the original dataset (obviously there is a hidden multivariate Gaussian assumption here, given the use of orthogonalization). While it seems intuitive, it isn’t immediately obvious (perhaps just to me) that the bootstrapped confidence intervals will converge to appropriate limits.

Another (perhaps silly) example would be a null hypothesis that the variance of univariate data equals 1. In this case, following the data-transform method, we would use $$\mathbf{X}_0 = \hat{\sigma}^{-1} \mathbf{X}$$ and then resample $$\mathbf{X}_0$$. I think this approach would fail because the statistic is not pivotal under the alternative. Perhaps it would work if we bootstrapped the usual chi-square statistic using $$\mathbf{X}_0$$?
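A concrete sketch of that variance example (my own toy setup, with the interval check done directly on the bootstrapped variances):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(scale=1.5, size=300)            # true variance 2.25; H0: variance = 1

sigma2_hat = np.var(x, ddof=1)
x0 = x / np.sqrt(sigma2_hat)                   # transformed data: sample variance is exactly 1

# bootstrap the variance from the null-conforming data
boot = np.array([np.var(rng.choice(x0, size=len(x0), replace=True), ddof=1)
                 for _ in range(5000)])
lo, hi = np.quantile(boot, [0.025, 0.975])

# the test: does the null-based interval cover the original estimate?
reject = not (lo <= sigma2_hat <= hi)
print(reject)                                  # expected True: 2.25 lies far outside the interval
```

Note that the spread of `boot` is tied to the unit scale of $$\mathbf{X}_0$$, which is exactly the pivotality worry: if the sampling variability of the variance estimator depends on the true variance, an interval whose width is calibrated under the null need not have the right width under the alternative.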

In summary, when is it valid (or invalid) to transform your data to conform to the null hypothesis? Obviously things break down if your statistic is not asymptotically pivotal under the null and alternative, but are there further conditions needed for this approach to work?


#StackBounty: #correlation #bootstrap #ordinal-data Bootstrapping ordinal correlations? How to deal with the effect of duplicated obser…

Bounty: 50

When dealing with ordinal correlations (e.g. Spearman’s Rho, Kendall’s Tau), one can non-parametrically test the null hypothesis of no correlation by a random permutation test (shuffling one of the two variables).

However, if we have a significantly non-zero correlation and we’d like to obtain a confidence interval for the correlation coefficient, we need an estimate of its standard error that doesn’t rely on the null hypothesis.

Bootstrapping seems to be the way to go, but then comes the problem of duplicated observations. The bootstrapped samples contain duplicated observations due to resampling with replacement, and these duplicates generate ties that affect ordinal correlation measures. For example, if one uses Kendall’s tau-a, these spurious ties will decrease the correlation coefficient (i.e., introduce a negative bias).
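A quick simulation illustrating the bias (tau-a is computed directly here, since common library implementations such as SciPy’s `kendalltau` default to the tie-corrected tau-b):

```python
import numpy as np

rng = np.random.default_rng(2)

def tau_a(x, y):
    """Kendall's tau-a over all pairs; tied pairs contribute 0 (no tie correction)."""
    n = len(x)
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    return (dx * dy).sum() / (n * (n - 1))     # ordered pairs, so no factor of 2 needed

n = 30
x = rng.normal(size=n)
y = x + rng.normal(scale=0.5, size=n)          # strongly correlated, no ties in the data

tau_hat = tau_a(x, y)
boot = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)           # resampled rows; duplicates create ties
    boot.append(tau_a(x[idx], y[idx]))
print(tau_hat, np.mean(boot))                  # the bootstrap mean sits below tau_hat
```

The size of the bias is easy to pin down in this case: a resampled pair repeats the same original observation with probability $$1/n$$ and then contributes zero, so the expected bootstrap tau-a is $$\hat{\tau}(1 - 1/n)$$.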

How should we deal with this problem? Would a BCa (bias-corrected and accelerated) confidence interval be appropriate here? Or should we use a different resampling approach (e.g., jackknifing) instead?


#StackBounty: #confidence-interval #references #bootstrap #multiple-comparisons #bonferroni Multiple comparisons correction for depende…

Bounty: 300

In this blog post, the authors discuss simultaneously estimating quantiles and constructing a simultaneous confidence envelope that covers the whole quantile function. They do this by bootstrapping, computing pointwise bootstrap confidence intervals, and applying a Bonferroni-type correction for multiple comparisons. Since the comparisons are not independent, they compute something like an effective number of independent trials according to the formula

$$N_{eq} = \frac{N^2}{\sum_{i,j} r(b_i, b_j)}$$

where $$N$$ is the number of points to be estimated and $$r(b_i, b_j)$$ is the sample correlation between the $$i$$th and $$j$$th bootstrap vectors.
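A sketch of how such an effective number might be computed from the bootstrap replicates, here for pointwise quantile estimates (the setup is my own toy example; the blog post’s exact procedure may differ):

```python
import numpy as np

rng = np.random.default_rng(3)

# toy setup: B bootstrap replicates of N pointwise quantile estimates
data = rng.normal(size=500)
B, N = 2000, 20
probs = np.linspace(0.05, 0.95, N)
boots = np.array([np.quantile(rng.choice(data, size=len(data), replace=True), probs)
                  for _ in range(B)])          # shape (B, N): one row per replicate

R = np.corrcoef(boots, rowvar=False)           # N x N sample correlation matrix
N_eq = N**2 / R.sum()                          # = N if all off-diagonal r are 0, = 1 if all r are 1
alpha_adj = 0.05 / N_eq                        # Bonferroni with N_eq in place of N
print(N_eq, alpha_adj)
```

The two limiting cases in the comment give a quick sanity check on the formula: independent comparisons recover the plain Bonferroni divisor $$N$$, while perfectly correlated comparisons collapse to a single effective trial.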

My question is where this formula comes from. They provide a link to a source, but I don’t see this formula in the source. Is anyone aware of this particular correction being used in the literature?
