#StackBounty: #hypothesis-testing #random-variable #small-sample #discrete-data #marginal Hypothesis test for discrete random vector wi…

Bounty: 50

Consider a random vector $X \in \mathbb{R}^{d}$ with support $\text{supp}(X) = \{1,2,3,4\}^d$, and let $P_X$ denote its known probability mass function. Note that $\lvert \text{supp}(X) \rvert = 4^d$.

I have $n \ll 4^d$ i.i.d. samples of $X$, and I suspect that these are not distributed according to $P_X$. The following are my questions:

  1. Is it more appropriate to take as the null hypothesis that the distribution of the samples is not $P_X$, or should the null hypothesis be that the samples are distributed according to $P_X$, with "not $P_X$" as the alternative? Typically, hypothesis tests seem to take as the null hypothesis that the distributions are the same.
  2. What hypothesis test should I use, considering that my sample size $n$ is far smaller than the size of the support of $X$?
  3. Related to question 2, suppose $d = 4$ and $n = 50$. Then I have $50$ samples, but my random vector $X$ can take $4^4 = 256$ possible values. In this setting, it seems difficult even to estimate, with reasonable accuracy, the distribution from which the $50$ samples are generated. However, I can estimate, e.g., the probability $\mathbb{P}(X_4 = 1)$ from the sample with reasonably high accuracy and compare it to the true value from the marginal of $P_X$. Therefore, it looks like I can potentially use such marginal tests to not reject my hypothesis that the sample distribution is not $P_X$. This feels ad hoc, though: there could be many other tests I could use that are not captured by testing marginals.
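
To make the marginal idea in question 3 concrete, here is a minimal sketch, not a recommended or definitive procedure. It assumes the null hypothesis is "the samples follow $P_X$" and that every one-dimensional marginal of $P_X$ is strictly positive. Because $P_X$ is known, the null distribution of any statistic can be simulated, so no large-sample approximation is needed even when $n \ll 4^d$. The statistic used here (a sum of Pearson chi-square statistics over the $d$ marginals) is only one possible choice, and the names `marginal_chi2_stat` and `monte_carlo_pvalue`, the uniform `pmf`, and the skewed alternative are all hypothetical illustrations, not part of the original question.

```python
import numpy as np


def marginal_chi2_stat(samples, marginals):
    """Sum of Pearson chi-square statistics over the d one-dimensional marginals.

    samples:   (n, d) integer array with entries in {1, 2, 3, 4}
    marginals: (d, 4) array, marginals[j, k] = P(X_j = k + 1) under the null
               (assumed strictly positive)
    """
    n, d = samples.shape
    stat = 0.0
    for j in range(d):
        observed = np.bincount(samples[:, j] - 1, minlength=4)
        expected = n * marginals[j]
        stat += np.sum((observed - expected) ** 2 / expected)
    return stat


def monte_carlo_pvalue(samples, pmf, n_sim=2000, seed=0):
    """Monte Carlo p-value for H0: the samples are i.i.d. from the known joint pmf.

    pmf: array of shape (4,) * d holding the joint probabilities of P_X.
    """
    rng = np.random.default_rng(seed)
    n, d = samples.shape

    # One-dimensional marginals, obtained by summing out all other coordinates.
    marginals = np.stack(
        [pmf.sum(axis=tuple(a for a in range(d) if a != j)) for j in range(d)]
    )
    observed_stat = marginal_chi2_stat(samples, marginals)

    flat = pmf.ravel()
    exceed = 0
    for _ in range(n_sim):
        # Draw n cells of the joint table under H0, then map back to {1,...,4}^d.
        idx = rng.choice(flat.size, size=n, p=flat)
        sim = np.stack(np.unravel_index(idx, pmf.shape), axis=1) + 1
        if marginal_chi2_stat(sim, marginals) >= observed_stat:
            exceed += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_sim + 1)


# Illustration with d = 4 and n = 50: a uniform null pmf (an assumption)
# and data drawn from a made-up skewed alternative.
d, n = 4, 50
pmf = np.full((4,) * d, 1.0 / 4**d)
data_rng = np.random.default_rng(123)
samples = data_rng.choice([1, 2, 3, 4], size=(n, d), p=[0.4, 0.3, 0.2, 0.1])
print("Monte Carlo p-value:", monte_carlo_pvalue(samples, pmf, n_sim=500))
```

A statistic built only from marginals cannot detect alternatives that share the same marginals as $P_X$ but differ in their dependence structure, which is the concern raised at the end of question 3. The same simulation wrapper works with any other statistic (for example, one based on pairwise tables), so the choice of statistic, not the calibration, is where the ad hoc element remains.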
