*Bounty: 50*

Consider a random vector $X \in \mathbb{R}^{d}$ with support $\text{supp}(X) = \{1,2,3,4\}^d$, and let $P_X$ denote its known probability mass function. Note that $\lvert \text{supp}(X) \rvert = 4^d$.

I have $n \ll 4^d$ i.i.d. samples of $X$, and I suspect that these are not distributed according to $P_X$. The following are my questions:

- Is it more appropriate to take as the null hypothesis that the distribution of the samples is *not* $P_X$, or should that only be the alternative, with the null hypothesis being that the distribution is $P_X$? Typically, hypothesis tests seem to take as the null hypothesis that the distributions are the same.
- What should my hypothesis test be, considering that my sample size $n$ is far smaller than the size of the support of $X$?
- Related to the second question, suppose $d = 4$ and $n = 50$. Then I have $50$ samples, but my random vector $X$ can take $4^4 = 256$ possible values. In this setting, it seems difficult even to estimate the distribution that generated the $50$ samples with reasonable accuracy. However, I can estimate, e.g., the probability $\mathbb{P}(X_4 = 1)$ from the sample with reasonably high accuracy and compare it to the true value from the marginal of $P_X$. Therefore, it looks like I could use such marginal tests to avoid rejecting my hypothesis that the sample distribution is not $P_X$. This feels ad hoc, though: there could be many other tests I could use that are not captured by testing marginals.
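
To make the marginal-testing idea in the last bullet concrete, here is a minimal sketch, not a recommended procedure. It assumes the null hypothesis is that the samples follow $P_X$, takes as its statistic the largest per-coordinate Pearson chi-square statistic, and calibrates that statistic by simulating from the known $P_X$ instead of relying on large-sample approximations. The function names, the uniform `uniform_pmf`, and the `skewed` data set are all hypothetical, chosen only for illustration.

```python
import itertools
import random


def marginals(pmf, d, levels=(1, 2, 3, 4)):
    """Marginal pmf of each coordinate, from a joint pmf given as {tuple: prob}."""
    out = [{v: 0.0 for v in levels} for _ in range(d)]
    for x, p in pmf.items():
        for j, v in enumerate(x):
            out[j][v] += p
    return out


def max_marginal_chi2(samples, marg):
    """Largest Pearson chi-square statistic over the d coordinate marginals.

    Assumption: every marginal probability is positive; zero-probability
    levels are skipped here rather than treated as an automatic rejection.
    """
    n = len(samples)
    best = 0.0
    for j, m in enumerate(marg):
        counts = {v: 0 for v in m}
        for x in samples:
            counts[x[j]] += 1
        stat = sum((counts[v] - n * p) ** 2 / (n * p) for v, p in m.items() if p > 0)
        best = max(best, stat)
    return best


def monte_carlo_pvalue(samples, pmf, d, n_sim=1000, seed=0):
    """Monte Carlo p-value of the max-marginal statistic under H0: samples ~ pmf."""
    rng = random.Random(seed)
    marg = marginals(pmf, d)
    observed = max_marginal_chi2(samples, marg)
    points = list(pmf.keys())
    weights = list(pmf.values())
    exceed = 0
    for _ in range(n_sim):
        sim = rng.choices(points, weights=weights, k=len(samples))
        if max_marginal_chi2(sim, marg) >= observed:
            exceed += 1
    # The +1 terms keep the estimated p-value strictly positive.
    return (exceed + 1) / (n_sim + 1)


# Hypothetical setting matching the question: d = 4, n = 50, uniform P_X.
d = 4
support = list(itertools.product((1, 2, 3, 4), repeat=d))
uniform_pmf = {x: 1.0 / len(support) for x in support}

# Hypothetical data: first three coordinates uniform, fourth always equal to 1.
data_rng = random.Random(1)
skewed = [tuple(data_rng.choice((1, 2, 3, 4)) for _ in range(3)) + (1,) for _ in range(50)]

p_value = monte_carlo_pvalue(skewed, uniform_pmf, d)
print(p_value)
```

Taking the maximum over coordinates and simulating its null distribution handles the multiplicity of the $d$ marginal comparisons in one step. The sketch has no power against alternatives that share all one-dimensional marginals with $P_X$, which is exactly the limitation raised above.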