## #StackBounty: #subgaussianity #tail-bound Bounding the tail of sum of discrete distributions (via sub-gaussianity)

### Bounty: 50

I have the following problem: we have a sequence of random variables $$Z_1, \dots, Z_n$$ which are summed up; let us denote their sum by $$X$$. We observe a number $$\epsilon$$ sampled from $$X$$ and we wish to bound the probability $$P(X \geq \epsilon)$$; that is, we want to quantify how extreme $$\epsilon$$ is (its quantile). We are interested only in the right tail of $$X$$.

Each $$Z_i$$ is a discrete random variable with probability vector $$p = [p_1, \dots, p_k]$$, $$k \in \mathbb{N}$$, and corresponding values $$x = [x_1, \dots, x_k] = [-\log(p_1), \dots, -\log(p_k)]$$ (we assume $$p_j > 0$$). The probability vectors of all $$Z_i$$ (which also determine the values $$x$$) are known. The variables $$Z_1, \dots, Z_n$$ are independent (but not necessarily identically distributed).

Since we know all the probability vectors, we can compute the mean and variance of each $$Z_i$$. (The mean of $$Z_i$$ happens to be the entropy of its distribution, by the way we defined the values $$x$$.) Summing the means and variances of the $$Z_i$$ gives us the mean $$\mu$$ and variance $$\sigma^2$$ of $$X$$.
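
A minimal sketch of this computation in Python/NumPy (the function name and example vectors are mine, purely illustrative):

```python
import numpy as np

def moments(p):
    """Mean and variance of a discrete Z with values x = -log(p)."""
    p = np.asarray(p, dtype=float)
    x = -np.log(p)                   # values of the variable
    mean = np.sum(p * x)             # equals the Shannon entropy of p
    var = np.sum(p * x**2) - mean**2
    return mean, var

# For X = Z_1 + ... + Z_n, means and variances of independent terms add up.
ps = [np.array([0.5, 0.25, 0.25]), np.array([0.9, 0.1])]  # toy example
mu = sum(moments(p)[0] for p in ps)
sigma2 = sum(moments(p)[1] for p in ps)
```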

To bound the tail probability of $$X$$, we can use Chebyshev's inequality: we have $$P(X \geq \epsilon) \leq \frac{\sigma^2}{(\epsilon - \mu)^2}$$. However, this bound is very loose and I am looking for a better one.
Also, in the limit $$n \to \infty$$, $$X$$ becomes normally distributed by the CLT; however, I will mostly deal with $$n$$ between 10 and 30, where the normal approximation is not very good.
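
For reference, the Chebyshev bound is a one-liner (valid only for $$\epsilon > \mu$$, which is the regime of interest here):

```python
def chebyshev_bound(epsilon, mu, sigma2):
    """P(X >= epsilon) <= sigma^2 / (epsilon - mu)^2, for epsilon > mu."""
    return min(1.0, sigma2 / (epsilon - mu) ** 2)
```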

For these reasons, I am trying to utilize the subgaussianity of the $$Z_i$$. We say that a random variable $$Z$$ is $$b$$-subgaussian if
$$\mathbb{E}\left[ e^{\lambda (Z - \mathbb{E}[Z])} \right] \leq \exp\left( \frac{\lambda^2 b^2}{2} \right) \quad \forall \lambda \in \mathbb{R}.$$
Subgaussianity is preserved under summation of independent variables: if $$Z_1$$ is $$b_1$$-subgaussian and $$Z_2$$ is $$b_2$$-subgaussian, then $$Z_1 + Z_2$$ is $$\sqrt{b_1^2 + b_2^2}$$-subgaussian. Therefore, if we compute $$b_1, \dots, b_n$$ such that each $$Z_i$$ is $$b_i$$-subgaussian, we obtain a $$b$$ such that $$X$$ is $$b$$-subgaussian.
The subgaussian property gives us a much stronger bound on the tail of $$X$$ than Chebyshev's inequality: if $$X$$ is $$b$$-subgaussian, then
$$P(X \geq \epsilon) \leq \exp\left( \frac{-(\epsilon-\mu)^2}{2b^2} \right).$$
That is, lower values of $$b$$ are better (they give us a tighter bound).
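
Assuming the constants $$b_1, \dots, b_n$$ are already known, combining them and evaluating this tail bound is cheap; a sketch:

```python
import numpy as np

def subgaussian_tail_bound(epsilon, mu, bs):
    """P(X >= epsilon) <= exp(-(epsilon - mu)^2 / (2 b^2)),
    with b^2 = b_1^2 + ... + b_n^2 for a sum of independent Z_i."""
    b2 = float(np.sum(np.square(bs)))
    return float(np.exp(-(epsilon - mu) ** 2 / (2.0 * b2)))
```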
Any distribution with values bounded in an interval $$[c,d]$$ is $$b$$-subgaussian with $$b = \frac{d-c}{2}$$ (equivalently, $$b^2 = \frac{(d-c)^2}{4}$$) by Hoeffding's lemma. However, this bound is again too loose for my use case.
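
For one of my $$Z_i$$, this Hoeffding baseline is just half the range of its values; a sketch:

```python
import numpy as np

def hoeffding_b(p):
    """b = (d - c) / 2, where [c, d] is the range of x = -log(p)."""
    x = -np.log(np.asarray(p, dtype=float))
    return (x.max() - x.min()) / 2.0
```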

Concretely, I would like to ask:

• Is there some algorithm that, given a discrete probability distribution $$p=[p_1, \dots, p_k]$$ and corresponding values $$x$$ (in my case, $$x=-\log(p)$$), computes a reasonably small $$b$$ such that the distribution is $$b$$-subgaussian?
• Alternatively, is there some algorithm that, given my values of $$p$$ and $$x$$ and some candidate $$b$$, decides whether the distribution is $$b$$-subgaussian? (I could then find the lowest such $$b$$ by binary search; see the sketch after this list.)
• After all, my goal is to bound the tail probability of $$X$$ – maybe the subgaussianity approach is not a good idea? Are there any better / reasonable ways to obtain a tight bound?
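
To illustrate the second bullet, here is the kind of heuristic checker I have in mind (a sketch, not a verified algorithm): it tests the defining MGF inequality on a finite grid of $$\lambda$$ values, which is necessary but not sufficient for subgaussianity, and wraps this in a binary search bracketed between $$\sqrt{\operatorname{Var}(Z)}$$ (a valid lower bound on $$b$$) and the Hoeffding value (a valid upper bound). The grid range and resolution are assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def is_b_subgaussian(p, b, lams=np.linspace(-50.0, 50.0, 2001)):
    """Grid check of E[exp(lam (Z - mu))] <= exp(lam^2 b^2 / 2).
    Heuristic: only the sampled lambdas are tested."""
    p = np.asarray(p, dtype=float)
    x = -np.log(p)
    mu = np.sum(p * x)
    # log E[exp(lam (Z - mu))], computed stably via logsumexp
    log_mgf = logsumexp(np.log(p) + lams[:, None] * (x - mu), axis=1)
    return bool(np.all(log_mgf <= lams**2 * b**2 / 2.0 + 1e-12))

def smallest_b(p, tol=1e-6):
    """Binary search for the smallest b that passes the grid check;
    valid because the check is monotone in b."""
    p = np.asarray(p, dtype=float)
    x = -np.log(p)
    mu = np.sum(p * x)
    lo = np.sqrt(np.sum(p * (x - mu) ** 2))  # b can never beat the std dev
    hi = (x.max() - x.min()) / 2.0           # Hoeffding's lemma always works
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_b_subgaussian(p, mid):
            hi = mid
        else:
            lo = mid
    return hi
```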

P.S. Another way to estimate the tail probability is Monte Carlo sampling from $$X$$ (which I can do, since I know all $$Z_i$$). However, that would be computationally too expensive for me. In my use case, I know all possible values of $$Z$$ in advance (there are about 100 possible vectors $$p$$, known ahead of time), and I can precompute even computationally expensive quantities. Then I am given the actual sequence of $$Z$$s (about 10-30 of them, selected from the possible values given in advance), and I need to compute the bound on the tail of $$X$$ as fast as possible; a sketch of this precompute-then-query pattern follows.
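
What I imagine for that pattern (a sketch reusing the hypothetical `moments()` and `smallest_b()` helpers from above; the table layout is my own invention):

```python
import numpy as np

# Offline: for each of the ~100 known distributions, store (mean, b) once.
known_ps = [np.array([0.5, 0.25, 0.25]), np.array([0.9, 0.1])]  # toy stand-ins
table = [(moments(p)[0], smallest_b(p)) for p in known_ps]

def fast_tail_bound(indices, epsilon):
    """Online: combine precomputed entries for the 10-30 observed Z's."""
    mu = sum(table[i][0] for i in indices)
    b2 = sum(table[i][1] ** 2 for i in indices)
    return float(np.exp(-(epsilon - mu) ** 2 / (2.0 * b2)))

# Example query: a sequence of 4 Z's drawn from the known set.
print(fast_tail_bound([0, 1, 1, 0], epsilon=8.0))
```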

#### References:

I got the introduction to subgaussianity from Lattimore, Tor, and Csaba Szepesvári, Bandit Algorithms (2020), Chapter 5. The book is available here.

Another resource I went through is this set of lecture notes (it mentions the use of Hoeffding's lemma).

edit: this material states an equivalent condition for subgaussianity that can be checked algorithmically (Theorem 2.1, page 6 of the PDF). I am currently trying to understand the proof and to construct an algorithm that decides whether a distribution is $$\sigma$$-subgaussian for a given $$\sigma$$.
