#StackBounty: #subgaussianity #tail-bound Bounding the tail of sum of discrete distributions (via sub-gaussianity)

Bounty: 50

I have the following problem: we have a sequence of random variables $Z_1, \dots, Z_n$ that are summed up; let us denote their sum by $X$. We observe a number $\epsilon$ sampled from $X$ and we wish to somehow describe/bound the probability $P(X \geq \epsilon)$; that is, we want to compute the extremeness (quantile) of $\epsilon$. We are interested only in the right tail of $X$.

Each $Z_i$ is a discrete distribution with probabilities $p = [p_1, \dots, p_k]$, $k \in \mathbb{N}$, and corresponding values $x = [x_1, \dots, x_k] = [-\log(p_1), \dots, -\log(p_k)]$ (we assume $p_j > 0$ for all $j$). The probability vectors of all $Z_i$ (which also give us the values $x$) are known. The distributions $Z_1, \dots, Z_n$ are independent (but not necessarily identically distributed).

Since we know all the probability vectors, we can compute the mean and variance of each $Z_i$. (The mean of $Z_i$ happens to be the entropy of $Z_i$, due to the way we defined the values $x_j$.) By summing the means and variances of the $Z_i$, we get the mean $\mu$ and variance $\sigma^2$ of $X$.
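For concreteness, here is a minimal NumPy sketch of this bookkeeping; the helper name `moments` and the toy list `ps` are illustrative stand-ins for the actual sequence of distributions:

```python
import numpy as np

def moments(p):
    """Mean and variance of Z, where Z takes value -log(p_j) with probability p_j.

    By construction, the mean is exactly the Shannon entropy of p (in nats).
    """
    p = np.asarray(p, dtype=float)
    x = -np.log(p)                       # values taken by Z
    mu = float(np.sum(p * x))            # entropy H(p)
    var = float(np.sum(p * (x - mu) ** 2))
    return mu, var

# By independence, the mean and variance of X are sums over the Z_i.
ps = [[0.5, 0.25, 0.25], [0.9, 0.1]]     # toy stand-in for the real sequence
mu_X = sum(moments(p)[0] for p in ps)
sigma2_X = sum(moments(p)[1] for p in ps)
```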

To bound the tail probability of $X$, we can use Chebyshev's inequality: we have $P(X \geq \epsilon) \leq \frac{\sigma^2}{(\epsilon - \mu)^2}$. However, this bound is very loose and I am looking for a better one.
Also, in the limit $n \to \infty$, $X$ becomes normally distributed by the CLT; however, I will mostly deal with $n$ between 10 and 30, where the normal approximation is not very good.
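For reference, the Chebyshev baseline as a one-line helper (capped at 1, and only informative for $\epsilon > \mu$; the function name is mine):

```python
def chebyshev_tail(eps, mu, sigma2):
    """Chebyshev bound P(X >= eps) <= sigma^2 / (eps - mu)^2, for eps > mu."""
    return min(1.0, sigma2 / (eps - mu) ** 2)
```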

For these reasons, I am trying to utilize the subgaussianity of the $Z_i$. We say that a random variable $Z$ is $b$-subgaussian if
$$
\mathbb{E}\left[ \exp\left(\lambda (Z - \mathbb{E}[Z])\right)\right] \leq \exp\left( \frac{\lambda^2 b^2}{2} \right) \quad \forall \lambda \in \mathbb{R}.
$$
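For a discrete $Z$ the left-hand side is a finite sum, $\sum_j p_j \exp(\lambda(x_j - \mu))$, so the condition can at least be tested numerically. A heuristic sketch, where the grid range and resolution are assumptions to tune (a pass on a grid is evidence, not a proof):

```python
import numpy as np
from scipy.special import logsumexp

def is_b_subgaussian(p, x, b, lam_max=200.0, grid=4001):
    """Heuristically test log E[exp(lam (Z - E[Z]))] <= lam^2 b^2 / 2
    on a symmetric grid of lambda values; a pass is only as trustworthy
    as the grid's range and resolution."""
    p, x = np.asarray(p, dtype=float), np.asarray(x, dtype=float)
    mu = np.sum(p * x)
    lams = np.linspace(-lam_max, lam_max, grid)
    # psi[i] = log sum_j p_j * exp(lams[i] * (x_j - mu)), stably via logsumexp
    psi = logsumexp(np.log(p)[None, :] + lams[:, None] * (x - mu)[None, :], axis=1)
    return bool(np.all(psi <= lams**2 * b**2 / 2 + 1e-12))
```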

Subgaussianity is preserved under summation of independent variables: if $Z_1$ is $b_1$-subgaussian and $Z_2$ is $b_2$-subgaussian, then $Z_1 + Z_2$ is $\sqrt{b_1^2 + b_2^2}$-subgaussian. Therefore, if we compute $b_1, \dots, b_n$ such that each $Z_i$ is $b_i$-subgaussian, we obtain $b = \sqrt{\sum_{i=1}^n b_i^2}$ such that $X$ is $b$-subgaussian.
The subgaussian property gives a much stronger bound on the tail of $X$ than Chebyshev's inequality: if $X$ is $b$-subgaussian, then for $\epsilon \geq \mu$
$$
P(X \geq \epsilon) \leq \exp\left( \frac{-(\epsilon - \mu)^2}{2b^2} \right).
$$

That is, lower values of $b$ are better (they give a tighter bound).
Any distribution with values bounded in an interval $[c, d]$ is $b$-subgaussian for $b = \frac{d - c}{2}$ (i.e., $b^2 = \frac{(d-c)^2}{4}$) by Hoeffding's lemma. However, this bound is again too loose for my use case.
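Putting these pieces together, a minimal sketch; the per-term constants `bs` are hypothetical placeholders:

```python
import numpy as np

def hoeffding_b(x):
    """Hoeffding's lemma: Z supported in [min(x), max(x)] is b-subgaussian
    with b = (max(x) - min(x)) / 2."""
    return (np.max(x) - np.min(x)) / 2.0

def subgaussian_tail(eps, mu, b):
    """P(X >= eps) <= exp(-(eps - mu)^2 / (2 b^2)), valid for eps >= mu."""
    return float(np.exp(-(eps - mu) ** 2 / (2.0 * b**2)))

# If each Z_i is b_i-subgaussian, X is b-subgaussian with b = sqrt(sum b_i^2):
bs = np.array([0.7, 1.1, 0.4])           # hypothetical per-term constants
b_X = np.sqrt(np.sum(bs**2))
```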

That is, I would like to ask:

  • Is there some algorithm that, given a discrete probability distribution $p = [p_1, \dots, p_k]$ and corresponding values $x$ (in my case, $x = -\log(p)$), computes a reasonably low $b$ such that the distribution is $b$-subgaussian? (A numerical sketch in this direction is given after this list.)
  • Alternatively, is there some algorithm that, given my values of $p$ and $x$ and some $b$, decides whether the distribution is $b$-subgaussian? (I could then find the lowest such $b$ by binary search, since a $b$-subgaussian variable is also $b'$-subgaussian for every $b' \geq b$.)
  • After all, I am trying to bound the tail probability of $X$; maybe the subgaussianity approach is not a good idea? Are there any better/reasonable ways to obtain a tight bound?
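Regarding the first two bullets, one numerical direction that follows directly from the definition: the smallest admissible constant satisfies $b^2 = \sup_{\lambda \neq 0} 2\psi(\lambda)/\lambda^2$, where $\psi(\lambda) = \log \mathbb{E}[\exp(\lambda(Z - \mathbb{E}[Z]))]$. Since $Z$ is bounded, the ratio tends to $\mathrm{Var}(Z)$ as $\lambda \to 0$ and to $0$ as $|\lambda| \to \infty$, so evaluating it on a wide grid gives a usable estimate (and makes the binary search of the second bullet unnecessary). A sketch under these assumptions, with heuristic grid bounds:

```python
import numpy as np
from scipy.special import logsumexp

def smallest_subgaussian_b(p, x, lam_max=200.0, grid=20001):
    """Estimate the smallest b such that Z is b-subgaussian, via
    b^2 = sup_{lam != 0} 2 psi(lam) / lam^2 over a grid of lam values."""
    p, x = np.asarray(p, dtype=float), np.asarray(x, dtype=float)
    mu = np.sum(p * x)
    var = np.sum(p * (x - mu) ** 2)
    lams = np.linspace(-lam_max, lam_max, grid)
    lams = lams[np.abs(lams) > 1e-9]     # drop lam = 0 (its limit is var)
    psi = logsumexp(np.log(p)[None, :] + lams[:, None] * (x - mu)[None, :], axis=1)
    b2 = max(var, float(np.max(2.0 * psi / lams**2)))
    return np.sqrt(b2)
```

Since the roughly 100 candidate vectors $p$ are known in advance, the per-distribution constants $b_i$ could all be precomputed; for a given sequence, combining them into $b = \sqrt{\sum_i b_i^2}$ and evaluating the tail bound is then essentially free.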

P.S. Another way to estimate the tail probability is Monte Carlo sampling from $X$ (which I can do, since I know all the $Z_i$). However, this would be computationally too expensive for me. In my use case, I know all possible values of $Z$ in advance (there may be about 100 possible vectors $p$, and I know them in advance), and I can precompute even computationally expensive quantities. Then I am given the actual sequence of $Z$s (about 10-30 of them, selected from the possible values given in advance), and I need to compute the bound on the tail of $X$ as fast as possible.
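Even if too slow for production, a vectorized Monte Carlo estimate is useful for sanity-checking whatever bound is used; a sketch, with the sample size as an assumption:

```python
import numpy as np

def mc_tail(ps, eps, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P(X >= eps) for X = sum_i Z_i,
    where Z_i takes value -log(p_ij) with probability p_ij."""
    rng = np.random.default_rng(seed)
    total = np.zeros(n_samples)
    for p in ps:
        p = np.asarray(p, dtype=float)
        total += rng.choice(-np.log(p), size=n_samples, p=p)
    return float(np.mean(total >= eps))
```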

References:

I got the introduction to subgaussianity from Lattimore, Tor, and Csaba Szepesvári, Bandit Algorithms (2020), Chapter 5. The book is available here.

Another resource I went through is these lecture notes (they mention the use of Hoeffding's lemma).

edit: this material states an equivalent condition for subgaussianity that can be checked algorithmically (Theorem 2.1, page 6 of the PDF). I am currently trying to understand the proof and to construct an algorithm that decides whether a distribution is $\sigma$-subgaussian for a given $\sigma$.

