
I was reading Elad Hazan's book on Online Convex Optimization (http://ocobook.cs.princeton.edu/OCObook.pdf) and am having difficulty understanding the proof given for the no-regret algorithm for MAB (pp. 102-103). It would be great if someone could provide a clarification on this.

The no-regret (sub-linear regret) algorithm is given as Algorithm 17 (p. 102), and the no-regret property is proved in Lemma 6.1. I will make the description as self-contained as possible, but a more detailed presentation can be found on pp. 102-103 of the book mentioned above.

Let $\mathcal{K} = \left[1, n\right]$ denote the set of experts (arms). Let $i_t \in \mathcal{K}$ denote the expert chosen by the algorithm in the $t^{th}$ round, and let $l_t(i_t)$ denote the loss of that expert under the loss function provided by the adversary in the $t^{th}$ round. Since we are dealing with the bandit setting, we do not know $l_t(i)$ for any $i \in \mathcal{K} \backslash \{i_t\}$. Further, the loss functions $l_t$ are assumed to be bounded between 0 and 1. The algorithm works as follows: at each round we flip a coin with bias $\delta$. If the outcome of the coin is heads, the algorithm chooses one of the actions, $i_t$, and constructs an estimate of $l_t$ as follows:

\begin{equation}
\hat{l}_t(i) =
\begin{cases}
\frac{n}{\delta} l_t(i_t), & \text{ if } i = i_t \\
0, & \text{ otherwise}
\end{cases}
\end{equation}

If, however, the outcome of the coin toss is tails, then the algorithm simply sets $\hat{l}_t = 0$.

It can easily be shown that under the above scheme we have $E\left[\hat{l}_t(i)\right] = l_t(i), \ \forall i \in \mathcal{K}$.
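To convince myself of this unbiasedness claim, I wrote a quick Monte Carlo check (my own sketch, not from the book; I am assuming, as in Algorithm 17, that an exploration round picks the arm uniformly at random):

```python
import random

def estimate_round(losses, delta, i, rng):
    """One round of the estimator: flip a delta-biased coin; on heads,
    pick an arm uniformly and return the importance-weighted loss for arm i."""
    n = len(losses)
    if rng.random() < delta:          # heads: exploration round
        i_t = rng.randrange(n)        # assumed: arm chosen uniformly at random
        return (n / delta) * losses[i_t] if i == i_t else 0.0
    return 0.0                        # tails: estimator is identically zero

rng = random.Random(0)
losses = [0.2, 0.9, 0.5, 0.7]        # arbitrary fixed losses in [0, 1]
delta, trials, i = 0.5, 200_000, 1

avg = sum(estimate_round(losses, delta, i, rng) for _ in range(trials)) / trials
print(avg)  # should be close to losses[1] = 0.9
```

The empirical average matches $l_t(i)$ because the estimator is nonzero with probability $\delta \cdot \frac{1}{n}$ and is inflated by exactly $\frac{n}{\delta}$ in that case.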

The regret bound shown in the book is as follows.

\begin{eqnarray}
E[regret_T] &=& E\left[\sum_{t=1}^{T} l_t(i_t) - \sum_{t=1}^{T} l_t(i^*)\right] \\
&\leq& E\left[\sum_{t \notin S_T} \hat{l}_t(i_t) - \sum_{t \notin S_T} \hat{l}_t(i^*) + \sum_{t \in S_T} 1\right]
\end{eqnarray}

where $S_T \subseteq [1, T]$ denotes the rounds in which the coin toss was heads, and $i^* = \underset{i \in \mathcal{K}}{\text{arg min}} \sum_{t=1}^{T} l_t(i)$. I am having a hard time showing why $E[regret_T] \leq E\left[\sum_{t \notin S_T} \hat{l}_t(i_t) - \sum_{t \notin S_T} \hat{l}_t(i^*) + \sum_{t \in S_T} 1\right]$. The comment in the book is that $i^*$ is independent of $\hat{l}_t$, hence the validity of the inequality. I did not understand what that is supposed to mean.

**My Attempt:**

For my attempt, I will use some of the notation from the proof in the book.

We know that $\underset{i \in [1, \dotsc, n]}{\text{min}} \sum_{t=1}^T \hat{l}_t(i) \leq \sum_{t=1}^T \hat{l}_t(i), \ \forall i \in [1, n]$. Taking expectations on both sides, we get

\begin{eqnarray}
E\left[\underset{i \in [1, \dotsc, n]}{\text{min}} \sum_{t=1}^T \hat{l}_t(i)\right] &\leq& E\left[\sum_{t=1}^T \hat{l}_t(i)\right], \ \forall i \in [1, n] \\
\implies E\left[\underset{i \in [1, \dotsc, n]}{\text{min}} \sum_{t=1}^T \hat{l}_t(i)\right] &\leq& \underset{i \in [1, \dotsc, n]}{\text{min}} E\left[\sum_{t=1}^T \hat{l}_t(i)\right] \\
&=& \sum_{t=1}^T l_t(i^*)
\end{eqnarray}

where the last equality uses the fact, easily shown in the book, that $E\left[\hat{l}_t(i)\right] = l_t(i)$.
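As a numerical sanity check on the step $E[\min_i \sum_t \hat{l}_t(i)] \leq \min_i E[\sum_t \hat{l}_t(i)]$ (my own illustration, not from the book, using arbitrary random losses):

```python
import random

rng = random.Random(1)
n, T, trials = 3, 10, 5_000

sum_of_min = 0.0
sums_per_arm = [0.0] * n
for _ in range(trials):
    # draw random losses in [0, 1] for each arm over T rounds
    totals = [sum(rng.random() for _ in range(T)) for _ in range(n)]
    sum_of_min += min(totals)          # one realization of min_i sum_t l_t(i)
    for i in range(n):
        sums_per_arm[i] += totals[i]

e_min = sum_of_min / trials                    # estimates E[min_i sum_t l_t(i)]
min_e = min(s / trials for s in sums_per_arm)  # estimates min_i E[sum_t l_t(i)]
print(e_min, min_e)  # e_min <= min_e (up to sampling noise)
```

The gap is strict here because the minimizing arm can differ across realizations, while $\min_i E$ commits to a single arm.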

In view of the above inequality, we can show

\begin{eqnarray}
E[regret_T] &=& E\left[\sum_{t=1}^{T} l_t(i_t) - \sum_{t=1}^{T} l_t(i^*)\right] \\
&\leq& E\left[\sum_{t=1}^{T} \hat{l}_t(i_t) - \underset{i \in [1, \dotsc, n]}{\text{min}} \sum_{t=1}^T \hat{l}_t(i)\right]
\end{eqnarray}

Clearly, I am making some mistakes in the attempt above. I would be grateful if someone could point them out and help clarify the reasoning in the book. I say my attempt is incorrect because, in the way I have shown it, I completely disregarded the set $S_T$. Without this set, the regret would come out to $\mathcal{O}(\sqrt{T})$, whereas with the set $S_T$ the book shows the regret to be $\mathcal{O}(T^{\frac{3}{4}})$.
