*Bounty: 50*

*Bounty: 50*

In run length encoding (RLE) the code stream consists of pairs $(c_i,ell_i)$, which is understood as writing the character $c_i$ repeatedly $ell_i$ times.

Consider the following “improvement” of the run length encoding. In run length eXtreme (RLX), the code stream consists of tuples $(c_i, p_i,l_i)$, where now $p_i$ specifies a position.

Given a code stream $(c_1,p_1,l_1)cdots(c_K,p_K,l_K)$ we start with an infinite all 0 string and for $k=1,ldots,K$ write the character $c_k$ to positions $[p_k,p_k+l_k-1)$ overwriting whatever may have been written in these positions before. At the end we output the length

$$n=max_k (p_k + l_k -1)$$

prefix of this string.

For simplicity let us not worry about the bit length of the code words and take each code word as unit cost. Given a string $s$, let RLX(s) denote minimum number of code words that produce $s$ when decoded as described above.

How fast can we calculate RLX(s)?

Let us fix a string $s$ which does not contain any 0s and denote by $r(i,j)=mathrm{RLX}(s[i..j])$. We can write a recurrence for $r$ as so:

begin{align}

r(i,j) &= minbegin{cases}

1 + r(i+1,j)\

min_{i<kle j: s[i]=s[k]}r(i+1,k-1)+r(k, j)

end{cases}

end{align}

To see this, consider the optimal encoding $E$ of $s[i..j]$. Either $s[i]$ has a dedicated code word or multiple characters in the final string $s$ are produced by the code word which produced $s[i]$.

If $s[i]$ has a dedicated code word in E, clearly $r(i,j) = 1+r(i+1,j)$. Otherwise, let $k$ be the index of the leftmost character produced by the same code word. Then we have $r(i,j) = r(i+1,k-1) + r(k,j)$.

Taking this recurrence literally leads to an obvious $O(n^3)$ dynamic programming algorithm.

Is there a faster algorithm for calculating RLX via a custom algorithm or a dynamic programming optimization?

## Submodular interval functions:

An interval function maps two integers $ile j$ to a real. An interval function is called submodular (aka concave, Inverse-QI, Inverse-Monge, etc.) if

for $ile i’le jle j’$ we have

$$f(i,j’) + f(i’,j) le f(i,j) + f(i’,j’).$$

Note that the $r$ we have defined above satisfies $r(i,j)le r(i,k) + r(k+1,j)$ so one may hope r is submodular, at least when restricted to indices of the same character. But even such restricted forms of submodularity is false. Consider:

```
/* 5
* 111
* 44444
* 3333333
* 222222222 234
* 111111111111111
*/
s = 123415143212341
```

We have `r(1,11) + r(7,15) = 6+5`

but `r(7,11) + r(1,15) = 4 + 9`

.

Further the $r(cdot,cdot)$ is not a totally monotone matrix either.

Are there other dynamic programming optimization that may still apply to this problem?