#StackBounty: #optimization #dynamic-programming Run Length eXtreme encoded length

Bounty: 50

In run length encoding (RLE) the code stream consists of pairs $(c_i,ell_i)$, which is understood as writing the character $c_i$ repeatedly $ell_i$ times.

Consider the following “improvement” of the run length encoding. In run length eXtreme (RLX), the code stream consists of tuples $(c_i, p_i,l_i)$, where now $p_i$ specifies a position.
Given a code stream $(c_1,p_1,l_1)cdots(c_K,p_K,l_K)$ we start with an infinite all 0 string and for $k=1,ldots,K$ write the character $c_k$ to positions $[p_k,p_k+l_k-1)$ overwriting whatever may have been written in these positions before. At the end we output the length
$$n=max_k (p_k + l_k -1)$$
prefix of this string.

For simplicity let us not worry about the bit length of the code words and take each code word as unit cost. Given a string $s$, let RLX(s) denote minimum number of code words that produce $s$ when decoded as described above.

How fast can we calculate RLX(s)?

Let us fix a string $s$ which does not contain any 0s and denote by $r(i,j)=mathrm{RLX}(s[i..j])$. We can write a recurrence for $r$ as so:
r(i,j) &= minbegin{cases}
1 + r(i+1,j)\
min_{i<kle j: s[i]=s[k]}r(i+1,k-1)+r(k, j)

To see this, consider the optimal encoding $E$ of $s[i..j]$. Either $s[i]$ has a dedicated code word or multiple characters in the final string $s$ are produced by the code word which produced $s[i]$.

If $s[i]$ has a dedicated code word in E, clearly $r(i,j) = 1+r(i+1,j)$. Otherwise, let $k$ be the index of the leftmost character produced by the same code word. Then we have $r(i,j) = r(i+1,k-1) + r(k,j)$.

Taking this recurrence literally leads to an obvious $O(n^3)$ dynamic programming algorithm.

Is there a faster algorithm for calculating RLX via a custom algorithm or a dynamic programming optimization?

Submodular interval functions:

An interval function maps two integers $ile j$ to a real. An interval function is called submodular (aka concave, Inverse-QI, Inverse-Monge, etc.) if
for $ile i’le jle j’$ we have
$$f(i,j’) + f(i’,j) le f(i,j) + f(i’,j’).$$

Note that the $r$ we have defined above satisfies $r(i,j)le r(i,k) + r(k+1,j)$ so one may hope r is submodular, at least when restricted to indices of the same character. But even such restricted forms of submodularity is false. Consider:

/*       5
 *      111
 *     44444
 *    3333333
 *   222222222 234
 *  111111111111111

s = 123415143212341

We have r(1,11) + r(7,15) = 6+5 but r(7,11) + r(1,15) = 4 + 9.
Further the $r(cdot,cdot)$ is not a totally monotone matrix either.
Are there other dynamic programming optimization that may still apply to this problem?

Get this bounty!!!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.