*Bounty: 50*

*Bounty: 50*

Typically, median of medians algorithm is written with groups of size $5$ or $7$ to ensure worst-case linear performance. The argument against groups of size $k=3$ is typically that we get a recurrence of:

$$begin{align*}
T(n) & leq T(n/3) + T(2n/3) + O(n)\
& = O(n log n)
end{align*}$$

However, this assumes that there exists an input which would always split the recurrence in the worst possible case. I am not convinced that this input always exists.

**Question**: Is there a general procedure to create a worst-case input for the median-of-medians algorithm with $k=3$ for an input of any length $n$?

If it helps, you can assume $n$ is of some particular format and that we are always searching for the exact median (as opposed to querying the $i$-th order statistic). You can also assume that the algorithm groups elements deterministically (e.g. incrementally group 3 elements).

You can choose any deterministic implementation if it helps, here is one for reference. Given an input array $A$ of $n$ distinct elements and query $i$ for the $i$-th order statistic, the $mathrm{SELECT}(A, i)$ procedure is as follows:

- Divide $A$ into $lfloor tfrac{n}{3} rfloor$ groups of length $3$.
- We can do this by putting ${a_1, a_2, a_3}$ as group $G_1$, then ${a_4, a_5, a_6}$ as group $G_2$, and so on…
- If $n neq 3k$ then we can create one more group with the remaining 1 or 2 elements.

- Find the median of each group $G_i$ and store them in a list $M$.
- We can do this by creating $M = [mathrm{med}(G_1), mathrm{med}(G_2), ldots ]$.

- Recursively call this function to get the median of medians $x = mathrm{SELECT}(M, lfloor tfrac{n}{6}rfloor)$.
- Partition $A$ about $x$ into $A_1$ and $A_2$.
- Let all elements in $A_1$ be less than $x$. (Assume order is retained)
- Let all elements in $A_2$ be greater than $x$. (Assume order is retained)

- If $i = |A_1| + 1$ return $x$
- If $i < |A_1| + 1$ return $mathrm{SELECT}(A_1, i)$.
- If $i > |A_1| + 1$ return $mathrm{SELECT}(A_2, i – |A_1| – 1)$.

The biggest issue I am having for constructing a worst case input, is that we need $A$, $M$, and $A_1$ or $A_2$ to exhibit worst-case behavior. Since $M$ is nested in $A$ and $A_1$ or $A_2$ are subsets of $A$, getting all of this coordinated is very difficult.

Additional Notes:

We can always show that the initial recurrence will split into a problem of size $n/3$ and $2n/3$, perhaps we can also do this for the next level down, however I doubt it can be done consistently for levels below. This is relevant and important, because if you can show that our second recurrence will in fact be on something less than $2n/3$, then we have a linear time algorithm! This is because we’d get a recurrence:

$$T(n) leq T(n/3) + T(alpha n) + O(n)$$

Where $alpha < 2/3$ thus, by the uneven split theorem this must be linear a la $T(n) = O(n)$.