So the other day I had fun providing a new solution to this famous question. In the analysis part I showed that my little algorithm has space complexity:
Ω(log(k)). However, my rough logic says that we should be able to prove a tighter bound of
O(min(k,log(n))), but I was unable to prove it. Moreover it seems like an interesting, general-case problem.
To save time for the reader I have changed my space complexity problem to an equivalent complexity problem (the complexity of a function [not of space or time]):
Array of size
k; where elements,
e, are unique integers:
0 <= e < n, for some
average (rounded down) of the elements.
Randomly discard either all the elements
> than the average, or all the elements
<= the average.
We then do the same thing to the reduced array size.
We keep repeating the above steps until the array is size 1.
O(min(k,log(n))) on the number of times
average is counted? and if so, how do we prove it? (NOTE that we don’t care about the time it takes to calculate the average or remove the elements. This is because this questions is deliberately designed to be equivalent to my space-complexity problem.)
It seems really intuitive that it’s
O(min(k,log(n))) because if
n, then taking the
average will always divide the elements in half. However, I can’t seem to prove that it doesn’t sometimes perform worse when
k < n.
Thinking about this a bit we recognise that having outliers is really what makes for bad performance, so I try to imagine a worst-case scenario:
array = [1, 2, ..., (x-2), (x-1), (n-1)], where
(x-1) < average < (n-1)
In this scenario, we in worst-case reduce the array size to
x-1; however, by increasing
k by adding any number of elements,
(x-1) < e < (n-1), the worst-case array size (for the next iteration) is still
average = (x(x-1)/2 + n-1)/x
Thinking about the complexity, for a given
n, worst case
x = average
x^2 - x^2/2 + x/2 = n-1
O(x) = O(sqrt(n))
x represents the biggest sub-size.
So in this artificial "worst case" scenario, we get O(sub-array size) =
O(min(k,sqrt(n))), which kind of implies that the overall complexity =
O(min(k,log(sqrt(n)))) = O(min(k,log(n))).
However, this proof is quite informal and I’m not sure how to show that this genuinely represents the worst-case scenario.