#StackBounty: #data-structures #randomized-algorithms #correctness-proof Proof of Randomized Self-Adjusting Binary Search Tree

Bounty: 50

I developed a randomized self-adjusting binary search tree years ago, which I called a shuffle tree, but was unable to ever have it published because my proofs were rejected (with little explanation). I’ve since given up the hope of publishing (I’m not an academic so it doesn’t matter so much), but perhaps I can have some closure: I’m going to present the tree here, and perhaps someone can help me understand where my proofs fall short? Through testing, I’m quite certain that my understanding of the data structure is correct, but the proofs were always lacking.

First, understand how a top-down splay tree can be implemented around a traverse() function. Shuffle trees can be implemented similarly, where all operations defer to a traverse() function for the balancing operation.

I’m going to begin with a C traverse function for shuffle trees, then I’ll explain:

// returns node with key k,
// or returns the leaf containing
// the closest key to k.
node *  Traverse( key k, node *root, int treesize ) {
    signed int iCounter = rand() % treesize;
    node *pRet = 0;
    node *p = root;
    while ( p ) {
        pRet = p;
        if ( k < value(p) ) {
            p = left(p);
            if (( ! iCounter )&& p ) {
                RotateRight( pRet );
                pRet = parent(p);
            } // end if
        } else if ( value(p) < k ) {
            p = right(p);
            if (( ! iCounter )&& p ) {
                RotateLeft( pRet );
                pRet = parent(p);
            } // end if
        } else
            break ; // break while
        --iCounter;
        iCounter >>= 1;
    } // end while
    return ( pRet );
} 

The rotations used are simple single rotations.

Like a scapegoat tree, shuffle trees sample depth to find imbalance, but unlike scapegoat trees, they execute at most one rotation per access to attempt to restore balance.

At the beginning of traversal, we set an integer count-down value, to a random number in the range [0,N-1], where N is the size of the tree. As we iterate from a parent node to its child, we decrease the counter with I := (I-1)/2. When the counter equals zero, then the current node becomes a candidate rotation pivot. If we need to iterate past the candidate pivot, then we will commit to the rotation. We rotate the pivot away from the direction of traversal.

As search depth increases, the likelihood of a rotation increases. No rotations will occur beyond depth lgN. The counter requires lgN random bits per operation. Shuffle trees also record their
size, so that the counter can be set. No balancing
information needs to be recorded in tree nodes.
Searches may not navigate to a leaf; if the working
set is clustered near the root, then deep searches
will not be required. As a result, rotations can occur less frequently in a well-configured tree.
If a node in the tree is not weight-balanced, then
an access is more likely to traverse into its larger
sub-tree. A rotation at the node probably moves some of the descendants from the larger sub-tree to
the smaller one. Since these operations are probabilistic, rotations will occur which can deteriorate
balance; but as the imbalance increases, the likelihood of a rotation that improves balance
increases.
In effect, the balancing technique is a kind of random sampling. Nodes are selected randomly
from traversal paths and their balance is manipulated. Frequently used data attract more attention
and, therefore, benefit more from balancing activity than infrequently used data. The tree
eventually approximates a weight-blanced configura
tion for the data set, where the probability of
access is the weight for each node.

Here’s where it gets dicey: proving that a traversal occurs in lgN.

I argue the probability that an adversary can select a node
to force a rotation that impairs balance is

(1 - Pw) * Product over A of ( 1 - px )

Where Pw is a number estimating the overall weight balance of the tree (Pw >= 0.5) and px is the weight of node x, the probability it is accessed. Set A is the set containing the pivot and all its ancestors.

If rotations occur which impair balance, and Pw increases, then the overall probability of a favorable rotation increases. As Pw increases, the probability of a favorable rotation dwarfs the probability of a poor rotation. The tree does not linearize, and lgN is maintained.

Eh. What do you think?


Get this bounty!!!