#StackBounty: #python #python-3.x #random #graph Test the hypothesis that the expected number of edges of a random connected graph is …

Bounty: 50


The most common model for a random graph is the Erdős–Rényi model. However, it does not guarantee the connectedness of the graph. Instead, let’s consider the following algorithm (in python-style pseudocode) for generating a random connected graph with $n$ nodes:

g = empty graph

while not g.is_connected:
    i, j = random combination of two (distinct) nodes in range(n)
    if {i, j} not in g.edges:
        g.add_edge(i, j)

return g

The graph generated this way is guaranteed to be connected. Now, my intuition tells me that its expected number of edges is of the order $ O(n log n) $, and I want to test my hypothesis in Python. I don’t intend to do a rigorous mathematical proof or a comprehensive statistical inference, just some basic graph plotting.

The Codes

In order to know whether a graph is connected, we need a partition structure (i.e. union-find). I first wrote a Partition class in the module partition.py. It uses path compression and union by weights:

# partition.py

class Partition:
    """Implement a partition of a set of items to disjoint subsets (groups) as
    a forest of trees, in which each tree represents a separate group.
    Two trees represent the same group if and only if they have the same root.
    Support union operation of two groups.

    def __init__(self, items):
        items = list(items)

        # parents of every node in the forest
        self._parents = {item: item for item in items}

        # the sizes of the subtrees
        self._weights = {item: 1 for item in items}

    def __len__(self):
        return len(self._parents)

    def __contains__(self, item):
        return item in self._parents

    def __iter__(self):
        yield from self._parents

    def find(self, item):
        """Return the root of the group containing the given item.
        Also reset the parents of all nodes along the path to the root.
        if self._parents[item] == item:
            return item
            # find the root and recursively set all parents to it
            root = self.find(self._parents[item])
            self._parents[item] = root
            return root

    def union(self, item1, item2):
        """Merge the two groups (if they are disjoint) containing
        the two given items.
        root1 = self.find(item1)
        root2 = self.find(item2)

        if root1 != root2:
            if self._weights[root1] < self._weights[root2]:
                # swap two roots so that root1 becomes heavier
                root1, root2 = root2, root1

            # root1 is heavier, reset parent of root2 to root1
            # also update the weight of the tree at root1
            self._parents[root2] = root1
            self._weights[root1] += self._weights[root2]

    def is_single_group(self):
        """Return true if all items are contained in a single group."""
        # we just need one item, any item is ok
        item = next(iter(self))

        # group size is the weight of the root
        group_size = self._weights[self.find(item)]
        return group_size == len(self)

Next, since we are only interested in the number of edges, we don’t actually need to explicitly construct any graph object. The following function implicitly generates a random connected graph and return its number of edges:

import random
from partition import Partition

def connected_edge_count(n):
    """Implicitly generate a random connected graph and return its number of edges."""
    edges = set()
    forest = Partition(range(n))

    # each time we join two nodes we merge the two groups containing them
    # the graph is connected iff the forest of nodes form a single group
    while not forest.is_single_group:
        start = random.randrange(n)
        end = random.randrange(n)

        # we don't bother to check whether the edge already exists
        if start != end:
            forest.union(start, end)
            edge = frozenset({start, end})

    return len(edges)

We then estimate the expected number of edges for a given $n$:

def mean_edge_count(n, sample_size):
    """Compute the sample mean of numbers of edges in a sample of
    random connected graphs with n nodes.
    total = sum(connected_edge_count(n) for _ in range(sample_size))
    return total / sample_size

Now, we can plot the expected numbers of edges against $ n log n $ for different values of $n$:

from math import log
import matplotlib.pyplot as plt

def plt_mean_vs_nlogn(nlist, sample_size):
    """Plot the expected numbers of edges against n * log(n) for
    a given list of values of n, where n is the number of nodes.
    x_values = [n * log(n) for n in nlist]
    y_values = [mean_edge_count(n, sample_size) for n in nlist]
    plt.plot(x_values, y_values, '.')

Finally, when we called plt_mean_vs_nlogn(range(10, 1001, 10), sample_size=100), we got:

enter image description here

The plot seems very close to a straight line, supporting my hypothesis.

Questions and ideas for future work

  1. My program is slow! It took me 90 seconds to run plt_mean_vs_nlogn(range(10, 1001, 10), sample_size=100). How can I improve the performance?
  2. What other improvement can I make on my codes?
  3. An idea for future work: do a linear regression on the data. A high coefficient of determination would support my hypothesis. Also find out the coefficient of $ n log n $.
  4. Any other idea for testing my hypothesis programmatically?

Get this bounty!!!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.