#StackBounty: #python #python-3.x #random #graph Test the hypothesis that the expected number of edges of a random connected graph is …

Bounty: 50

Motivation

The most common model for a random graph is the Erdős–Rényi model. However, it does not guarantee the connectedness of the graph. Instead, let’s consider the following algorithm (in python-style pseudocode) for generating a random connected graph with $$n$$ nodes:

``````g = empty graph

while not g.is_connected:
i, j = random combination of two (distinct) nodes in range(n)
if {i, j} not in g.edges:

return g
``````

The graph generated this way is guaranteed to be connected. Now, my intuition tells me that its expected number of edges is of the order $$O(n log n)$$, and I want to test my hypothesis in Python. I don’t intend to do a rigorous mathematical proof or a comprehensive statistical inference, just some basic graph plotting.

The Codes

In order to know whether a graph is connected, we need a partition structure (i.e. union-find). I first wrote a `Partition` class in the module `partition.py`. It uses path compression and union by weights:

``````# partition.py

class Partition:
"""Implement a partition of a set of items to disjoint subsets (groups) as
a forest of trees, in which each tree represents a separate group.
Two trees represent the same group if and only if they have the same root.
Support union operation of two groups.
"""

def __init__(self, items):
items = list(items)

# parents of every node in the forest
self._parents = {item: item for item in items}

# the sizes of the subtrees
self._weights = {item: 1 for item in items}

def __len__(self):
return len(self._parents)

def __contains__(self, item):
return item in self._parents

def __iter__(self):
yield from self._parents

def find(self, item):
"""Return the root of the group containing the given item.
Also reset the parents of all nodes along the path to the root.
"""
if self._parents[item] == item:
return item
else:
# find the root and recursively set all parents to it
root = self.find(self._parents[item])
self._parents[item] = root
return root

def union(self, item1, item2):
"""Merge the two groups (if they are disjoint) containing
the two given items.
"""
root1 = self.find(item1)
root2 = self.find(item2)

if root1 != root2:
if self._weights[root1] < self._weights[root2]:
# swap two roots so that root1 becomes heavier
root1, root2 = root2, root1

# root1 is heavier, reset parent of root2 to root1
# also update the weight of the tree at root1
self._parents[root2] = root1
self._weights[root1] += self._weights[root2]

@property
def is_single_group(self):
"""Return true if all items are contained in a single group."""
# we just need one item, any item is ok
item = next(iter(self))

# group size is the weight of the root
group_size = self._weights[self.find(item)]
return group_size == len(self)
``````

Next, since we are only interested in the number of edges, we don’t actually need to explicitly construct any graph object. The following function implicitly generates a random connected graph and return its number of edges:

``````import random
from partition import Partition

def connected_edge_count(n):
"""Implicitly generate a random connected graph and return its number of edges."""
edges = set()
forest = Partition(range(n))

# each time we join two nodes we merge the two groups containing them
# the graph is connected iff the forest of nodes form a single group
while not forest.is_single_group:
start = random.randrange(n)
end = random.randrange(n)

# we don't bother to check whether the edge already exists
if start != end:
forest.union(start, end)
edge = frozenset({start, end})

return len(edges)
``````

We then estimate the expected number of edges for a given $$n$$:

``````def mean_edge_count(n, sample_size):
"""Compute the sample mean of numbers of edges in a sample of
random connected graphs with n nodes.
"""
total = sum(connected_edge_count(n) for _ in range(sample_size))
``````

Now, we can plot the expected numbers of edges against $$n log n$$ for different values of $$n$$:

``````from math import log
import matplotlib.pyplot as plt

def plt_mean_vs_nlogn(nlist, sample_size):
"""Plot the expected numbers of edges against n * log(n) for
a given list of values of n, where n is the number of nodes.
"""
x_values = [n * log(n) for n in nlist]
y_values = [mean_edge_count(n, sample_size) for n in nlist]
plt.plot(x_values, y_values, '.')
``````

Finally, when we called `plt_mean_vs_nlogn(range(10, 1001, 10), sample_size=100)`, we got:

The plot seems very close to a straight line, supporting my hypothesis.

Questions and ideas for future work

1. My program is slow! It took me 90 seconds to run `plt_mean_vs_nlogn(range(10, 1001, 10), sample_size=100)`. How can I improve the performance?
2. What other improvement can I make on my codes?
3. An idea for future work: do a linear regression on the data. A high coefficient of determination would support my hypothesis. Also find out the coefficient of $$n log n$$.
4. Any other idea for testing my hypothesis programmatically?

Get this bounty!!!

This site uses Akismet to reduce spam. Learn how your comment data is processed.