## #StackBounty: #python-3.x #deployment #virtualenv Best practice of python virtual environment deployment

### Bounty: 50

Sorry if this was answered in another post.

I am very new to Python and am learning about virtual environments. I understand that I am supposed to install all the libraries in the virtual environment and create a requirements.txt, so others can install from that. However, I am not sure what the best practice is for deploying to production.

The reason I ask is that no one is supposed to have access to the production environment; deployment goes through a predefined pipeline. My understanding is that the pipeline zips all my code and deploys it to production, and no one is supposed to log in to production to do any manual work. I can try to get the pipeline to run a script that installs all the libraries based on requirements.txt, but I am not sure the firewall settings are the same. Should I package the libraries as well?

Also, how should I trigger the Python script? Should I have a wrapper script that activates the venv before calling the Python script and deactivates it afterwards? Or is there an easier way?
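One common pattern, sketched below under the assumption that the pipeline simply unpacks a zip and runs a script, is to skip the activate/deactivate wrapper entirely and call the venv's interpreter by path. All paths and `main.py` here are placeholders (a temp dir and a stub script, so the sketch runs end to end):

```shell
#!/bin/sh
# Minimal sketch, not a definitive setup: APP_DIR stands in for wherever
# the pipeline unpacks the zipped app.
APP_DIR="$(mktemp -d)"
echo 'print("hello from the venv")' > "$APP_DIR/main.py"
touch "$APP_DIR/requirements.txt"    # your pinned dependencies would go here

# Create the venv next to the code and install from requirements.txt.
VENV="$APP_DIR/venv"
python3 -m venv "$VENV"
"$VENV/bin/pip" install -r "$APP_DIR/requirements.txt"

# No activate/deactivate needed: invoking the venv's interpreter by path
# runs the script against the venv's installed packages.
"$VENV/bin/python" "$APP_DIR/main.py"
```

Calling `$VENV/bin/python` directly is equivalent to activating first, which keeps the pipeline step to a single command.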

Get this bounty!!!

## #StackBounty: #python-3.x #django #image #google-image-search How can I refine my Python reverse image search to limit to a specific do…

### Bounty: 200

I’m using Python 3.8. I have the following script for launching a Google reverse image search:

``````
import requests
import webbrowser

filePath = '/tmp/cat_and_dog.webp'
searchUrl = 'http://www.google.com/searchbyimage/upload'
multipart = {'encoded_image': (filePath, open(filePath, 'rb')), 'image_content': ''}
response = requests.post(searchUrl, files=multipart, allow_redirects=False)
fetchUrl = response.headers['Location']  # Google redirects to the results page
webbrowser.open(fetchUrl)
``````

Does anyone know how, if it is possible, I can refine the search to a specific domain?


## #StackBounty: #python #python-3.x #machine-learning #autocomplete #artificial-intelligence Is there a way to reset Kite Autocomplete's…

### Bounty: 50

I’m wondering if I can reset the training of Kite’s AI on my code. I want to do this because I want to change my coding style, and there is some stuff that I have quit doing.

Take `xrange`, for example; it no longer exists in Python 3 (I’m a Python coder). So, I want to reset all of the data Kite learned from me, as if I had just installed it again. I don’t want to uninstall and reinstall it.

Would uninstalling and reinstalling the Sublime Text/Atom plugins do the trick? Or is it not possible?

As for the specs: I’m on macOS Catalina (10.15.5 (19F96)), with the non-Pro version of Kite and no Kite account, and Kite version 0.20200609.2.

I really want to know if there’s an official way, not some file removing magic.

But if some file removing magic is necessary, then I’m fine.


Someone set a bounty on this; I don’t wanna.


## #StackBounty: #python #python-3.x #pandas #dataframe #data-science Better way to iterate over dataset and change a feature value for ot…

### Bounty: 50

I have a dataset of velocities registered by sensors on highways, and I’m changing the `label` values to the `avg5` (average velocity over a 5-minute window) from 2 hours in the future (normally it is 30 minutes: the current `label` value is the observed `avg5` 30 minutes in the future).

My dataset has the following features and values:

And I’m doing this switch of values this way:

``````
hours_added = datetime.timedelta(hours=2)

for index in data_copy.index:
    # timestamp of the current row, shifted 2 hours into the future
    hours_ahead = data_copy["timestamp5"].loc[index] + hours_added

    result = data_copy[(data_copy["timestamp5"] == hours_ahead)
                       & (data_copy["sensor_id"] == data_copy["sensor_id"].loc[index])]

    if len(result) == 1:
        data_copy.at[index, "label"] = result["avg5"].iloc[0]

    if index % 50 == 0:
        print(f"Index: {index}")
``````

The code queries 2 hours ahead and catches the result for the same `sensor_id` as the row I’m iterating over. I only change the value of my label if the query returns something (`len(result) == 1`).

My dataframe has 2,950,521 rows, and at the moment I’m posting this question the kernel has been running for more than 24 hours and has only reached index 371,650.

So I started thinking that I’m doing something wrong, or that there is a better way to change these values that doesn’t take so long.

The desired behavior is to assign the `avg5` of the respective `sensor_id` 2 hours in the future to the label of the row 2 hours before.
Let’s take as an example the two images from this question and suppose that instead of 2 hours I want to assign the `avg5` from 10 minutes in the future (the `sensor_id` in this example is the same).

So the `label` of the row with index 0, instead of being 50.79, should be 51.59 (the `avg5` value of the row with index 2).
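The per-row lookup described above can be done for all rows at once with a self-merge on `sensor_id` and a shifted timestamp. This is a sketch, not the original poster's code; the toy frame below (with a 10-minute shift, as in the example) stands in for `data_copy`:

```python
import pandas as pd

# Toy frame standing in for data_copy; column names follow the question.
data_copy = pd.DataFrame({
    "sensor_id": [1, 1, 1],
    "timestamp5": pd.to_datetime(
        ["2020-05-02 10:00", "2020-05-02 10:05", "2020-05-02 10:10"]),
    "avg5": [50.79, 51.20, 51.59],
    "label": [50.79, 51.20, 51.59],
})

shift = pd.Timedelta(minutes=10)  # 2 hours in the real data

# For each row, look up the row of the same sensor whose timestamp is
# `shift` later and take its avg5 as the new label. A merge on
# (sensor_id, timestamp5 + shift) does this for all rows at once.
future = data_copy[["sensor_id", "timestamp5", "avg5"]].rename(
    columns={"avg5": "future_avg5"})
lookup = data_copy.assign(timestamp5=data_copy["timestamp5"] + shift)
merged = lookup.merge(future, on=["sensor_id", "timestamp5"], how="left")

# Keep the old label where no future row exists (mirrors len(result) == 1).
data_copy["label"] = merged["future_avg5"].fillna(data_copy["label"]).values
```

After this, row 0's label is 51.59 (the `avg5` of the row 10 minutes later), while rows with no match keep their old labels, matching the example above.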


## #StackBounty: #python-3.x #pandas #dataframe #formatting dataframe: transform row-based transaction data into aggregates per date

### Bounty: 50

I retrieve data from an SQLite database (and transform it into a pandas dataframe) in the following format:

``````
Driver | Date loading | Date unloading | Loading Address | Unloading Address
Peter  | 02.05.2020   | 03.05.2020     | 12342, Berlin   | 14221, Utrecht
Peter  | 03.05.2020   | 04.05.2020     | 14221, Utrecht  | 13222, Amsterdam
Franz  | 03.05.2020   | 03.05.2020     | 11111, Somewher | 11221, Somewhere2
Franz  | 03.05.2020   | 05.05.2020     | 11223, Upsalla  | 14231, Berlin
``````

The date range can be specified for the query, so that it gives an overview over which driver has which transports to deliver within the specified date range, ordered by date.

The goal of the transformation I want to do is a weekly plan for each driver, with the dates from the range sorted in the available columns. So for the data above, this would look like the following:

``````
Driver | 02.05.2020     | 03.05.2020        | 04.05.2020       | 05.05.2020
Peter  | 12342, Berlin  | 14221, Utrecht    | 13222, Amsterdam |
       |                | 14221, Utrecht    |                  |
Franz  |                | 11111, Somewher   |                  | 14231, Berlin
       |                | 11221, Somewhere2 |                  |
       |                | 11223, Upsalla    |                  |
``````

Is there any way to achieve the described output with dataframe operations? Within a single date column I need to keep the order: loading first, unloading second, then on to the next data row if the date is the same.
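One way to sketch this with reshaping operations (not the original poster's code; the toy frame below stands in for the SQLite query result) is to melt loading and unloading into one long table, number the entries stacked in each driver/date cell, and unstack the dates into columns:

```python
import pandas as pd

# Toy frame mirroring the question's input (column names from the question).
df = pd.DataFrame({
    "Driver": ["Peter", "Peter"],
    "Date loading": ["02.05.2020", "03.05.2020"],
    "Date unloading": ["03.05.2020", "04.05.2020"],
    "Loading Address": ["12342, Berlin", "14221, Utrecht"],
    "Unloading Address": ["14221, Utrecht", "13222, Amsterdam"],
})

# Long form: one row per (driver, date, address). `seq` preserves the
# required order: loading before unloading, source rows in order.
loading = df[["Driver", "Date loading", "Loading Address"]].rename(
    columns={"Date loading": "Date", "Loading Address": "Address"})
unloading = df[["Driver", "Date unloading", "Unloading Address"]].rename(
    columns={"Date unloading": "Date", "Unloading Address": "Address"})
loading["seq"] = df.index * 2        # loading first ...
unloading["seq"] = df.index * 2 + 1  # ... then unloading of the same row
long = pd.concat([loading, unloading]).sort_values(["Driver", "Date", "seq"])

# Number the addresses stacked in each (Driver, Date) cell, then turn
# the dates into columns.
long["slot"] = long.groupby(["Driver", "Date"]).cumcount()
plan = long.set_index(["Driver", "slot", "Date"])["Address"].unstack("Date")
```

For Peter this yields one column per date, with the second 03.05.2020 address on a second row, as in the desired output above. (Note: with dates kept as `DD.MM.YYYY` strings, the sort is lexicographic; parse them with `pd.to_datetime` for ranges spanning months.)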


## #StackBounty: #python #python-3.x #random #graph Test the hypothesis that the expected number of edges of a random connected graph is …

### Bounty: 50

Motivation

The most common model for a random graph is the Erdős–Rényi model. However, it does not guarantee the connectedness of the graph. Instead, let’s consider the following algorithm (in Python-style pseudocode) for generating a random connected graph with $$n$$ nodes:

``````
g = empty graph

while not g.is_connected:
    i, j = random combination of two (distinct) nodes in range(n)
    if {i, j} not in g.edges:
        g.add_edge(i, j)

return g
``````

The graph generated this way is guaranteed to be connected. Now, my intuition tells me that its expected number of edges is of the order $$O(n \log n)$$, and I want to test my hypothesis in Python. I don’t intend to do a rigorous mathematical proof or a comprehensive statistical inference, just some basic graph plotting.

The Code

In order to know whether a graph is connected, we need a partition structure (i.e. union-find). I first wrote a `Partition` class in the module `partition.py`. It uses path compression and union by weights:

``````
# partition.py

class Partition:
    """Implement a partition of a set of items to disjoint subsets (groups) as
    a forest of trees, in which each tree represents a separate group.
    Two trees represent the same group if and only if they have the same root.
    Support union operation of two groups.
    """

    def __init__(self, items):
        items = list(items)

        # parents of every node in the forest
        self._parents = {item: item for item in items}

        # the sizes of the subtrees
        self._weights = {item: 1 for item in items}

    def __len__(self):
        return len(self._parents)

    def __contains__(self, item):
        return item in self._parents

    def __iter__(self):
        yield from self._parents

    def find(self, item):
        """Return the root of the group containing the given item.
        Also reset the parents of all nodes along the path to the root.
        """
        if self._parents[item] == item:
            return item
        else:
            # find the root and recursively set all parents to it
            root = self.find(self._parents[item])
            self._parents[item] = root
            return root

    def union(self, item1, item2):
        """Merge the two groups (if they are disjoint) containing
        the two given items.
        """
        root1 = self.find(item1)
        root2 = self.find(item2)

        if root1 != root2:
            if self._weights[root1] < self._weights[root2]:
                # swap two roots so that root1 becomes heavier
                root1, root2 = root2, root1

            # root1 is heavier, reset parent of root2 to root1
            # also update the weight of the tree at root1
            self._parents[root2] = root1
            self._weights[root1] += self._weights[root2]

    @property
    def is_single_group(self):
        """Return true if all items are contained in a single group."""
        # we just need one item, any item is ok
        item = next(iter(self))

        # group size is the weight of the root
        group_size = self._weights[self.find(item)]
        return group_size == len(self)
``````

Next, since we are only interested in the number of edges, we don’t actually need to construct an explicit graph object. The following function implicitly generates a random connected graph and returns its number of edges:

``````
import random
from partition import Partition

def connected_edge_count(n):
    """Implicitly generate a random connected graph and return its number of edges."""
    edges = set()
    forest = Partition(range(n))

    # each time we join two nodes we merge the two groups containing them
    # the graph is connected iff the forest of nodes forms a single group
    while not forest.is_single_group:
        start = random.randrange(n)
        end = random.randrange(n)

        # we don't bother to check whether the edge already exists:
        # edges is a set, so duplicates are absorbed
        if start != end:
            forest.union(start, end)
            edges.add(frozenset({start, end}))

    return len(edges)
``````

We then estimate the expected number of edges for a given $$n$$:

``````
def mean_edge_count(n, sample_size):
    """Compute the sample mean of numbers of edges in a sample of
    random connected graphs with n nodes.
    """
    total = sum(connected_edge_count(n) for _ in range(sample_size))
    return total / sample_size
``````

Now, we can plot the expected numbers of edges against $$n \log n$$ for different values of $$n$$:

``````
from math import log
import matplotlib.pyplot as plt

def plt_mean_vs_nlogn(nlist, sample_size):
    """Plot the expected numbers of edges against n * log(n) for
    a given list of values of n, where n is the number of nodes.
    """
    x_values = [n * log(n) for n in nlist]
    y_values = [mean_edge_count(n, sample_size) for n in nlist]
    plt.plot(x_values, y_values, '.')
``````

Finally, when we called `plt_mean_vs_nlogn(range(10, 1001, 10), sample_size=100)`, we got:

The plot seems very close to a straight line, supporting my hypothesis.

Questions and ideas for future work

1. My program is slow! It took 90 seconds to run `plt_mean_vs_nlogn(range(10, 1001, 10), sample_size=100)`. How can I improve the performance?
2. What other improvements can I make to my code?
3. An idea for future work: do a linear regression on the data. A high coefficient of determination would support my hypothesis. Also find the coefficient of $$n \log n$$.
4. Any other ideas for testing my hypothesis programmatically?
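The regression in point 3 can be sketched with NumPy alone. The `y` values below are synthetic placeholders (a slope of 0.5 is an arbitrary assumption), not output of the experiment:

```python
import numpy as np

# Placeholder data standing in for the measured means: assume the relation
# is roughly 0.5 * n*log(n) plus noise.
nlist = np.arange(10, 1001, 10)
x = nlist * np.log(nlist)
rng = np.random.default_rng(0)
y = 0.5 * x + rng.normal(0.0, 5.0, size=x.size)

# Least-squares fit y ~ slope * x + intercept.
(slope, intercept), residuals, *_ = np.polyfit(x, y, deg=1, full=True)

# Coefficient of determination R^2.
ss_res = residuals[0]
ss_tot = float(((y - y.mean()) ** 2).sum())
r_squared = 1.0 - ss_res / ss_tot
print(f"slope = {slope:.3f}, R^2 = {r_squared:.5f}")
```

Feeding in the real `(n * log(n), mean_edge_count(n, sample_size))` pairs would give the slope (the coefficient of $$n \log n$$) and an $$R^2$$ close to 1 if the hypothesis holds.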

