*Bounty: 500*

# Problem Statement

I am trying to find the intersection over union (IoU) metric between each of a few rotated rectangles and a large set of other rotated rectangles. Here are some images to help visualize this metric:

The second image is closest to the scenario I'm trying to calculate: the white area divided by the combined black and white area.

# Example

For each red box on the left we need to determine the IoU metric against every small anchor box on the right. The output in this case will be an array of shape (5, 756), since there are 5 red boxes on the left and 756 anchor boxes on the right.

# My solution

To solve this problem, I fill in each "anchor" box (from the above right picture) individually and store it as a binary mask in an array.

I then take these filled-in anchor boxes (now separated so that each box sits in its own image, as above) and find the IoU metric by simple multiplication and summation. This approach calculates the IoU metric regardless of the shape of the objects passed in. However, it is *extremely* inefficient, both in memory and computationally. I am using the PyTorch library to load the arrays onto my GPU and parallelize the computations.
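The multiply-and-sum step can be illustrated on toy masks (a small sketch with made-up 8×8 masks, not the code under review):

```
import numpy as np

# One 4x4 query mask vs. a stack of three anchor masks, mirroring jaccard() below.
query = np.zeros((8, 8))
query[1:5, 1:5] = 1.0
anchors = np.zeros((3, 8, 8))
anchors[0, 1:5, 1:5] = 1.0   # identical box -> IoU 1
anchors[1, 3:7, 3:7] = 1.0   # 2x2 overlap   -> IoU 4/28
anchors[2, 6:8, 6:8] = 1.0   # disjoint      -> IoU 0

intersection = (query * anchors).sum(axis=(1, 2))  # elementwise AND on binary masks
summation = (query + anchors).sum(axis=(1, 2))     # |A| + |B|
iou = intersection / (summation - intersection)    # union = |A| + |B| - |A ∩ B|
print(iou)  # [1.         0.14285714 0.        ]
```

The real `jaccard()` below also adds `1.0` to the denominator to avoid division by zero when two boxes are both empty or disjoint.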

# Code

This code runs under Python 3.x.

```
import numpy as np
import torch
import cv2
from timeit import Timer


def jaccard(box_a, box_b):
    # Denormalize: centers/sizes back to 768-pixel space, angles back to degrees.
    denorm_bbox = torch.cat([box_a[:, :4] * 768, box_a[:, 4].unsqueeze(1) * 90], dim=1)
    b_boxes = list(map(lambda x: np.ceil(cv2.boxPoints(((x[1], x[0]), (x[2], x[3]), x[4]))),
                       denorm_bbox.numpy()))
    # Rasterize each box into its own 768x768 binary mask.
    b_imgs = torch.from_numpy(np.array(
        [cv2.fillConvexPoly(np.zeros((768, 768), dtype=np.uint8), np.intp(b), 1)  # np.intp == the old np.int0
         for b in b_boxes]).astype(float)).float()
    intersection = torch.FloatTensor()
    summation = torch.FloatTensor()
    for b_img in b_imgs:
        intersection = torch.cat([intersection, (b_img * box_b).sum((1, 2)).unsqueeze(0)])
        summation = torch.cat([summation, (b_img + box_b).sum((1, 2)).unsqueeze(0)])
    # union = |A| + |B| - |A ∩ B|; +1.0 guards against division by zero.
    return intersection / (summation - intersection + 1.0)


def main():
    anc_grids = [3, 6, 12]
    anc_zooms = [0.7]
    anc_ratios = [(1., 1.)]
    anc_angles = np.array(range(-90, 90, 45)) / 90
    anchor_scales = [(anz * i, anz * j) for anz in anc_zooms for (i, j) in anc_ratios]
    k = len(anchor_scales) * len(anc_angles)  # number of anchor boxes per anchor point
    anc_offsets = [1 / (o * 2) for o in anc_grids]
    anc_x = np.concatenate([np.repeat(np.linspace(ao, 1 - ao, ag), ag)
                            for ao, ag in zip(anc_offsets, anc_grids)])
    anc_y = np.concatenate([np.tile(np.linspace(ao, 1 - ao, ag), ag)
                            for ao, ag in zip(anc_offsets, anc_grids)])
    anc_ctrs = np.repeat(np.stack([anc_x, anc_y], axis=1), k, axis=0)
    anc_sizes = np.tile(np.concatenate([np.array([[o / ag, p / ag]
                                                  for i in range(ag * ag)
                                                  for o, p in anchor_scales])
                                        for ag in anc_grids]), (len(anc_angles), 1))
    grid_sizes = torch.from_numpy(np.concatenate([np.array([1 / ag
                                                            for i in range(ag * ag)
                                                            for o, p in anchor_scales])
                                                  for ag in anc_grids
                                                  for aa in anc_angles])).unsqueeze(1)
    anc_rots = np.tile(np.repeat(anc_angles, len(anchor_scales)),
                       sum(i * i for i in anc_grids))[:, np.newaxis]
    anchors = torch.from_numpy(np.concatenate([anc_ctrs, anc_sizes, anc_rots], axis=1)).float()
    denorm_anchors = torch.cat([anchors[:, :4] * 768, anchors[:, 4].unsqueeze(1) * 90], dim=1)
    np_anchors = denorm_anchors.numpy()
    iou_anchors = list(map(lambda x: np.ceil(cv2.boxPoints(((x[1], x[0]), (x[2], x[3]), x[4]))),
                           np_anchors))
    anchor_imgs = torch.from_numpy(np.array(
        [cv2.fillConvexPoly(np.zeros((768, 768), dtype=np.uint8), np.intp(a), 1)
         for a in iou_anchors]).astype(float)).float()
    test_tensor = torch.Tensor([[0.0807, 0.2844, 0.0174, 0.0117, -0.8440],
                                [0.3276, 0.0358, 0.0169, 0.0212, -0.1257],
                                [0.3040, 0.2904, 0.0101, 0.0157, -0.5000],
                                [0.0065, 0.2109, 0.0130, 0.0078, -1.0000],
                                [0.1895, 0.1556, 0.0143, 0.0091, -1.0000]])
    t = Timer(lambda: jaccard(test_tensor, anchor_imgs))
    print(f'Consuming {(len(anchors) * np.dtype(float).itemsize * 768 * 768) / 1000000000} Gb '
          f'on {"GPU" if anchors.is_cuda else "RAM"}')
    print(f'Averaging {t.timeit(number=100) / 100} seconds per function call')
    print(jaccard(test_tensor, anchor_imgs))


if __name__ == '__main__':
    main()
```

### Sample Run:

```
Consuming 3.567255552 Gb on RAM
Averaging 3.107201199789997 seconds per function call
tensor([[0.0020, 0.0000, 0.0020,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0026, 0.0026, 0.0026,  ..., 0.0000, 0.0000, 0.0000]])
```

For reference, these results were obtained on an i7-8700K.
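The 3.57 GB figure reported above is simply the cost of storing one full float64 mask per anchor box:

```
n_anchor_points = 3 * 3 + 6 * 6 + 12 * 12  # 189 cells across the 3,6,12 grids
n_anchors = n_anchor_points * 4            # x 4 angles = 756 anchor boxes
bytes_total = n_anchors * 768 * 768 * 8    # one 768x768 float64 mask per anchor
print(bytes_total / 1e9)  # 3.567255552
```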

# What I would like reviewed:

**Memory Consumption**: Right now I am consuming a lot of memory because I store a full 768×768 mask for every anchor box I compare against. What are some ways I can dramatically reduce this without giving up the accuracy of my results?

**Speed**: I need this to run *fast*!
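For context on what "without giving up accuracy" could look like: since rotated rectangles are convex, the exact overlap area can be computed geometrically, with no 768×768 masks at all. Below is a pure-NumPy sketch (my own illustration, not part of the code under review) using Sutherland–Hodgman polygon clipping plus the shoelace formula; `cv2.rotatedRectangleIntersection` and Shapely's `Polygon.intersection` are ready-made alternatives.

```
import numpy as np

def clip_polygon(subject, clipper):
    """Sutherland–Hodgman: clip `subject` against a convex `clipper` (CCW vertices)."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        if not inp:
            break

        def inside(p):
            # True if p lies on or to the left of the directed edge a -> b.
            return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

        def intersect(p, q):
            # Intersection of line p-q with line a-b.
            d = (p[0] - q[0]) * (a[1] - b[1]) - (p[1] - q[1]) * (a[0] - b[0])
            t = ((p[0] - a[0]) * (a[1] - b[1]) - (p[1] - a[1]) * (a[0] - b[0])) / d
            return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

        for j, cur in enumerate(inp):
            prev = inp[j - 1]
            if inside(cur):
                if not inside(prev):
                    output.append(intersect(prev, cur))
                output.append(cur)
            elif inside(prev):
                output.append(intersect(prev, cur))
    return output

def poly_area(poly):
    """Shoelace formula for a simple polygon."""
    if len(poly) < 3:
        return 0.0
    x, y = np.array(poly).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def rect_iou(rect_a, rect_b):
    inter = poly_area(clip_polygon(rect_a, rect_b))
    union = poly_area(rect_a) + poly_area(rect_b) - inter
    return inter / union if union > 0 else 0.0

# Two 4x4 squares overlapping in a 2x2 region (CCW vertices):
box_a = [(0, 0), (4, 0), (4, 4), (0, 4)]
box_b = [(2, 2), (6, 2), (6, 6), (2, 6)]
print(rect_iou(box_a, box_b))  # 4 / 28 = 0.14285714285714285
```

This is exact (no rasterization error from `np.ceil`) and needs only the 4 corner points per box, so memory drops from gigabytes of masks to a few kilobytes of vertices; the trade-off is a Python-level pairwise loop unless it is vectorized or batched.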
