#StackBounty: #deep-learning #cnn #training #computer-vision #pytorch Troubles Training a Faster R-CNN RPN using a Resnet 101 backbone …

Bounty: 100

Training Problems for a RPN

I am trying to train a network for region proposals as in the anchor box-concept
from Faster R-CNN.

I am using a pretrained Resnet 101 backbone with three layers popped off. The popped off
layers are the conv5_x layer, average pooling layer, and softmax layer.

As a result my convolutional feature map fed to the RPN heads for images
of size 600*600 results is of spatial resolution 37 by 37 with 1024 channels.

I have set the gradients of only block conv4_x to be trainable.
From there I am using the torchvision.models.detection rpn code to use the
rpn.AnchorGenerator, rpn.RPNHead, and ultimately rpn.RegionProposalNetwork classes.
There are two losses that are returned by the call to forward, the objectness loss,
and the regression loss.

The issue I am having is that my model is training very, very slowly (as in the loss is improving very slowly). In Girschick’s original paper he says he trains over 80K minibatches (roughly 8 epochs since the Pascal VOC 2012 dataset has about 11000 images), where each mini batch is a single image with 256 anchor boxes, but my network from epoch to epoch improves its loss VERY SLOWLY, and I am training for 30 + epochs.

Below is my class code for the network.

class ResnetRegionProposalNetwork(torch.nn.Module):
    def __init__(self):
        super(ResnetRegionProposalNetwork, self).__init__()
        self.resnet_backbone = torch.nn.Sequential(*list(models.resnet101(pretrained=True).children())[:-3])
        non_trainable_backbone_layers = 5
        counter = 0
        for child in self.resnet_backbone:
            if counter < non_trainable_backbone_layers:
                for param in child.parameters():
                    param.requires_grad = False
                counter += 1
            else:
                break

        anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
        aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
        self.rpn_anchor_generator = rpn.AnchorGenerator(
            anchor_sizes, aspect_ratios
        )
        out_channels = 1024
        self.rpn_head = rpn.RPNHead(
            out_channels, self.rpn_anchor_generator.num_anchors_per_location()[0]
        )

        rpn_pre_nms_top_n = {"training": 2000, "testing": 1000}
        rpn_post_nms_top_n = {"training": 2000, "testing": 1000}
        rpn_nms_thresh = 0.7
        rpn_fg_iou_thresh = 0.7
        rpn_bg_iou_thresh = 0.2
        rpn_batch_size_per_image = 256
        rpn_positive_fraction = 0.5

        self.rpn = rpn.RegionProposalNetwork(
            self.rpn_anchor_generator, self.rpn_head,
            rpn_fg_iou_thresh, rpn_bg_iou_thresh,
            rpn_batch_size_per_image, rpn_positive_fraction,
            rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_nms_thresh)

    def forward(self,
                images,       # type: ImageList
                targets=None  # type: Optional[List[Dict[str, Tensor]]]
                ):
        feature_maps = self.resnet_backbone(images)
        features = {"0": feature_maps}
        image_sizes = getImageSizes(images)
        image_list = il.ImageList(images, image_sizes)
        return self.rpn(image_list, features, targets)

I am using the adam optimizer with the following parameters:
optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, ResnetRPN.parameters()), lr=0.01, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)

My training loop is here:

for epoch_num in range(epochs): # will train epoch number of times per execution of this program
        loss_per_epoch = 0.0
        dl_iterator = iter(P.getPascalVOC2012DataLoader())
        current_epoch = epoch + epoch_num
        saveModelDuringTraining(current_epoch, ResnetRPN, optimizer, running_loss)
        batch_number = 0
        for image_batch, ground_truth_box_batch in dl_iterator:
            #print(batch_number)
            optimizer.zero_grad()
            boxes, losses = ResnetRPN(image_batch, ground_truth_box_batch)
            losses = losses["loss_objectness"] + losses["loss_rpn_box_reg"]
            losses.backward()
            optimizer.step()
            running_loss += float(losses)
            batch_number += 1
            if batch_number % 100 == 0:  # print the loss on every batch of 100 images
                print('[%d, %5d] loss: %.3f' %
                      (current_epoch + 1, batch_number + 1, running_loss))
                string_to_print = "n epoch number:" + str(epoch + 1) + ", batch number:" 
                                  + str(batch_number + 1) + ", running loss: " + str(running_loss)
                printToFile(string_to_print)
                loss_per_epoch += running_loss
                running_loss = 0.0
        print("finished Epoch with epoch loss " + str(loss_per_epoch))
        printToFile("Finished Epoch: " + str(epoch + 1) + " with epoch loss: " + str(loss_per_epoch))
        loss_per_epoch = 0.0

I am considering trying the following ideas to fix the network training very slowly:

  • trying various learning rates (although I have already tried 0.01, 0.001, 0.003 with similar results
  • various batch sizes (so far the best results have been batches of 4 (4 images * 256 anchors per image)
  • freezing more/less layers of the Resnet-101 backbone
  • using a different optimizer altogether
  • different weightings of the loss function

Any hints or things obviously wrong with my approach MUCH APPRECIATED. I would be happy to give any more information to anyone who can help.

Edit: My network is training on a fast GPU, with the images and bounding boxes as torch tensors.


Get this bounty!!!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.