#StackBounty: #deep-learning #cnn #training #computer-vision #pytorch Troubles Training a Faster R-CNN RPN using a Resnet 101 backbone …

Bounty: 100

Training Problems for a RPN

I am trying to train a network for region proposals, following the anchor-box concept
from Faster R-CNN.

I am using a pretrained ResNet-101 backbone with three layers popped off. The popped-off
layers are the conv5_x layer, the average pooling layer, and the final fully connected (classification) layer.

As a result, for images of size 600*600 the convolutional feature map fed to the RPN heads
has a spatial resolution of 37 by 37 with 1024 channels.
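
The feature map size can be sanity-checked by hand. The sketch below is only an illustration and assumes the usual torchvision ResNet layer parameters (a 7x7 stride-2 convolution with padding 3, a 3x3 stride-2 max-pool with padding 1, and a stride-2 3x3 convolution with padding 1 at the start of conv3_x and conv4_x). The function names are hypothetical. Under these assumptions a 600-pixel side comes out as 38 rather than 37, so it may be worth printing the shape of the actual tensor.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a conv/pool layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet_conv4_size(size):
    """Spatial size after conv1, max-pool, conv2_x, conv3_x and conv4_x."""
    size = conv_out(size, kernel=7, stride=2, padding=3)  # conv1
    size = conv_out(size, kernel=3, stride=2, padding=1)  # max-pool
    # conv2_x keeps the resolution; conv3_x and conv4_x each halve it
    # through a stride-2 3x3 convolution with padding 1.
    for _ in range(2):
        size = conv_out(size, kernel=3, stride=2, padding=1)
    return size


print(resnet_conv4_size(600))  # 38
print(resnet_conv4_size(592))  # 37
```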

I have set only the conv4_x block to be trainable.
From there I am using the torchvision.models.detection rpn code, namely the
rpn.AnchorGenerator, rpn.RPNHead, and ultimately rpn.RegionProposalNetwork classes.
The call to forward returns two losses: the objectness loss
and the regression loss.

The issue I am having is that my model is training very, very slowly, in the sense that the loss improves very slowly. In Girshick’s original paper he says he trains for over 80K minibatches (roughly 8 epochs, since the Pascal VOC 2012 dataset has about 11000 images), where each minibatch is a single image with 256 anchor boxes. My network's loss, however, improves VERY SLOWLY from epoch to epoch, even though I am training for 30+ epochs.

Below is my class code for the network.

import torch
from torchvision import models
from torchvision.models.detection import image_list as il, rpn


class ResnetRegionProposalNetwork(torch.nn.Module):
    def __init__(self):
        super(ResnetRegionProposalNetwork, self).__init__()
        # Drop the last three children of ResNet-101 (layer4/conv5_x, avgpool, fc).
        self.resnet_backbone = torch.nn.Sequential(*list(models.resnet101(pretrained=True).children())[:-3])
        # Freeze the first five children (conv1, bn1, relu, maxpool, layer1).
        non_trainable_backbone_layers = 5
        counter = 0
        for child in self.resnet_backbone:
            if counter < non_trainable_backbone_layers:
                for param in child.parameters():
                    param.requires_grad = False
                counter += 1

        anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
        aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
        self.rpn_anchor_generator = rpn.AnchorGenerator(
            anchor_sizes, aspect_ratios
        )
        out_channels = 1024
        self.rpn_head = rpn.RPNHead(
            out_channels, self.rpn_anchor_generator.num_anchors_per_location()[0]
        )

        rpn_pre_nms_top_n = {"training": 2000, "testing": 1000}
        rpn_post_nms_top_n = {"training": 2000, "testing": 1000}
        rpn_nms_thresh = 0.7
        rpn_fg_iou_thresh = 0.7
        rpn_bg_iou_thresh = 0.2
        rpn_batch_size_per_image = 256
        rpn_positive_fraction = 0.5

        self.rpn = rpn.RegionProposalNetwork(
            self.rpn_anchor_generator, self.rpn_head,
            rpn_fg_iou_thresh, rpn_bg_iou_thresh,
            rpn_batch_size_per_image, rpn_positive_fraction,
            rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_nms_thresh)

    def forward(self,
                images,       # type: ImageList
                targets=None  # type: Optional[List[Dict[str, Tensor]]]
                ):
        feature_maps = self.resnet_backbone(images)
        features = {"0": feature_maps}
        image_sizes = getImageSizes(images)  # helper defined elsewhere (not shown)
        image_list = il.ImageList(images, image_sizes)
        return self.rpn(image_list, features, targets)
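
The rpn_fg_iou_thresh and rpn_bg_iou_thresh values in the class above control which anchors count as foreground or background when the objectness loss is computed. The following is a minimal pure-Python sketch of that idea, not the torchvision implementation: the helper names are hypothetical, and the library's matcher applies extra rules (for example for low-quality matches) that are not reproduced here. It assumes at least one ground-truth box per image.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def label_anchor(anchor, gt_boxes, fg_thresh=0.7, bg_thresh=0.2):
    """Return 1 (foreground), 0 (background) or -1 (ignored in the loss)."""
    best = max(iou(anchor, gt) for gt in gt_boxes)
    if best >= fg_thresh:
        return 1
    if best < bg_thresh:
        return 0
    return -1


gt = [(100, 100, 200, 200)]
print(label_anchor((100, 100, 200, 200), gt))  # 1: identical box
print(label_anchor((150, 100, 250, 200), gt))  # -1: IoU is 1/3, between the thresholds
print(label_anchor((400, 400, 500, 500), gt))  # 0: no overlap
```

With thresholds of 0.7 and 0.2, anchors whose best IoU falls between the two values contribute to neither class.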

I am using the Adam optimizer with the following parameters:
optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, ResnetRPN.parameters()), lr=0.01, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)

My training loop is here:

for epoch_num in range(epochs): # will train epoch number of times per execution of this program
        loss_per_epoch = 0.0
        dl_iterator = iter(P.getPascalVOC2012DataLoader())
        current_epoch = epoch + epoch_num
        saveModelDuringTraining(current_epoch, ResnetRPN, optimizer, running_loss)
        batch_number = 0
        for image_batch, ground_truth_box_batch in dl_iterator:
            boxes, losses = ResnetRPN(image_batch, ground_truth_box_batch)
            losses = losses["loss_objectness"] + losses["loss_rpn_box_reg"]
            running_loss += float(losses)
            batch_number += 1
            if batch_number % 100 == 0:  # print the loss on every batch of 100 images
                print('[%d, %5d] loss: %.3f' %
                      (current_epoch + 1, batch_number + 1, running_loss))
                string_to_print = ("\n epoch number:" + str(epoch + 1) + ", batch number:"
                                   + str(batch_number + 1) + ", running loss: " + str(running_loss))
                loss_per_epoch += running_loss
                running_loss = 0.0
        print("finished Epoch with epoch loss " + str(loss_per_epoch))
        printToFile("Finished Epoch: " + str(epoch + 1) + " with epoch loss: " + str(loss_per_epoch))
        loss_per_epoch = 0.0

I am considering trying the following ideas to fix the slow training:

  • trying various learning rates (although I have already tried 0.01, 0.001, 0.003 with similar results)
  • various batch sizes (so far the best results have been with batches of 4, i.e. 4 images * 256 anchors per image)
  • freezing more/less layers of the Resnet-101 backbone
  • using a different optimizer altogether
  • different weightings of the loss function
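
As an illustration of the last idea, the two RPN losses could be combined with an explicit weight instead of a plain sum. This is only a sketch: weighted_rpn_loss and box_weight are hypothetical names, and the numbers are made up. The function uses only + and *, so it would also accept the tensors returned in the loss dictionary.

```python
def weighted_rpn_loss(loss_objectness, loss_rpn_box_reg, box_weight=1.0):
    """Combine the two RPN losses; box_weight scales the regression term."""
    return loss_objectness + box_weight * loss_rpn_box_reg


# Made-up loss values, used only to show the effect of the weight.
losses = {"loss_objectness": 0.69, "loss_rpn_box_reg": 0.05}
for box_weight in (1.0, 10.0):
    total = weighted_rpn_loss(
        losses["loss_objectness"], losses["loss_rpn_box_reg"], box_weight
    )
    print(box_weight, round(total, 2))
```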

Any hints, or pointers to anything obviously wrong with my approach, would be MUCH APPRECIATED. I would be happy to give more information to anyone who can help.

Edit: My network is training on a fast GPU, with the images and bounding boxes as torch tensors.

Get this bounty!!!

#StackBounty: #xgboost #training #multilabel-classification Validation error is always zero in a multi class classification problem. Wh…

Bounty: 50

I have a 3-class (1/0/unclassified) classification problem where my training data is created using a bunch of rules.

Problem: Classify whether a person owns a vehicle or travels by public transport.

Dataset: Person’s expense journal entries in CSV format (around 2 lakh, i.e. 200,000, entries from 20 people over a range of 3 years).

Fields are:

    person_id, date of payment, category, shop,      expense, summary
    1,         2020-01-01,      fuel,     fuel_stop, $20,     'paid for refilling'
    2,         2020-01-01,      ticket,   bus,       $10,     'took a bus to Treasa's house'

Training data generation: No labelling is done here.

Instead some rules are used for tagging the data.

For example:
Rules for vehicle owners:

  1. Maintenance fee records
  2. Fuel transactions
  3. Few transactions in public transport
  4. Driver salary payments

Rules for non vehicle owners:

  1. Multiple transactions in public transport(bus, train, subway etc.)
  2. No fuel transactions
  3. No maintenance transactions

Nuances like people with vehicles travelling by public transport etc. could be ignored.
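
A rough sketch of how such rule-based tagging might look in code is given below. It is not the actual labelling code: the category names 'maintenance' and 'driver_salary', the set of public-transport shops, and the min_public_trips threshold are all assumptions made for illustration.

```python
from collections import Counter

# Assumed shop names for public transport (illustrative only).
PUBLIC_TRANSPORT = {"bus", "train", "subway"}


def tag_person(entries, min_public_trips=3):
    """Tag one person's entries as 1 (owner), 0 (non-owner) or 'unclassified'.

    `entries` is a list of dicts with 'category' and 'shop' keys.
    """
    categories = Counter(e["category"] for e in entries)
    public_trips = sum(1 for e in entries if e["shop"] in PUBLIC_TRANSPORT)
    owner_signals = (
        categories["fuel"] + categories["maintenance"] + categories["driver_salary"]
    )
    if owner_signals > 0 and public_trips < min_public_trips:
        return 1
    if owner_signals == 0 and public_trips >= min_public_trips:
        return 0
    return "unclassified"


owner = [{"category": "fuel", "shop": "fuel_stop"}]
commuter = [{"category": "ticket", "shop": "bus"}] * 4
print(tag_person(owner), tag_person(commuter), tag_person([]))
```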

I used an XGBoost model for modelling this data.

During cross validation, I can see that the classification error (merror) is always 0.00, even though the log loss (mlogloss) keeps dropping.

[62]    validation_0-merror:0.00000 validation_0-mlogloss:0.12917   validation_1-merror:0.00000 validation_1-mlogloss:0.12983
[63]    validation_0-merror:0.00000 validation_0-mlogloss:0.12524   validation_1-merror:0.00000 validation_1-mlogloss:0.12577
[64]    validation_0-merror:0.00000 validation_0-mlogloss:0.12138   validation_1-merror:0.00000 validation_1-mlogloss:0.12201
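
The two metrics in this log measure different things, which a small pure-Python sketch can illustrate (toy probabilities, not data from the model above). Here merror is taken as the fraction of rows whose highest-probability class is wrong, and mlogloss as the mean negative log-probability of the true class. The error can therefore already be zero while the log loss keeps falling as the predictions become more confident.

```python
import math


def merror(probs, labels):
    """Fraction of rows whose highest-probability class differs from the label."""
    wrong = sum(1 for p, y in zip(probs, labels) if p.index(max(p)) != y)
    return wrong / len(labels)


def mlogloss(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


labels = [0, 1, 2]
early = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
later = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
for name, probs in (("early", early), ("later", later)):
    print(name, merror(probs, labels), round(mlogloss(probs, labels), 4))
```

Both toy rounds classify every row correctly, yet the second one has a much lower log loss.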

The model identifies the vehicle owners in a separate test set almost correctly, with roughly 96% accuracy.

However, I do not know if the model will be able to identify other cases correctly, or generalise across other features it has not seen.

Could anyone please shed some light on this?



#StackBounty: #machine-learning #classification #random-forest #reinforcement-learning #training Can a classifier be trained with reinf…

Bounty: 50

Question: Can a classifier be trained with reinforcement learning without access to single classification results?

I want to train a classifier (e.g. Random Forest) using reinforcement learning. However, there is one big restriction: the program does not get a score after each individual classification. The final score only becomes available after many classifications have been completed (around 40-200 classifications; let’s call them a batch). One batch can be executed rather quickly: it takes just around one second. Therefore, thousands of batches can be executed, each of them returning a score for its classifications. Every time a batch is executed, the current ML model (e.g. Random Forest) is given to the batch as input.

Other than that, of course, the feature vector is known (contains around 60 features) and the labels are known (around 6 labels).

I have never applied Reinforcement Learning before, so I cannot tell whether this can work. In theory, I think, it should: all data is available. The algorithm can choose some parameter values for the model (could be a Random Forest), try them out, and get a score. Then it can try out different values and get the score again. This way it should be able to improve step by step.
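
The try-score-repeat loop described above can be sketched as a plain random search over model parameters. This is black-box hyperparameter search rather than reinforcement learning in the strict sense, and everything in the sketch is hypothetical: run_batch is a toy stand-in for executing a real batch, and the parameter names and ranges are only examples.

```python
import random


def run_batch(params, rng):
    """Hypothetical stand-in for executing one batch and returning its score."""
    # Toy score: best at max_depth=8 and min_samples_leaf=5, plus a little noise.
    score = -abs(params["max_depth"] - 8) - abs(params["min_samples_leaf"] - 5)
    return score + rng.uniform(-0.1, 0.1)


def random_search(n_trials, seed=0):
    """Sample parameter sets at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "max_depth": rng.randint(2, 20),
            "min_samples_leaf": rng.randint(1, 20),
        }
        score = run_batch(params, rng)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(n_trials=200)
print(best_params, round(best_score, 2))
```

Only the per-batch score is used, so no per-classification feedback is needed. Because each score is noisy, averaging several batches per parameter set may be sensible.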

Additional Notes: Although the text above should be enough to understand the problem and provide an answer (which can be general and not specific to a concrete use case), my personal use case and details about it are explained here. This might be useful to understand the problem in more detail.


#StackBounty: #keras #cnn #training #inception #colab Very Fast Training After First Epoch

Bounty: 50

I trained an InceptionV3 model on plant images using the Keras library. When training started, the first epoch took 29 s per step, and the following epochs then took approximately 530 ms per step. That made me doubt whether there is a bug in my code. I checked my code several times, but its logic seems right to me. I trained my model on Google Colab. I wonder whether there is a memoization mechanism or whether my code contains bugs. Here is my code:

import cv2
import keras
import numpy as np

# Yields one image-target pair when called
def image_target_generator(files, labels):
    assert len(files) == len(labels), "Files and labels sizes don't match!"

    for step in range(len(files)):
        img = cv2.imread(dataset_path + files[step])
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        item = (img, labels[step])
        yield item

# Generating batch
def batch_generator(gen):

    batch_images = []
    batch_targets = []

    for item in gen:

      if len(batch_images) == BATCH_SIZE:
        yield batch_images, batch_targets
        batch_images = []
        batch_targets = []

      preprocessed_img = preprocess_image(item[0])
      # Assumed step: add the image and its label to the current batch
      batch_images.append(preprocessed_img)
      batch_targets.append(item[1])

    yield batch_images, batch_targets

# Training generator
def training_generator(files, labels):

  # So that Keras can loop it as long as required
  while True:

    for batch in batch_generator(image_target_generator(files, labels)):
      batch_images = np.stack(batch[0], axis=0)
      batch_targets = keras.utils.np_utils.to_categorical(batch[1], NUM_CLASSES)
      yield batch_images, batch_targets

# Create model
def create_model():
  model = keras.applications.InceptionV3(include_top=False, input_shape=(IMG_SIZE, IMG_SIZE, 3), weights='imagenet')

  new_output = keras.layers.GlobalAveragePooling2D()(model.output)
  new_output = keras.layers.Dense(NUM_CLASSES, activation='softmax')(new_output)
  model = keras.models.Model(model.inputs, new_output)

  for layer in model.layers:
    layer.trainable = True

    if isinstance(layer, keras.layers.BatchNormalization):
      layer.momentum = 0.9

  # Freeze everything except the last 50 layers and the batch-norm layers
  for layer in model.layers[:-50]:
    if not isinstance(layer, keras.layers.BatchNormalization):
      layer.trainable = False

  return model

# Compiling model
model = create_model()

model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adamax(lr=1e-2), metrics=['accuracy'])

# Fitting model
model.fit_generator(
  training_generator(train_x, train_y),
  steps_per_epoch=len(train_x) // BATCH_SIZE,
  epochs=30,
  validation_data=training_generator(test_x, test_y),
  validation_steps=len(test_x) // BATCH_SIZE
)
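
One way to find out where the time goes is to measure how long the generator takes to produce each batch. If the first pass over the files is slow and later passes are fast, the difference would come from data loading rather than from the model; caching of file reads is one possible cause, although nothing in the post confirms it. The sketch below is a generic, hypothetical timing wrapper demonstrated on a toy generator. In the code above it would wrap training_generator(train_x, train_y).

```python
import time


def timed(gen, log):
    """Yield items from `gen`, appending the seconds needed for each to `log`."""
    while True:
        start = time.perf_counter()
        try:
            item = next(gen)
        except StopIteration:
            return
        log.append(time.perf_counter() - start)
        yield item


def slow_then_fast():
    """Toy generator: the first item is slow to produce, the rest are fast."""
    time.sleep(0.05)
    yield "first"
    for i in range(3):
        yield i


durations = []
items = list(timed(slow_then_fast(), durations))
print(items)
print(["%.3f" % d for d in durations])
```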


#StackBounty: #neural-network #keras #tensorflow #training #ensemble-modeling Training an ensemble of small neural networks efficiently…

Bounty: 50

I have a bunch of small neural networks (say, 5 to 50 feed-forward neural networks with only two hidden layers with 10-100 neurons each), which differ only in the weight initialization. I want to train them all on the same, smallish dataset (say, 10K rows), with a batch size of 1. The aim of this is to combine them into an ensemble by averaging the results.

Now, of course I can build the whole ensemble as one neural network in TensorFlow/Keras, like this:

import tensorflow as tf


def bagging_ensemble(inputs: int, width: int, weak_learners: int):
    r'''Return a generic dense network model

    inputs: number of columns (features) in the input data set
    width: number of neurons in the hidden layer of each weak learner
    weak_learners: number of weak learners in the ensemble
    '''
    assert width >= 1, 'width is required to be at least 1'
    assert weak_learners >= 1, 'weak_learners is required to be at least 1'

    activation = tf.keras.activations.tanh
    kernel_initializer = tf.initializers.GlorotUniform()

    input_layer = tf.keras.Input(shape=(inputs,))
    hidden = []
    # add hidden layer as a list of weak learners
    for i in range(weak_learners):
        weak_learner = tf.keras.layers.Dense(units=width, activation=activation, kernel_initializer=kernel_initializer)(input_layer)
        weak_learner = tf.keras.layers.Dense(units=1, activation=tf.keras.activations.sigmoid)(weak_learner)
        hidden.append(weak_learner)

    output_layer = tf.keras.layers.Average()(hidden)  # add an averaging layer at the end

    return tf.keras.Model(input_layer, output_layer)

example_model = bagging_ensemble(inputs=30, width=10, weak_learners=5)

The resulting model’s plot looks like this:
[Figure: plot of the Keras model]

However, training the model is slow, and because of the batch size of 1, a GPU doesn’t really speed up the process. How can I make better use of the GPU when training such a model in TensorFlow 2, without using a larger batch size?
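
One direction that could be explored, offered here only as an assumption and not as a verified fix for the slowdown described above, is to fuse the per-learner hidden layers into a single wide layer, so that one larger matrix product replaces many small ones. The pure-Python sketch below uses hypothetical helper names and no TensorFlow. It only checks that the fused computation gives the same ensemble output as the separate networks. Whether this improves GPU utilisation would have to be measured.

```python
import math
import random


def dense(x, weights, biases, act):
    """y_j = act(sum_i x_i * W[i][j] + b_j) for a single input row x."""
    return [
        act(sum(xi * weights[i][j] for i, xi in enumerate(x)) + biases[j])
        for j in range(len(biases))
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_learner(rng, inputs, width):
    """Random weights for one weak learner (hidden layer plus one output unit)."""
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(width)] for _ in range(inputs)],
        "b1": [0.0] * width,
        "w2": [[rng.uniform(-1, 1)] for _ in range(width)],
        "b2": [0.0],
    }


def separate_forward(x, learners):
    """Evaluate every weak learner on its own, then average the outputs."""
    outs = []
    for p in learners:
        h = dense(x, p["w1"], p["b1"], math.tanh)
        outs.append(dense(h, p["w2"], p["b2"], sigmoid)[0])
    return sum(outs) / len(outs)


def fused_forward(x, learners):
    """Same result, but all hidden layers are evaluated as one wide layer."""
    width = len(learners[0]["b1"])
    # Concatenate the hidden weights column-wise: inputs x (K * width).
    w1 = [sum((p["w1"][i] for p in learners), []) for i in range(len(x))]
    b1 = sum((p["b1"] for p in learners), [])
    h = dense(x, w1, b1, math.tanh)
    outs = []
    for k, p in enumerate(learners):
        block = h[k * width:(k + 1) * width]  # hidden units of learner k
        outs.append(dense(block, p["w2"], p["b2"], sigmoid)[0])
    return sum(outs) / len(outs)


rng = random.Random(0)
learners = [make_learner(rng, inputs=30, width=10) for _ in range(5)]
x = [rng.uniform(-1, 1) for _ in range(30)]
print(abs(separate_forward(x, learners) - fused_forward(x, learners)) < 1e-12)
```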

[The motivation for using this kind of ensemble is the following: Especially with small datasets, each of the small networks will yield different results because of different random initializations. By bagging as demonstrated here, the quality of the resulting model is greatly enhanced. If you’re interested in the thorough neural network modelling approach this technique comes from, look for the works of H. G. Zimmermann.]
