#StackBounty: #keras #tensorflow #cnn #image-segmentation #reshape Modifying U-Net implementation for smaller image size

Bounty: 50

I’m implementing the U-Net model per the published paper here. This is my model so far:

# Imports assumed (tf.keras; the standalone keras equivalents work the same way)
from tensorflow.keras.layers import (Input, Conv2D, MaxPool2D, UpSampling2D,
                                     Cropping2D, Concatenate)
from tensorflow.keras.models import Model

def create_unet_model(image_size = IMAGE_SIZE):

    # Input layer is a 572x572 colour image
    input_layer = Input(shape=(image_size) + (3,))

    """ Begin Downsampling """

    # Block 1
    conv_1 = Conv2D(64, 3, activation = 'relu')(input_layer)
    conv_2 = Conv2D(64, 3, activation = 'relu')(conv_1)

    max_pool_1 = MaxPool2D(strides=2)(conv_2)

    # Block 2
    conv_3 = Conv2D(128, 3, activation = 'relu')(max_pool_1)
    conv_4 = Conv2D(128, 3, activation = 'relu')(conv_3)

    max_pool_2 = MaxPool2D(strides=2)(conv_4)

    # Block 3
    conv_5 = Conv2D(256, 3, activation = 'relu')(max_pool_2)
    conv_6 = Conv2D(256, 3, activation = 'relu')(conv_5)

    max_pool_3 = MaxPool2D(strides=2)(conv_6)

    # Block 4
    conv_7 = Conv2D(512, 3, activation = 'relu')(max_pool_3)
    conv_8 = Conv2D(512, 3, activation = 'relu')(conv_7)

    max_pool_4 = MaxPool2D(strides=2)(conv_8)

    """ Begin Upsampling """

    # Block 5
    conv_9 = Conv2D(1024, 3, activation = 'relu')(max_pool_4)
    conv_10 = Conv2D(1024, 3, activation = 'relu')(conv_9)

    upsample_1 = UpSampling2D()(conv_10)

    # Copy and Crop
    conv_8_cropped = Cropping2D(cropping=4)(conv_8)
    merge_1 = Concatenate()([conv_8_cropped, upsample_1])

    # Block 6
    conv_11 = Conv2D(512, 3, activation = 'relu')(merge_1)
    conv_12 = Conv2D(512, 3, activation = 'relu')(conv_11)

    upsample_2 = UpSampling2D()(conv_12)

    # Copy and Crop
    conv_6_cropped = Cropping2D(cropping=16)(conv_6)
    merge_2 = Concatenate()([conv_6_cropped, upsample_2])

    # Block 7
    conv_13 = Conv2D(256, 3, activation = 'relu')(merge_2)
    conv_14 = Conv2D(256, 3, activation = 'relu')(conv_13)
    upsample_3 = UpSampling2D()(conv_14)

    # Copy and Crop
    conv_4_cropped = Cropping2D(cropping=40)(conv_4)
    merge_3 = Concatenate()([conv_4_cropped, upsample_3])

    # Block 8
    conv_15 = Conv2D(128, 3, activation = 'relu')(merge_3)
    conv_16 = Conv2D(128, 3, activation = 'relu')(conv_15)
    upsample_4 = UpSampling2D()(conv_16)

    # Connect layers
    conv_2_cropped = Cropping2D(cropping=88)(conv_2)
    merge_4 = Concatenate()([conv_2_cropped, upsample_4])

    # Block 9
    conv_17 = Conv2D(64, 3, activation = 'relu')(merge_4)
    conv_18 = Conv2D(64, 3, activation = 'relu')(conv_17)

    # Output layer
    output_layer = Conv2D(1, 1, activation='sigmoid')(conv_18)

    """ Define the model """
    unet = Model(input_layer, output_layer)
    
    return unet
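
For reference, a minimal usage sketch (assuming IMAGE_SIZE is a (height, width) tuple such as (572, 572); the optimizer and loss here are placeholders, not from the original post):

# Sketch only: loss/optimizer are placeholders for illustration.
unet_model = create_unet_model(image_size=(572, 572))
unet_model.compile(optimizer='adam', loss='binary_crossentropy')
unet_model.summary()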

The cropping is implemented as specified in this answer and is specific to 572×572 images.

Unfortunately this implementation causes a ResourceExhaustedError:

Exception has occurred: ResourceExhaustedError
 OOM when allocating tensor with shape[32,64,392,392] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node model/cropping2d_3/strided_slice (defined at c:main.py:74) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference_train_function_3026]

Function call stack:
train_function
  File "C:main.py", line 74, in main
    unet_model.fit(train_images, epochs=epochs, validation_data=validation_images, callbacks=CALLBACKS)
  File "C:main.py", line 276, in <module>
    main()

My GPU is a GeForce RTX 2070 Super 8GB.

I verified that the image size was the source of the error by reproducing it in another U-Net implementation that I know works.

To work around this issue, I’m trying lower image sizes, e.g. 256×256. I’ve changed the Cropping2D layers to crop to the expected sizes for each layer:

# Copy and Crop - 24 -> 16
conv_8_cropped = Cropping2D(cropping=4)(conv_8)
merge_1 = Concatenate()([conv_8_cropped, upsample_1])

# Copy and Crop - 57 -> 24
conv_6_cropped = Cropping2D(cropping=((17,16),(17,16)))(conv_6)
merge_2 = Concatenate()([conv_6_cropped, upsample_2])

# Copy and Crop - 122 -> 40
conv_4_cropped = Cropping2D(cropping=41)(conv_4)
merge_3 = Concatenate()([conv_4_cropped, upsample_3])

# Copy and Crop - 252 -> 72
conv_2_cropped = Cropping2D(cropping=90)(conv_2)
merge_4 = Concatenate()([conv_2_cropped, upsample_4])
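
Since the crop amounts depend on the input size, here is a small helper of my own (not from the original post) that traces the spatial sizes through this 4-level valid-padding U-Net and derives the per-side crop for each skip connection:

# Sketch only: assumes two 3x3 'valid' convolutions per block, 2x2 max pooling
# and 2x UpSampling2D, exactly as in the model above.
def unet_valid_sizes(input_size):
    skips = []
    size = input_size
    for _ in range(4):              # contracting path
        size -= 4                   # two 3x3 valid convs shrink by 2 each
        skips.append(size)          # spatial size of the skip connection
        size //= 2                  # 2x2 max pool (floor, as MaxPool2D does)
    size -= 4                       # bottleneck convs
    crops = []
    for skip in reversed(skips):    # expanding path
        size *= 2                   # UpSampling2D
        crops.append((skip, size, (skip - size) / 2))
        size -= 4                   # two 3x3 valid convs after the concat
    return crops, size              # (skip size, target size, crop per side), output size

# For a 256x256 input this gives per-side crops of 4, 16.5 (hence the asymmetric
# ((17,16),(17,16))), 41 and 90, and a final 68x68 output, matching the summary below.
print(unet_valid_sizes(256))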

Updated model summary:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 256, 256, 3) 0
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 254, 254, 64) 1792        input_1[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 252, 252, 64) 36928       conv2d[0][0]
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D)    (None, 126, 126, 64) 0           conv2d_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 124, 124, 128 73856       max_pooling2d[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 122, 122, 128 147584      conv2d_2[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 61, 61, 128)  0           conv2d_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 59, 59, 256)  295168      max_pooling2d_1[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 57, 57, 256)  590080      conv2d_4[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 28, 28, 256)  0           conv2d_5[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 26, 26, 512)  1180160     max_pooling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 24, 24, 512)  2359808     conv2d_6[0][0]
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)  (None, 12, 12, 512)  0           conv2d_7[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 10, 10, 1024) 4719616     max_pooling2d_3[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 8, 8, 1024)   9438208     conv2d_8[0][0]
__________________________________________________________________________________________________
cropping2d (Cropping2D)         (None, 16, 16, 512)  0           conv2d_7[0][0]
__________________________________________________________________________________________________
up_sampling2d (UpSampling2D)    (None, 16, 16, 1024) 0           conv2d_9[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 16, 16, 1536) 0           cropping2d[0][0]
                                                                 up_sampling2d[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 14, 14, 512)  7078400     concatenate[0][0]
__________________________________________________________________________________________________
conv2d_11 (Conv2D)              (None, 12, 12, 512)  2359808     conv2d_10[0][0]
__________________________________________________________________________________________________
cropping2d_1 (Cropping2D)       (None, 24, 24, 256)  0           conv2d_5[0][0]
__________________________________________________________________________________________________
up_sampling2d_1 (UpSampling2D)  (None, 24, 24, 512)  0           conv2d_11[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 24, 24, 768)  0           cropping2d_1[0][0]
                                                                 up_sampling2d_1[0][0]
__________________________________________________________________________________________________
conv2d_12 (Conv2D)              (None, 22, 22, 256)  1769728     concatenate_1[0][0]
__________________________________________________________________________________________________
conv2d_13 (Conv2D)              (None, 20, 20, 256)  590080      conv2d_12[0][0]
__________________________________________________________________________________________________
cropping2d_2 (Cropping2D)       (None, 40, 40, 128)  0           conv2d_3[0][0]
__________________________________________________________________________________________________
up_sampling2d_2 (UpSampling2D)  (None, 40, 40, 256)  0           conv2d_13[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 40, 40, 384)  0           cropping2d_2[0][0]
                                                                 up_sampling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_14 (Conv2D)              (None, 38, 38, 128)  442496      concatenate_2[0][0]
__________________________________________________________________________________________________
conv2d_15 (Conv2D)              (None, 36, 36, 128)  147584      conv2d_14[0][0]
__________________________________________________________________________________________________
cropping2d_3 (Cropping2D)       (None, 72, 72, 64)   0           conv2d_1[0][0]
__________________________________________________________________________________________________
up_sampling2d_3 (UpSampling2D)  (None, 72, 72, 128)  0           conv2d_15[0][0]
__________________________________________________________________________________________________
concatenate_3 (Concatenate)     (None, 72, 72, 192)  0           cropping2d_3[0][0]
                                                                 up_sampling2d_3[0][0]
__________________________________________________________________________________________________
conv2d_16 (Conv2D)              (None, 70, 70, 64)   110656      concatenate_3[0][0]
__________________________________________________________________________________________________
conv2d_17 (Conv2D)              (None, 68, 68, 64)   36928       conv2d_16[0][0]
__________________________________________________________________________________________________
conv2d_18 (Conv2D)              (None, 68, 68, 1)    65          conv2d_17[0][0]
==================================================================================================
Total params: 31,378,945
Trainable params: 31,378,945
Non-trainable params: 0

This compiles fine but fails at train time with:

Exception has occurred: InvalidArgumentError
 Incompatible shapes: [32,68,68] vs. [32,256,256]
     [[node Equal (defined at c:main.py:74) ]] [Op:__inference_train_function_3026]

Function call stack:
train_function

Does anyone know why the shapes are so incorrect at runtime and how I can fix them?



#StackBounty: #deep-learning #cnn #training #computer-vision #pytorch Troubles Training a Faster R-CNN RPN using a Resnet 101 backbone …

Bounty: 100

Training Problems for an RPN

I am trying to train a network for region proposals as in the anchor-box concept from Faster R-CNN.

I am using a pretrained ResNet-101 backbone with three layers popped off: the conv5_x block, the average pooling layer, and the softmax layer.

As a result, for 600×600 images the convolutional feature map fed to the RPN head has a spatial resolution of 37×37 with 1024 channels.

I have set the gradients of only block conv4_x to be trainable.
From there I am using the torchvision.models.detection rpn code to use the
rpn.AnchorGenerator, rpn.RPNHead, and ultimately rpn.RegionProposalNetwork classes.
The call to forward returns two losses: the objectness loss and the box-regression loss.

The issue I am having is that my model is training very, very slowly (as in, the loss is improving very slowly). In Girshick’s original paper he says he trains over 80K minibatches (roughly 8 epochs, since the Pascal VOC 2012 dataset has about 11,000 images), where each minibatch is a single image with 256 anchor boxes, but my network improves its loss VERY SLOWLY from epoch to epoch, and I am training for 30+ epochs.

Below is my class code for the network.

# Imports assumed from the post's description (torchvision's detection RPN code);
# getImageSizes is the post's own helper, defined elsewhere.
import torch
from torchvision import models
from torchvision.models.detection import rpn
from torchvision.models.detection import image_list as il

class ResnetRegionProposalNetwork(torch.nn.Module):
    def __init__(self):
        super(ResnetRegionProposalNetwork, self).__init__()
        self.resnet_backbone = torch.nn.Sequential(*list(models.resnet101(pretrained=True).children())[:-3])
        non_trainable_backbone_layers = 5
        counter = 0
        for child in self.resnet_backbone:
            if counter < non_trainable_backbone_layers:
                for param in child.parameters():
                    param.requires_grad = False
                counter += 1
            else:
                break

        anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
        aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
        self.rpn_anchor_generator = rpn.AnchorGenerator(
            anchor_sizes, aspect_ratios
        )
        out_channels = 1024
        self.rpn_head = rpn.RPNHead(
            out_channels, self.rpn_anchor_generator.num_anchors_per_location()[0]
        )

        rpn_pre_nms_top_n = {"training": 2000, "testing": 1000}
        rpn_post_nms_top_n = {"training": 2000, "testing": 1000}
        rpn_nms_thresh = 0.7
        rpn_fg_iou_thresh = 0.7
        rpn_bg_iou_thresh = 0.2
        rpn_batch_size_per_image = 256
        rpn_positive_fraction = 0.5

        self.rpn = rpn.RegionProposalNetwork(
            self.rpn_anchor_generator, self.rpn_head,
            rpn_fg_iou_thresh, rpn_bg_iou_thresh,
            rpn_batch_size_per_image, rpn_positive_fraction,
            rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_nms_thresh)

    def forward(self,
                images,       # type: ImageList
                targets=None  # type: Optional[List[Dict[str, Tensor]]]
                ):
        feature_maps = self.resnet_backbone(images)
        features = {"0": feature_maps}
        image_sizes = getImageSizes(images)
        image_list = il.ImageList(images, image_sizes)
        return self.rpn(image_list, features, targets)

I am using the Adam optimizer with the following parameters:
optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, ResnetRPN.parameters()), lr=0.01, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)

My training loop is here:

for epoch_num in range(epochs): # will train epoch number of times per execution of this program
        loss_per_epoch = 0.0
        dl_iterator = iter(P.getPascalVOC2012DataLoader())
        current_epoch = epoch + epoch_num
        saveModelDuringTraining(current_epoch, ResnetRPN, optimizer, running_loss)
        batch_number = 0
        for image_batch, ground_truth_box_batch in dl_iterator:
            #print(batch_number)
            optimizer.zero_grad()
            boxes, losses = ResnetRPN(image_batch, ground_truth_box_batch)
            losses = losses["loss_objectness"] + losses["loss_rpn_box_reg"]
            losses.backward()
            optimizer.step()
            running_loss += float(losses)
            batch_number += 1
            if batch_number % 100 == 0:  # print the loss on every batch of 100 images
                print('[%d, %5d] loss: %.3f' %
                      (current_epoch + 1, batch_number + 1, running_loss))
                string_to_print = "\n epoch number:" + str(epoch + 1) + ", batch number:" \
                                  + str(batch_number + 1) + ", running loss: " + str(running_loss)
                printToFile(string_to_print)
                loss_per_epoch += running_loss
                running_loss = 0.0
        print("finished Epoch with epoch loss " + str(loss_per_epoch))
        printToFile("Finished Epoch: " + str(epoch + 1) + " with epoch loss: " + str(loss_per_epoch))
        loss_per_epoch = 0.0

I am considering trying the following ideas to address the slow training:

  • trying various learning rates (although I have already tried 0.01, 0.001, 0.003 with similar results)
  • various batch sizes (so far the best results have been batches of 4, i.e. 4 images * 256 anchors per image)
  • freezing more/less layers of the Resnet-101 backbone
  • using a different optimizer altogether
  • different weightings of the loss function (see the sketch below)
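
As an illustration of that last idea, here is a minimal sketch of my own (the weighting factor is a made-up hyperparameter, not a torchvision default) showing how the two RPN losses inside the existing training loop could be reweighted:

# Hypothetical reweighting of the two RPN losses; lambda_box is an assumed
# hyperparameter to tune, not something taken from torchvision's defaults.
lambda_box = 10.0
boxes, loss_dict = ResnetRPN(image_batch, ground_truth_box_batch)
losses = loss_dict["loss_objectness"] + lambda_box * loss_dict["loss_rpn_box_reg"]
losses.backward()
optimizer.step()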

Any hints, or anything obviously wrong with my approach, would be MUCH APPRECIATED. I would be happy to provide more information to anyone who can help.

Edit: My network is training on a fast GPU, with the images and bounding boxes as torch tensors.



#StackBounty: #python #tensorflow #keras #deep-learning #cnn Text classification CNN overfits training

Bounty: 50

I am trying to use a CNN architecture to classify text sentences. The architecture of the network is as follows:

from keras.layers import Input, Conv1D, Dropout, MaxPooling1D, Dense, Flatten  # imports assumed
from keras.models import Model

text_input = Input(shape=X_train_vec.shape[1:], name = "Text_input")

conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(text_input)
drop21 = Dropout(0.5)(conv2)
pool1 = MaxPooling1D(pool_size=2)(drop21)
conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(pool1)
drop22 = Dropout(0.5)(conv22)
pool2 = MaxPooling1D(pool_size=2)(drop22)
dense = Dense(16, activation='relu')(pool2)

flat = Flatten()(dense)
dense = Dense(128, activation='relu')(flat)
out = Dense(32, activation='relu')(dense)

outputs = Dense(y_train.shape[1], activation='softmax')(out)

model = Model(inputs=text_input, outputs=outputs)
# compile
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

I have some callbacks, such as EarlyStopping and ReduceLROnPlateau, to stop the training and to reduce the learning rate when the validation loss is not improving (decreasing).

from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau  # imports assumed

early_stopping = EarlyStopping(monitor='val_loss',
                               patience=5)
model_checkpoint = ModelCheckpoint(filepath=checkpoint_filepath,
                                   save_weights_only=False,
                                   monitor='val_loss',
                                   mode="auto",
                                   save_best_only=True)
learning_rate_decay = ReduceLROnPlateau(monitor='val_loss', 
                                        factor=0.1, 
                                        patience=2, 
                                        verbose=1, 
                                        mode='auto',
                                        min_delta=0.0001, 
                                        cooldown=0,
                                        min_lr=0)
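
For completeness, a minimal sketch of how such callbacks are typically wired into training (the validation split, epoch count and batch size here are placeholders, not values from the post):

# Sketch only: X_train_vec / y_train come from the post's setup, the rest are placeholders.
model.fit(X_train_vec, y_train,
          validation_split=0.2,
          epochs=50,
          batch_size=32,
          callbacks=[early_stopping, model_checkpoint, learning_rate_decay])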

Once the model is trained the history of the training goes as follows:
[plot: training vs. validation loss and accuracy history]

We can observe here that the validation loss stops improving from epoch 5 onwards, while the training loss keeps decreasing, i.e. the model overfits the training data more with each epoch.

I would like to know if I’m doing something wrong with the architecture of the CNN. Aren’t the dropout layers enough to avoid overfitting? What are other ways to reduce overfitting?

Any suggestions?

Thanks in advance.


Edit:

I have also tried regularization, and the results were even worse:

kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)

[plot: training history with L2 regularization]
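
For context, a sketch of how that regularization would be attached to one of the convolutional layers above (assuming the regularizer import shown below):

from keras.regularizers import l2

# Sketch: the first conv layer from the architecture above, with L2 penalties attached.
conv2 = Conv1D(filters=128, kernel_size=5, activation='relu',
               kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01))(text_input)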


Edit 2:

I have tried applying BatchNormalization layers after each convolution, and the result is the following:

norm = BatchNormalization()(conv2)

[plot: training history with BatchNormalization]



#StackBounty: #keras #cnn #training #inception #colab Very Fast Training After First Epoch

Bounty: 50

I trained an InceptionV3 model on plant images using the Keras library. When training started, the first epoch took 29 s per step, and the later epochs took approximately 530 ms per step. That made me doubt whether there is a bug in my code. I checked my code several times, and its logic seems right to me. I trained my model on Google Colab. I wonder whether there is some memoization/caching mechanism at work or whether my code contains bugs. Here is my code:

# Yields one image-target pair when called
def image_target_generator(files, labels):
    assert len(files) == len(labels), "Files and labels sizes don't match!"

    for step in range(len(files)):
        img = cv2.imread(dataset_path + files[step])
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        item = (img, labels[step])
        yield item

# Generating batch
def batch_generator(gen):

    batch_images = []
    batch_targets = []

    for item in gen:

        if len(batch_images) == BATCH_SIZE:
            yield batch_images, batch_targets
            batch_images = []
            batch_targets = []

        preprocessed_img = preprocess_image(item[0])
        batch_images.append(preprocessed_img)
        batch_targets.append(item[1])

    yield batch_images, batch_targets

# Training generator
def training_generator(files, labels):

  # So that Keras can loop it as long as required
  while True:

    for batch in batch_generator(image_target_generator(files, labels)):
      batch_images = np.stack(batch[0], axis=0)
      batch_targets = keras.utils.np_utils.to_categorical(batch[1], NUM_CLASSES)
      yield batch_images, batch_targets


# Create model
def create_model():
  model = keras.applications.InceptionV3(include_top=False, input_shape=(IMG_SIZE, IMG_SIZE, 3), weights='imagenet')

  new_output = keras.layers.GlobalAveragePooling2D()(model.output)
  new_output = keras.layers.Dense(NUM_CLASSES, activation='softmax') (new_output)
  model = keras.engine.training.Model(model.inputs, new_output)

  for layer in model.layers:
    layer.trainable = True

    if isinstance(layer, keras.layers.BatchNormalization):
      layer.momentum = 0.9

  for layer in model.layers[:-50]:
    if not isinstance(layer, keras.layers.BatchNormalization):
      layer.trainable = False

  return model

# Compiling model
model = create_model()

model.compile(loss='categorical_crossentropy',optimizer=keras.optimizers.adamax(lr=1e-2), metrics=['accuracy'])

# Fitting model
model.fit_generator(
  training_generator(train_x, train_y),
  steps_per_epoch=len(train_x) // BATCH_SIZE,
  epochs = 30,
  validation_data=training_generator(test_x, test_y),
  validation_steps=len(test_x) // BATCH_SIZE
  ) 



#StackBounty: #deep-learning #cnn Will this MAPE implementation work for multidimensional output?

Bounty: 50

I’m currently working on a CNN problem where the output is a 60×59 array of numerical values. I wanted to verify that the MAPE function I’m using properly computes the error by matching each predicted point to its corresponding true value, rather than doing something unexpected. Will this formulation work?

from keras import backend as K  # import assumed; the post only shows the function

def percent_mean_absolute_error(y_true, y_pred):
    if not K.is_tensor(y_pred):
        y_pred = K.constant(y_pred)
    y_true = K.cast(y_true, y_pred.dtype)
    diff = K.mean(K.abs((y_pred - y_true)) / K.mean(K.clip(K.abs(y_true),
                                                           K.epsilon(),
                                                           None)))
    return 100. * K.mean(diff)
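
One way to check the element-wise matching is to evaluate the metric on a tiny hand-made array and compare it against a manual calculation. A minimal sketch (arbitrary placeholder values, reusing the same backend import as above):

# Sanity check on a tiny 2x2 example; the values are arbitrary placeholders.
y_true = K.constant([[1.0, 2.0], [4.0, 8.0]])
y_pred = K.constant([[1.1, 1.8], [4.4, 7.2]])
print(K.eval(percent_mean_absolute_error(y_true, y_pred)))  # single scalar percentage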



#StackBounty: #machine-learning #cnn #image-classification #neural problem with distribution of dataset

Bounty: 50

I am doing image classification with a CNN and I have a training set and a test set with different distributions. To try to overcome this problem I am thinking about doing standardization using ImageDataGenerator, but I am encountering some problems. Here is the part of the code I am working on:

from keras.preprocessing.image import ImageDataGenerator  # import assumed (the tf.keras equivalent also works)

trainingset = '/content/drive/My Drive/Colab Notebooks/Train'
testset = '/content/drive/My Drive/Colab Notebooks/Test'



batch_size = 32
train_datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rescale = 1. / 255,
    zoom_range=0.1,
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=False)

train_datagen.fit(trainingset);

train_generator = train_datagen.flow_from_directory(
    directory=trainingset,
    #target_size=(256, 256),
    color_mode="rgb",
    batch_size=batch_size,
    class_mode="categorical",
    shuffle=True
)

test_datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rescale = 1. / 255)

test_datagen.fit(testset);

test_generator = test_datagen.flow_from_directory(
    directory=testset,
    #target_size=(256, 256),
    color_mode="rgb",
    batch_size=batch_size,
    class_mode="categorical",
    shuffle=False
)

num_samples = train_generator.n
num_classes = train_generator.num_classes
input_shape = train_generator.image_shape

classnames = [k for k,v in train_generator.class_indices.items()]



print("Image input %s" %str(input_shape))
print("Classes: %r" %classnames)

print('Loaded %d training samples from %d classes.' % 
(num_samples,num_classes))
print('Loaded %d test samples from %d classes.' % 
(test_generator.n,test_generator.num_classes))

So, what I am trying to do is use the ImageDataGenerator fields featurewise_center=True and featurewise_std_normalization=True to do the standardization, but if I try to fit the generator to the training set by doing train_datagen.fit(trainingset); I get the following error:

    ValueError                                Traceback (most recent call last)
<ipython-input-16-28e4ebb819be> in <module>()
     23     vertical_flip=False)
     24 
---> 25 train_datagen.fit(trainingset);
     26 
     27 train_generator = train_datagen.flow_from_directory(

1 frames
/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87 

ValueError: could not convert string to float: '/content/drive/My Drive/Colab Notebooks/Train'

Can somebody please help me? Thanks in advance.

[EDIT] I am trying to adapt what is written here to my problem.

[EDIT_2] I think the problem is that .fit() takes a numpy array as its parameter, while I am trying to pass it a string, which is the path to the images.

But now I don’t understand what to do, because I would need to transform this into a numpy array of images in order to do the fit.
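
As a sketch of that idea (my own, with placeholder sample size, file pattern and target size): load a sample of the training images into a numpy array and fit the generator on that array rather than on the directory path:

import glob, random
import numpy as np
from keras.preprocessing.image import load_img, img_to_array

# Sketch only: sample size, file pattern and target_size are placeholders.
sample_paths = random.sample(glob.glob(trainingset + '/*/*.jpg'), 500)
sample = np.stack([img_to_array(load_img(p, target_size=(256, 256)))
                   for p in sample_paths])
train_datagen.fit(sample)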



#StackBounty: #machine-learning #deep-learning #nlp #cnn #rnn Training the document page layout and classifying good/bad layouts

Bounty: 50

I have a use case where I get the coordinates of each block element in a page (whether it’s a paragraph, an image, or a table). I want to train a model to understand how these blocks are placed on a given page, where some documents have a good layout and others a bad one. Then I want to feed in the coordinates of a new document and have the model tell me whether it has a good or a bad layout. I want to understand how I can achieve this using deep learning techniques.

Can someone suggest an approach for solving this?

I was trying to work this out with an RNN, but I’m not sure if that’s the correct approach.



#StackBounty: #cnn Accuracy decrease in production after adding additional input datas

Bounty: 50

I am trying to predict TimeSeriesA by using a CNN. I create snapshot images of the timeseries and these are then labelled.

With a very simple snapshot I get reasonable training and test accuracy. When I apply the model to real world in production I also get reasonable accuracy.

In order to improve the accuracy, I added other time series to the snapshots that may or may not add value.

Both my training and testing accuracy increased (Training much more so). However, my production accuracy has gone down greatly.

Why might this happen? The original data is still in the snapshot in exactly the same format. Can a CNN be confused (wrong word!) by the additional data?



#StackBounty: #neural-network #deep-learning #cnn #image-classification How to detect cardboard boxes using Neural Network

Bounty: 50

I’m trying to train a Neural Network how to detect cardboard boxes along with multiple classes of persons (people).

Although it’s easy to detect persons and correctly classify them, it’s incredibly hard to detect cardboard boxes.

The boxes look like this:

[example image of the cardboard boxes]

My suspicion is that a box is too simple an object, and the neural network has a hard time detecting it because there are too few features to extract from it.

The division of the dataset looks like this:

personA: 1160
personB: 1651
personC: 2136
person: 1959
box: 2798

The persons wear different safety items; they are classified based on those items, while the whole person is detected, not just the item.

I tried to use:

ssd300_inceptionv2
ssd512_inceptionv2
faster_rcnn_inceptionv2

All of these are detecting and classifying persons much better than boxes. I cannot provide exact mAP (don’t have it).

Any ideas?

Thanks.



#StackBounty: #deep-learning #keras #lstm #cnn convLSTM : how to structure input data

Bounty: 50

I have the following dataframe containing training data that I have been using to perform a regression task using a CNN + FC:

     fileName  var_t+15m  var_t+30m  var_t+45m  var_t+60m  var_t+90m  var_t+120m  var_t+180m  var_t+240
id                                                                                                                             
2016-10-15 15:00:00  201610151500.jpg     211.00     197.80     170.80      66.90    34.2000   10.120000    0.000867   0.001267
2016-10-15 15:15:00  201610151515.jpg     197.80     170.80      66.90      71.75    20.1600    2.120000    0.001534   0.000534
2016-10-15 15:30:00  201610151530.jpg     170.80      66.90      71.75      34.20    10.1200    0.206200    0.001000   0.001067
2016-10-15 15:45:00  201610151545.jpg      66.90      71.75      34.20      20.16     2.1200    0.012270    0.000400   0.000733
2016-10-15 16:00:00  201610151600.jpg      71.75      34.20      20.16      10.12     0.2062    0.000867    0.001267   0.000934

The task consists of predicting a certain variable at t+X, where X goes from 15 minutes up to 240 minutes. So this is a regression task where my training input consists of timestamped pictures.

In order to work with these data, I have until now been using the .flow_from_dataframe method from Keras to perform data augmentation/pre-processing easily and to avoid loading the entire training set of pictures into memory.

Up until now I did not leverage the time information, and to do so I would like to try the convLSTM model available in Keras. However, I am very unfamiliar with working with time series.

Has someone used the Keras convLSTM layer combined with the .flow_from_dataframe function? I am unsure how to structure my data for this setup (convLSTM + .flow_from_dataframe) and I could not find an example on the internet.
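
For context, Keras’s ConvLSTM2D layer expects 5D input of shape (samples, timesteps, rows, cols, channels), so each training sample would have to be a short sequence of consecutive snapshots rather than a single picture. A minimal shape sketch (all sizes are placeholders, and the 8 outputs correspond to the 8 prediction horizons in the dataframe above):

from tensorflow.keras.layers import Input, ConvLSTM2D, Flatten, Dense
from tensorflow.keras.models import Model

# Placeholder sizes: sequences of 4 consecutive 128x128 RGB snapshots.
inp = Input(shape=(4, 128, 128, 3))        # (timesteps, rows, cols, channels)
x = ConvLSTM2D(filters=32, kernel_size=3)(inp)
x = Flatten()(x)
out = Dense(8)(x)                          # one regression output per horizon
model = Model(inp, out)
model.compile(optimizer='adam', loss='mse')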

