#StackBounty: #deep-learning #nlp #keras Organization of layers in Keras for an NLP problem

Bounty: 50

I have been trying out an NLP problem where I have to predict multi-label-sentiments for some text.

I have 8 labels and 170k training examples, and 140k for the test set.
My final dictionary size is around 190k.

I am using Keras to try out a neural-network approach, although I am not sure whether my architecture is right. Below is the model, which gives me 95% accuracy on the test set; I am evaluating accuracy on a 0.7 – 0.3 train/validation split while training:

import keras
from keras.models import Sequential
from keras.layers import (Embedding, Bidirectional, LSTM, Convolution1D,
                          Activation, BatchNormalization, MaxPool1D,
                          Flatten, Dense, Dropout)
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

model = Sequential()

# Pre-trained embeddings; sequences padded/truncated to length 100
model.add(Embedding(max_indexes + 1, 100, weights=[embeddings],
                    input_length=100))

model.add(Bidirectional(LSTM(256, return_sequences=True)))

model.add(Convolution1D(256, 5, padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())

model.add(Convolution1D(256, 5, padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())

model.add(Convolution1D(256, 5, padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPool1D())

model.add(Convolution1D(256, 4, padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())

model.add(Convolution1D(256, 3, padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())

model.add(Convolution1D(256, 3, padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPool1D())

model.add(Flatten())

model.add(Dense(64, activation='relu',
                kernel_regularizer=keras.regularizers.l2(0.02)))

model.add(Dense(target_classes_len, activation='sigmoid',
                kernel_regularizer=keras.regularizers.l2(0.02)))
model.add(Dropout(0.1))

adam_opt = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

early_stopping = EarlyStopping(monitor='val_loss', patience=5, mode='min')

save_best = ModelCheckpoint('model_x.hdf', save_best_only=True, 
                               monitor='val_loss', mode='min')
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                              patience=5, min_lr=0.0001)

model.compile(adam_opt, 'binary_crossentropy', metrics=['accuracy'])
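
For completeness, this is roughly how I wire those callbacks in when training (X_train, y_train, the batch size and the epoch count below are only placeholders for my padded sequences, multi-hot label matrix and training settings):

# X_train / y_train are placeholder names for the padded index sequences
# and the multi-hot label matrix; batch_size and epochs are examples only.
history = model.fit(X_train, y_train,
                    batch_size=128,
                    epochs=50,
                    validation_split=0.3,
                    callbacks=[early_stopping, save_best, reduce_lr])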

From around the internet, a CS231n lecture video, and other StackOverflow questions, I found the suggestion to follow this pattern:

[(CONV-RELU)*N - POOL?]*M - (FC-RELU)*K, Softmax
N ~ 5, M ~ very large, K >= 0 and K <= 2

That recommendation, though, was for an ImageNet-style image classification problem, while I have an NLP problem at hand.
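
As far as I can tell, carried over to 1D text input that pattern would look roughly like the sketch below (the filter counts, kernel sizes and the choice of N = 2, M = 2, K = 1 are arbitrary, just to show the structure; the output is sigmoid rather than softmax because my problem is multi-label):

def conv_block_model(vocab_size, num_labels, seq_len=100):
    # [(CONV-RELU)*N -> POOL?]*M -> (FC-RELU)*K -> sigmoid output,
    # with N = 2, M = 2, K = 1 picked only for illustration.
    m = Sequential()
    m.add(Embedding(vocab_size, 100, input_length=seq_len))
    for _ in range(2):                      # M repeats
        for _ in range(2):                  # N conv-relu pairs
            m.add(Convolution1D(128, 3, padding='same'))
            m.add(Activation('relu'))
        m.add(MaxPool1D())                  # POOL?
    m.add(Flatten())
    m.add(Dense(64, activation='relu'))     # K fully connected layers
    m.add(Dense(num_labels, activation='sigmoid'))  # multi-label output
    return m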

Also, I found out that Dropout should be placed after the final fully connected layer, and that BatchNormalization belongs in sequence with each CONV-RELU block.
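
If I read that advice correctly, one block plus the head would be organised something like this (my interpretation only, with a made-up input_shape; note that in my current model above the Dropout actually sits after the sigmoid output layer instead):

# One CONV-RELU block with BatchNormalization in sequence
model2 = Sequential()
model2.add(Convolution1D(256, 5, padding='same', input_shape=(100, 100)))
model2.add(Activation('relu'))
model2.add(BatchNormalization())
model2.add(Flatten())

# Dropout after the final fully connected layer, before the output
model2.add(Dense(64, activation='relu'))
model2.add(Dropout(0.1))
model2.add(Dense(target_classes_len, activation='sigmoid'))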

I just wanted to know what would be a good organization of layers for an NLP problem, or whether there is a good way of evaluating different architectures.

Thanks

