#StackBounty: #xgboost #training #multilabel-classification Validation error is always zero in a multi class classification problem. Wh…

Bounty: 50

I have a 3-class (1/0/unclassified) classification problem in which my training data is created using a set of rules.

Problem: Classify whether a person owns a vehicle or travels by public transport.

Dataset: Each person’s expense journal entries in CSV format (around 2 lakh, i.e. 200,000, entries from 20 people over a span of 3 years).

Fields are:

    person_id, date of payment, category, shop,      expense, summary
    1,         2020-01-01,      fuel,     fuel_stop, $20,     'paid for refilling'
    2,         2020-01-01,      ticket,   bus,       $10,     'took a bus to Treasa's house'

Training data generation: No labelling is done here.

Instead, some rules are used for tagging the data. For example:

Rules for vehicle owners:

  1. Maintenance fee records
  2. Fuel transactions
  3. Few transactions in public transport
  4. Driver salary payments

Rules for non-vehicle owners:

  1. Multiple transactions in public transport (bus, train, subway etc.)
  2. No fuel transactions
  3. No maintenance transactions

Nuances such as vehicle owners who also travel by public transport can be ignored.
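To make the tagging step concrete, here is a minimal sketch of how rules like the ones above could assign a label to one person. The category names, the thresholds for "few" and "multiple", and the way the rules are combined are all assumptions made for illustration; the actual rule set is not shown in this post.

```python
from collections import Counter

# Hypothetical category names and thresholds (not taken from the real rules).
OWNER_CATEGORIES = {"fuel", "maintenance", "driver_salary"}
TRANSIT_CATEGORIES = {"ticket"}
MAX_TRANSIT_FOR_OWNER = 5       # assumed meaning of "few" public transport transactions
MIN_TRANSIT_FOR_NON_OWNER = 10  # assumed meaning of "multiple" transactions


def tag_person(categories):
    """Return 1 (vehicle owner), 0 (non-owner) or "unclassified"
    from the list of expense categories recorded for one person."""
    counts = Counter(categories)
    owner_hits = sum(counts[c] for c in OWNER_CATEGORIES)
    transit_hits = sum(counts[c] for c in TRANSIT_CATEGORIES)

    # Owner: some fuel/maintenance/driver records and only a few transit trips.
    if owner_hits > 0 and transit_hits <= MAX_TRANSIT_FOR_OWNER:
        return 1
    # Non-owner: no owner-type records at all and many transit trips.
    if owner_hits == 0 and transit_hits >= MIN_TRANSIT_FOR_NON_OWNER:
        return 0
    return "unclassified"


print(tag_person(["fuel", "fuel", "maintenance", "ticket"]))  # 1
print(tag_person(["ticket"] * 12 + ["grocery"]))              # 0
print(tag_person(["grocery", "ticket"]))                      # unclassified
```

Labels produced this way are a deterministic function of the category counts, so a model that sees the same fields as features has everything it needs to reproduce the rules.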

I used an XGBoost model for modelling this data.

During cross-validation, I can see that the classification error (merror) is always 0.00000 on both evaluation sets, even though the log loss (mlogloss) keeps dropping.

[62]    validation_0-merror:0.00000 validation_0-mlogloss:0.12917   validation_1-merror:0.00000 validation_1-mlogloss:0.12983
[63]    validation_0-merror:0.00000 validation_0-mlogloss:0.12524   validation_1-merror:0.00000 validation_1-mlogloss:0.12577
[64]    validation_0-merror:0.00000 validation_0-mlogloss:0.12138   validation_1-merror:0.00000 validation_1-mlogloss:0.12201
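For reference, the two metrics in the log measure different things: `merror` is the fraction of rows whose highest-probability class is wrong, while `mlogloss` is the mean negative log of the probability given to the true class. The sketch below uses made-up probabilities (not my model's output) and plain Python re-implementations of the two definitions to show that `merror` can stay at zero while `mlogloss` continues to fall as the predictions become more confident.

```python
import math


def merror(probs, labels):
    """Fraction of rows where the arg-max class differs from the true label."""
    wrong = sum(1 for p, y in zip(probs, labels) if p.index(max(p)) != y)
    return wrong / len(labels)


def mlogloss(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


# Illustrative 3-class predictions: every arg-max is already correct in the
# "early" round, and the probabilities only get sharper in the "later" round.
labels = [0, 1, 2]
early = [[0.50, 0.25, 0.25], [0.25, 0.50, 0.25], [0.25, 0.25, 0.50]]
later = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]

for name, probs in [("early", early), ("later", later)]:
    print(name, merror(probs, labels), round(mlogloss(probs, labels), 5))
# early 0.0 0.69315
# later 0.0 0.10536
```

So a constant zero `merror` alongside a falling `mlogloss` is not contradictory in itself; it only says that every row is already on the right side of the decision boundary.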

The model identifies the vehicle owners in a separate test set almost correctly, with roughly 96% accuracy.

However, I do not know if the model will be able to identify other cases correctly, or generalise across other features it has not seen.

Could anyone please shed some light on this?

Thanks.

