#StackBounty: #machine-learning #classification #binary-data #accuracy Difference between cumulative gains chart and cumulative accurac…

Bounty: 50

I am confused about the following:

Here I find the definition of the cumulative gains chart as a plot of

  • x-axis: the rate of predicted positive
  • y-axis: true positive rate

It turns out that we can, for example, order the observations by their predicted probability of a positive outcome and plot the corresponding rate of true positives.

On the other hand there is the cumulative accuracy profile (CAP) explained here where it is said to be constructed as

The CAP of a model represents the cumulative number of positive
outcomes along the y-axis versus the corresponding cumulative number
of a classifying parameter along the x-axis.

Another source for the CAP is “Measuring the Discriminative Power of Rating Systems” by Engelmann, Hayden and Tasche.
Together with the plots I would say that the two concepts are the same. Is this true?
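
For concreteness, here is a minimal sketch (assuming only an array of predicted positive-class probabilities and the corresponding binary labels, both hypothetical) of how a cumulative gains curve is typically computed: sort observations by predicted probability and plot the cumulative fraction of true positives against the fraction of the sample targeted. As far as I can tell, the CAP quote above describes the same sorting procedure, only with raw cumulative counts rather than rates on the axes.

```python
import numpy as np

def cumulative_gains(y_true, scores):
    """Cumulative gains: fraction of all positives captured (y-axis)
    when targeting the top-x fraction of observations ranked by score (x-axis)."""
    order = np.argsort(-scores)                            # sort by descending predicted probability
    y_sorted = np.asarray(y_true)[order]
    cum_positives = np.cumsum(y_sorted)                    # cumulative number of true positives
    x = np.arange(1, len(y_sorted) + 1) / len(y_sorted)    # fraction of the sample targeted
    y = cum_positives / cum_positives[-1]                  # fraction of all positives captured
    return x, y

# toy example with scores loosely correlated with the labels
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
scores = 0.3 * rng.random(1000) + 0.7 * y * rng.random(1000)
x_frac, tpr = cumulative_gains(y, scores)
```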


Get this bounty!!!

#StackBounty: #probability #classification #ensemble #feature-weighting How to describe most important features of ensemble model as li…

Bounty: 100

I have created 3 different models, and each outputs a class probability for a binary classification problem. The models are a bit different, assigning importance to different features. I have, of course, a single data matrix as the source for this exercise, where 70% of the data is used as the training sample.

How can one summarize the contribution of the different feature values to the final class probability estimate if only the data matrix and the list of features used are known, besides this class probability estimate?

Individual models can of course be explained by different methods, but how can one explain the averaged ensemble predictions?
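
One possible approach (not from the question, just a hedged sketch) is to treat the averaged ensemble probability as a single black-box model and compute permutation importance against it: shuffle one feature column at a time and measure how much the loss of the averaged prediction degrades. The sketch assumes three fitted scikit-learn-style models exposing `predict_proba`, a NumPy feature matrix `X`, and binary labels `y`; all of these names are placeholders.

```python
import numpy as np
from sklearn.metrics import log_loss

def ensemble_proba(models, X):
    """Average the positive-class probabilities of the individual models."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)

def permutation_importance_ensemble(models, X, y, n_repeats=10, rng=None):
    """Permutation importance of each column of X w.r.t. the averaged ensemble output."""
    rng = rng or np.random.default_rng(0)
    baseline = log_loss(y, ensemble_proba(models, X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        losses = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # break the feature-target link
            losses.append(log_loss(y, ensemble_proba(models, X_perm)))
        importances[j] = np.mean(losses) - baseline        # increase in loss = importance
    return importances
```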


Get this bounty!!!

#StackBounty: #classification #neural-networks #dataset #overfitting Higher overfitting using data augmentation with noise?

Bounty: 50

I am training a neural network for Audio classification.

I trained it on the UrbanSound8K dataset, and then I wanted to evaluate how different levels of added noise to the inputs influenced prediction accuracy.

As expected, higher levels of noise resulted in lower accuracy.

Then, I decided to perform data augmentation with noise. I took the dataset and duplicated it, using the same files but adding pink noise (at 0 dB SNR) to the copies.
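
For reference, a minimal sketch of this kind of augmentation (assuming plain NumPy arrays of audio samples; the pink-noise generation here is an approximate 1/f spectral shaping, not necessarily what was used in the experiment):

```python
import numpy as np

def pink_noise(n, rng=None):
    """Approximate pink (1/f) noise by shaping white noise in the frequency domain."""
    rng = rng or np.random.default_rng(0)
    white = rng.standard_normal(n)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]                       # avoid division by zero at DC
    spectrum /= np.sqrt(freqs)                # 1/f power spectrum -> 1/sqrt(f) amplitude
    noise = np.fft.irfft(spectrum, n)
    return noise / np.std(noise)

def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so that the mixture has the requested signal-to-noise ratio."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * noise

# e.g. duplicate a clip with pink noise at 0 dB SNR (hypothetical loading step)
# clip, sr = librosa.load(path, sr=None)
# noisy_clip = add_noise_at_snr(clip, pink_noise(len(clip)), snr_db=0)
```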

As expected (by me), the overall accuracy increased (a very tiny bit though, 0.5%), and the network became more robust to noise corruption of the inputs.

However! One thing I was not expecting was that the network now has reduced accuracy when predicting on the clean, uncorrupted inputs (the validation inputs). Somehow, it has overfitted to the clean inputs, thus reducing prediction accuracy on those audios.

Is there any explanation or intuition into this result?

I was expecting that the network, now having more and more varied training data, would learn more meaningful features. I guess it is more difficult to overfit to the noisy inputs, but I still don't understand why it has overfitted mainly to the clean inputs.


Get this bounty!!!

#StackBounty: #regression #machine-learning #classification #references #cart Regression trees with multiple input and output levels

Bounty: 100

I am looking for different modelling approaches which are able to build regression trees (i.e. with continuous input and output variables) with multiple input and output levels.

The most common approach (e.g. with CART) is to recursively do binary splits. I am looking for methods that could split the target variable into multiple branches (according to some sensible metric) within every step.

Any hints, e.g. papers, links, etc., are very welcome.

(The reason for me asking this question is that I want to find a good method to extend my OneR package for solving regression problems.)

Edit
To clarify: I am looking for a method which could split the respective input variable into $n$ intervals where each interval leads to an interval in the target variable. I guess you would need some constraints (e.g. the max number of intervals) to get useful results.
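
Not an answer, but to make the target of the search concrete: one way to produce such an $n$-interval split of a single continuous input, under a squared-error metric, is optimal 1-D partitioning by dynamic programming. The sketch below is a generic illustration of that idea (all names are mine, and the cubic-time DP is only meant for small data):

```python
import numpy as np

def best_multiway_split(x, y, n_intervals):
    """Split a continuous input into n_intervals contiguous intervals (in sorted order of x),
    minimizing the total within-interval sum of squared errors of the target."""
    order = np.argsort(x)
    y = np.asarray(y, dtype=float)[order]
    n = len(y)
    # prefix sums for O(1) SSE of any segment
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def sse(i, j):                      # SSE of y[i:j]
        m = j - i
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / m

    # dp[k][j] = minimal SSE of splitting the first j points into k intervals
    dp = np.full((n_intervals + 1, n + 1), np.inf)
    cut = np.zeros((n_intervals + 1, n + 1), dtype=int)
    dp[0][0] = 0.0
    for k in range(1, n_intervals + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cost = dp[k - 1][i] + sse(i, j)
                if cost < dp[k][j]:
                    dp[k][j] = cost
                    cut[k][j] = i
    # recover interval boundaries as index pairs into the sorted x
    bounds, j = [], n
    for k in range(n_intervals, 0, -1):
        bounds.append((cut[k][j], j))
        j = cut[k][j]
    return list(reversed(bounds)), dp[n_intervals][n]
```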


Get this bounty!!!

#StackBounty: #classification #model-evaluation What is a Hamming Loss ? will we consider it for an Imbalanced Binary classifier

Bounty: 50

I am trying to understand the evaluation metrics for a classifier model.

What is the necessity of finding out the Hamming loss?

I have read some documents on the Internet, which basically relate the Hamming loss to multi-label classifiers, but I still couldn't really understand why it is needed to evaluate the model.

Also, is the Hamming loss actually just $1 - \text{accuracy}$ for an imbalanced binary classifier?

What does it bring to the table that precision, recall, and the F1-measure couldn't?
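
A quick numerical check with scikit-learn illustrates the distinction: for ordinary single-label (including binary) predictions the Hamming loss is exactly the fraction of misclassified samples, i.e. $1 - \text{accuracy}$, whereas for multi-label predictions it is the fraction of individual label assignments that are wrong, which generally differs from the exact-match accuracy:

```python
import numpy as np
from sklearn.metrics import hamming_loss, accuracy_score

# single-label binary case: Hamming loss == 1 - accuracy
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 0])
print(hamming_loss(y_true, y_pred))          # 0.4
print(1 - accuracy_score(y_true, y_pred))    # 0.4

# multi-label case: fraction of individual labels that are wrong,
# which is not the same as 1 - (exact-match) accuracy
Y_true = np.array([[1, 0, 1], [0, 1, 0]])
Y_pred = np.array([[1, 1, 1], [0, 1, 0]])
print(hamming_loss(Y_true, Y_pred))          # 1/6 ≈ 0.167
print(1 - accuracy_score(Y_true, Y_pred))    # 0.5 (one of two rows is not an exact match)
```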


Get this bounty!!!

#StackBounty: #linux #email #classification An alternative to POPFile email classifier

Bounty: 50

I have been using POPFile for years – but it hasn’t been updated for years.

POPFile is a Bayesian email classifier. Normally you'd associate this with spam filtering (is it spam or is it not?). Bayesian filters are great for this, but POPFile does much more, as it can classify all of your email.

So I have a series of folders set up in my IMAP account that I have told POPFile about. POPFile watches what is in those folders, and automatically files similar emails into the same folders as they arrive into INBOX. It tokenises the entire email to achieve this, and almost as a side effect detects and files spam.

I get more than 95% accuracy with POPFile, and it means I don’t have to set up a single rule for automatic filing.
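
(For readers unfamiliar with the approach: the sketch below is a generic illustration of the kind of Bayesian folder classification POPFile performs, written with scikit-learn rather than POPFile's actual implementation; the example emails and folder names are made up.)

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# hypothetical training data: raw email text and the folder it was filed in
emails = ["Your invoice for March is attached",
          "Team standup moved to 10am",
          "50% off all shoes this weekend"]
folders = ["finance", "work", "promotions"]

# bag-of-words tokenization + multinomial naive Bayes, roughly what a
# Bayesian mail classifier does under the hood
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(emails, folders)

print(clf.predict(["Please find the attached invoice"]))   # most likely ['finance']
```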

I am concerned that POPFile will stop functioning, as it loses compatibility.

Does anyone know of a similar thing? I have looked at SaneBox, which kind of gets there, but I’d like something that I run myself.

All the searches I do suggest alternatives that are spam filters. I don't need a spam filter; in fact, my email client does a good job of filtering spam without POPFile.

In order of preference I'd say FOSS first, but if it's paid then so be it. As for how it functions, Linux-based would be ideal so I can run it headless; a clever Sieve-type approach would be fine, but an IMAP solution would work with any server.


Get this bounty!!!

#StackBounty: #classification #neural-networks #multilabel Theoretical justification for training a multi-class classification model to…

Bounty: 100

Can a multi-class classification model be trained and used for multi-label classification, under any mathematical-theoretical guarantee?

Imagine the following model, actually used in one machine learning library for (text) classification:

  1. A multi-class classifier ― a softmax terminated MLP feeding from word embeddings, but could be anything else as well ― is trained on multi-label data. (i.e. some/most data items have multiple class designations in the training data).
  2. The loss per training item is computed while accounting for only a single target label, selected at random, from among the labels applying to the item in the training data (exact loss function here, excuse the C++). This is just a small speed-inspired variant to standard stochastic gradient descent… which should average out over epochs.
  3. For actual usage, the confidence threshold that maximizes the aggregate Jaccard index over the entire test set is then used to filter the labels returned as the network's (softmax-normalized) prediction output.
  4. For each prediction made by the model, only those labels that have confidence larger than the threshold, are kept and considered as the final actionable prediction.

This may feel like a coercion of a multi-class model into a multi-label interpretation. Are there any theoretical guarantees, or counter-guarantees, for this being useful under multi-label semantics? Or, how would you reduce a multi-label problem to this?
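
For concreteness, step 3 can be sketched as follows (a minimal illustration under my own naming, not the library's actual code; it assumes softmax outputs `probs` of shape `(n_items, n_classes)` and true label sets as Python sets of class indices):

```python
import numpy as np

def mean_jaccard(probs, true_sets, threshold):
    """Mean Jaccard index between thresholded predictions and the true label sets."""
    scores = []
    for p, truth in zip(probs, true_sets):
        pred = set(np.flatnonzero(p >= threshold))
        union = pred | truth
        scores.append(len(pred & truth) / len(union) if union else 1.0)
    return np.mean(scores)

def best_threshold(probs, true_sets, grid=np.linspace(0.01, 0.5, 50)):
    """Grid-search the confidence threshold that maximizes the aggregate Jaccard index."""
    return max(grid, key=lambda t: mean_jaccard(probs, true_sets, t))

# hypothetical usage:
# t_star = best_threshold(val_probs, val_label_sets)
# predicted_labels = [set(np.flatnonzero(p >= t_star)) for p in test_probs]
```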


Get this bounty!!!

#StackBounty: #classification #multilabel Is this a theoretically sound method for multi-label classification?

Bounty: 100

Can a multi-class classification model be trained and used for multi-label classification, under any theoretical guarantee?

Imagine the following model, actually used in one machine learning library for (text) classification:

  1. A multi-class classifier ― a softmax terminated MLP feeding from word embeddings, but could be anything else as well ― is trained on multi-label data. (i.e. some/most data items have multiple class designations in the training data).
  2. The loss per training item is computed while accounting for only a single target label, selected at random, from among the labels applying to the item in the training data (exact loss function here, excuse the C++). This is just a small speed-inspired variant to standard SGD… which should average out over epochs.
  3. For actual usage, the confidence threshold that maximizes the aggregate Jaccard index over the entire test set is then used to filter the labels returned as the network's (softmax-normalized) prediction output.
  4. For each prediction made by the model, only those labels that have confidence larger than the threshold, are kept and considered as the final actionable prediction.

This may feel like a coercion of a multi-class model into a multi-label interpretation. Are there any theoretical guarantees, or counter-guarantees, to this performing as intended?
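
As a concrete reading of step 2, here is a minimal NumPy sketch (my own naming, not the library's C++ code) of the per-item loss computed against a single target label drawn uniformly from the item's label set:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def single_label_loss(logits, label_set, rng):
    """Cross-entropy against ONE target label drawn at random from the item's label set;
    over many epochs this averages over all of the item's labels."""
    target = rng.choice(sorted(label_set))
    p = softmax(logits)
    return -np.log(p[target])

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, -1.0, 0.0])     # hypothetical network output for one item
loss = single_label_loss(logits, {0, 1}, rng)
```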

Many thanks!


Get this bounty!!!

#StackBounty: #classification #confusion-matrix Formula for the omission and the commission errors

Bounty: 50

I’m confused by the formula for the commission error and the omission error, as it was stated a bit differently in a paper I’ve read compared to the one I’m giving below (maybe the authors changed it because of the context of change detection rather than an ordinary classification).

Are these the correct formulas for a given class?

$$ \text{commissionError} = \frac{FP}{FP + TP} = \frac{FP}{\text{totalPredicted}} $$
$$ \text{omissionError} = \frac{FN}{FN + TP} = \frac{FN}{\text{totalReference}} $$

Where:

  • FP: The false positive.
  • TP: The true positive.
  • FN: The false negative.
  • TN: The true negative.

If we had only two classes, should these two errors be calculated for the two classes, or is there a way to infer the errors of the second class from those of the first one?

I’m asking because it’s clear that for a two-class case we have:

$$ FP_{class1} = FN_{class2} $$

$$ FN_{class1} = FP_{class2} $$
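
A quick numerical check of these formulas, and of the class-swap symmetry, using a scikit-learn confusion matrix (toy labels, my own helper names):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 0, 1, 0, 1, 1])

# rows = reference (true) class, columns = predicted class
cm = confusion_matrix(y_true, y_pred)

def commission_omission(cm, k):
    """Commission and omission errors for class k of a square confusion matrix."""
    tp = cm[k, k]
    fp = cm[:, k].sum() - tp          # predicted as k but actually another class
    fn = cm[k, :].sum() - tp          # actually k but predicted as another class
    return fp / (fp + tp), fn / (fn + tp)

print(commission_omission(cm, 0))    # class 0
print(commission_omission(cm, 1))    # class 1: its FP equals class 0's FN and vice versa
```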


Get this bounty!!!

#StackBounty: #classification Which algorithms can be expressed with mostly set intersections

Bounty: 50

I’m sorry in advance if my question is too broad or does not fit here.
Which algorithms in machine learning classification and data mining can be expressed entirely, or almost entirely, as set intersection operations?
For example, in the case of machine learning classification I do not care about how those sets are computed; I am only interested in the classification algorithm itself being mostly set intersection operations.

Imagine you have a trained model $M$ for some binary classification task, a threshold $t$, and you are given a sample $s$.
In the simplest case: are there classifiers which work by outputting 1 iff $\lvert M \cap \text{SomeProcessingFunction}(s) \rvert > t$?
I am also interested in classification algorithms which consist of multiple set intersection operations.
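
As a toy illustration of the simplest form above (all names hypothetical, not tied to any particular algorithm): the "model" is itself a set of tokens, and classification reduces to a single set intersection plus a size comparison.

```python
def tokenize(sample: str) -> set:
    """The 'SomeProcessingFunction': map a raw sample to a set of tokens."""
    return set(sample.lower().split())

def train_model(positive_samples) -> set:
    """A toy 'model': the union of tokens seen in positive training samples."""
    model = set()
    for s in positive_samples:
        model |= tokenize(s)
    return model

def classify(model: set, sample: str, threshold: int) -> int:
    """Output 1 iff |M ∩ f(s)| > t: the classification step is a single set intersection."""
    return int(len(model & tokenize(sample)) > threshold)

M = train_model(["cheap flights deal", "flight sale today"])
print(classify(M, "deal on cheap flights", threshold=2))    # 1: intersection size 3 > 2
print(classify(M, "meeting agenda attached", threshold=2))  # 0
```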

The rough motivation behind this is as follows:
I have a (theoretical) model where, during classification, set intersection operations on data sets are cheap compared to any other type of instruction.
I am looking for machine learning classification algorithms which would be particularly efficient in such a model.

Thanks in advance!


Get this bounty!!!