*Bounty: 50*

*Bounty: 50*

## Backrounds

I would like to build a model that predicts a **month** label $mathbf{y}$ from a given set of features $mathbf{X}$. Data structure is as follows.

- $mathbf{X} : N_{samples} times N_{features}$.
- $mathbf{y}: N_{samples} times 1$, which has range of $1,2,cdots,12$.

I may find it more helpful to have output as **predicted probability** of each labels, since I would like to make use of the prediction uncertainty. I may try any multi-class algorithms to build such model. Actually, I tried some of scikit-learn’s multiclass algorithms.

However, I found out that none of them very useful, due to the following problem that I face.

## Problem : I cannot make use of class similarity

By **class similarity**, I mean the similar characteristics that temporally adjacent months generally share. Most algorithms do not provide any ways to make use of such **prior** knowledge. In other words, they miss the following requirements:

It is quite okay to predict January(1) for February(2), but very undesirable to predict August(8) for February(2)

For instance, I may try **multi-layer perceptron** classifier(MLP) to build a model. However, algorithms such as MLP are optimized for problems such as classification of hand-written digits. In these problems, predicting 1 for 2 is **equally** undesirable to predicting 8 for 2.

In other words, most algorithms are **agnostic** to the relative similarity across labels. If a classifier could exploit such class similarity as a prior knowledge, it may perform much better. If I were to force such prior in the form of distribution, I may choose **cosine-shaped** distribution across months.

Some may suggest some algorithms that are based on linear regression, such as all-or-rest **logistic regression**. However, since **months** have **wrap-around** properties, such regression models may not work well. For instance, assuming $mathbf{y}$ as continuous variable may miss that January(1) and December(12) are actually very similar.

## Questions

As a beginner to machine learning, I am not very familiar with available algorithms. Any help, including ideas about my problem or recommendations of related papers, threads, or websites, will be welcomed.