*Bounty: 50*

I am looking for an existing loss function (or regularizer) that penalizes predictions based on the wrong order of outputs. Let me illustrate this with the residual sum of squares (for an $O$-dimensional output):

$$
\begin{align}
\text{RSS} = \sum_{i=1}^{N} \sum_{j=1}^{O} \left(\hat{y}_i^{(j)} - y_i^{(j)}\right)^2,
\end{align}
$$

where $\hat{y}_i$ is the prediction for datum $i$ and $y_i$ the corresponding ground truth. Now assume a problem where I am interested in two outcomes at the same time, say $o_1, o_2$. These should be binary, so I want the underlying model to learn to predict either $0$ or $1$ for each. All possible (and optimal) outcomes are thus $\big[\, \{0,0\},\ \{0,1\},\ \{1,0\},\ \{1,1\} \,\big]$.

However, the first of these outcomes, $o_1$, is **more important**, in the sense that the loss should penalize a prediction of $\{1,0\}$ **more** than $\{0,1\}$ if the true output were $\{0,0\}$. In general, any $o_j$ is more important than $o_{j+1}$. If I were to use $\text{RSS}$, the penalty would be the same for either outcome.
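For concreteness, the equal-penalty behaviour of plain RSS can be checked numerically. This is a minimal NumPy sketch; the helper name `rss` is just for illustration:

```python
import numpy as np

def rss(y_hat, y):
    """Residual sum of squares over all outputs of one datum."""
    return np.sum((y_hat - y) ** 2)

y_true = np.array([0.0, 0.0])

# Both wrong predictions receive exactly the same penalty:
print(rss(np.array([1.0, 0.0]), y_true))  # wrong o_1 -> 1.0
print(rss(np.array([0.0, 1.0]), y_true))  # wrong o_2 -> 1.0
```

So the standard loss is blind to *which* output was wrong, which is exactly the problem.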

I have been thinking about introducing an additional weight for each output and including it in the RSS, such as

$$
\begin{align}
\text{RSS} = \sum_{i=1}^{N} \sum_{j=1}^{O} \left(w_j \cdot \left(\hat{y}_i^{(j)} - y_i^{(j)}\right)\right)^2, \\
\text{and} \quad w_j > 1 \;\; \forall j \; \land \; w_k > w_{k+1}.
\end{align}
$$
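The weighted version above can be sketched in the same way. The particular weight values here ($w_1 = 2 > w_2 = 1.5 > 1$) are an arbitrary choice just to show the asymmetry; `weighted_rss` is again a hypothetical helper name:

```python
import numpy as np

def weighted_rss(y_hat, y, w):
    """RSS with a per-output weight w_j applied inside the square,
    so earlier (more important) outputs are penalized more."""
    return np.sum((w * (y_hat - y)) ** 2)

w = np.array([2.0, 1.5])  # satisfies w_j > 1 and w_1 > w_2
y_true = np.array([0.0, 0.0])

print(weighted_rss(np.array([1.0, 0.0]), y_true, w))  # wrong o_1 -> 4.0
print(weighted_rss(np.array([0.0, 1.0]), y_true, w))  # wrong o_2 -> 2.25
```

Note that because the weight sits inside the square, a weight $w_j$ scales that output's contribution by $w_j^2$. Most frameworks support something equivalent out of the box, e.g. per-output `loss_weights` in Keras's `Model.compile`, without needing a custom loss.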

However, I would rather not roll my own solution, so I was curious whether something like this already exists.