# #StackBounty: #machine-learning #neural-networks #meta-learning What is the standard way to define "task" in meta-learning fo…

### Bounty: 50

I want to know the most common way to define a "task" in meta-learning.

Chelsea Finn defines it in MAML as follows:

$$\mathcal{T}_i = \{ \mathcal{L}_i(x_1, a_1, \ldots, x_H, a_H),\ q_i(x_1),\ q_i(x_{t+1} \mid x_t, a_t) \}$$

For supervised learning it is just:

$$\mathcal{T}_i = \{ \mathcal{L}_i(x),\ q_i(x) \}$$

I usually like denoting $$q_i(x)$$ as $$p(x, y \mid i)$$.

If one reads this paragraph, plus some code in the higher library implementing MAML, one notices that the Meta-LSTM paper and the MAML paper define "task" slightly differently (though both definitions can be expressed with the notation introduced in this question).

MAML defines classification N-way, K-shot learning as follows:

> **Few-shot classification:** classify a new type of image (so it must be a new task) given only a support set with a few examples of it (e.g. a few Segway images), having been trained on many other types of objects.

So a class is a task in the MAML paper. Looking at this code reinforces my belief: https://github.com/facebookresearch/higher/blob/e45c1a059e39a16fa016d37bc15397824c65547c/examples/maml-omniglot.py#L130 and algorithm 2 of the original MAML paper.

Let me use $$S_t$$ and $$Q_t$$ for the support and query set of a single class.

When we sample a batch of M tasks, we sample M = N classes and meta-train with this update rule (if using SGD as the outer optimizer):

$$\theta := \theta - \eta_{\text{outer}} \nabla_{\theta} \sum^N_{t=1} \mathcal{L}\big( A(\theta, S_t),\ Q_t \big)$$

where $$A$$ is the inner-loop adaptation; for MAML we have:

$$\theta := \theta - \eta_{\text{outer}} \nabla_{\theta} \sum^N_{t=1} \mathcal{L}\big( \theta - \eta_{\text{inner}} \nabla_{\theta} \mathcal{L}(\theta, S_t),\ Q_t \big)$$
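As a sanity check of this per-class update, here is a minimal toy sketch (my own illustration, not code from the MAML paper or the higher example): a scalar parameter, a squared-error stand-in for $$\mathcal{L}$$, one inner SGD step as $$A$$, and the outer gradient taken by finite differences instead of differentiating through the inner loop:

```python
import numpy as np

# Toy illustration of the per-class update above (my notation, not real
# MAML code): theta is a scalar, L is squared error, and A(theta, S_t) is
# one inner SGD step on the support set of a single class.

def loss(theta, data):
    return np.mean([(theta - d) ** 2 for d in data])

def grad_loss(theta, data):
    return np.mean([2 * (theta - d) for d in data])

def inner_adapt(theta, support, eta_inner=0.1):
    # A(theta, S_t): one gradient step on one class's support set.
    return theta - eta_inner * grad_loss(theta, support)

def maml_outer_step(theta, tasks, eta_outer=0.01, eps=1e-5):
    # Outer SGD on sum_t L(A(theta, S_t), Q_t); the outer gradient is taken
    # by central finite differences instead of backprop through A.
    def meta_loss(th):
        return sum(loss(inner_adapt(th, S), Q) for S, Q in tasks)
    g = (meta_loss(theta + eps) - meta_loss(theta - eps)) / (2 * eps)
    return theta - eta_outer * g

# Two "classes" (= two tasks in MAML's sense), each with its own
# support/query split.
tasks = [([0.0, 0.2], [0.1]), ([1.0, 1.2], [1.1])]
theta = 0.5
for _ in range(200):
    theta = maml_outer_step(theta, tasks)
```

The point is only the structure of the update: the sum over sampled classes, each contributing a query loss evaluated after adapting on that class's support set alone.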

But the Meta-LSTM paper samples "data sets", so there 1 data set is 1 task. According to their algorithm they sample only 1 task per iteration under this definition of task:

If we let $$S_i, Q_i$$ denote the support and query images for class $$i$$, then we have $$D^{tr}_t = \{ S_i \}^N_{i=1}$$ and $$D^{test}_t = \{ Q_i \}^N_{i=1}$$, and a data set (a task in Meta-LSTM) is $$D_t = (D^{tr}_t, D^{test}_t)$$. Note that $$D^{tr}_t \neq S_t$$ and $$D^{test}_t \neq Q_t$$ in this notation.

Thus the update rule for Meta-LSTM is very different from MAML's (the subscript 1 indicates that only 1 task was sampled, i.e. only 1 data set for the N-way, K-shot problem):

$$\theta := \theta - \eta_{\text{outer}} \nabla_{\theta}\, \mathcal{L}\big( A(\theta, D^{tr}_1),\ D^{test}_1 \big)$$

or equivalently, expanding the data sets:

$$\theta := \theta - \eta_{\text{outer}} \nabla_{\theta}\, \mathcal{L}\big( A(\theta, \cup^N_{i=1} S_i),\ \cup^N_{i=1} Q_i \big)$$

The reason they are so different, as I see it, is that Meta-LSTM uses all N classes in the inner loop while MAML uses a single class.
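To make the contrast concrete, here is the same toy setup (again my own illustrative sketch, not code from either paper) rewritten in the Meta-LSTM style: one outer step per sampled data set, with the inner loop adapting on the union of all class supports:

```python
import numpy as np

# Toy illustration of the Meta-LSTM-style update: the inner loop sees the
# union of all class supports at once, and one outer step corresponds to
# one sampled data set D_1 = (D^{tr}_1, D^{test}_1).

def loss(theta, data):
    return np.mean([(theta - d) ** 2 for d in data])

def grad_loss(theta, data):
    return np.mean([2 * (theta - d) for d in data])

def dataset_outer_step(theta, supports, queries,
                       eta_outer=0.01, eta_inner=0.1, eps=1e-5):
    d_train = [x for S in supports for x in S]  # cup_i S_i = D^{tr}_1
    d_test = [x for Q in queries for x in Q]    # cup_i Q_i = D^{test}_1
    def meta_loss(th):
        # A(theta, D^{tr}_1): one inner step on ALL classes jointly.
        adapted = th - eta_inner * grad_loss(th, d_train)
        return loss(adapted, d_test)
    g = (meta_loss(theta + eps) - meta_loss(theta - eps)) / (2 * eps)
    return theta - eta_outer * g

supports = [[0.0, 0.2], [1.0, 1.2]]  # S_1, S_2 for two classes
queries = [[0.1], [1.1]]             # Q_1, Q_2
theta = 0.5
for _ in range(200):
    theta = dataset_outer_step(theta, supports, queries)
```

Comparing the two sketches makes the structural difference visible: here there is no sum over classes in the outer objective, because the single sampled task already contains all N classes.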

It seems both are "correct" if one is free to define a task however one wants, but they are not equivalent. That is what I am seeking to check.

Are they equivalent, and what is the standard way to evaluate a meta-learning algorithm for classification?

During training anything goes, but for comparisons to be meaningful, evaluations should use the same method.
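For reference, the evaluation protocol I usually see reported (my understanding of common practice, not a quote from either paper) is episodic: sample many N-way K-shot episodes from held-out classes, adapt on each episode's support set, score on its query set, and report mean accuracy with a 95% confidence interval. A skeleton, where the adapt-and-score step is a placeholder standing in for any meta-learner:

```python
import random
import statistics

# Skeleton of episodic few-shot evaluation (assumed protocol; the
# adapt-and-score step is a stub, not a real meta-learner).

def evaluate(adapt_and_score, episodes):
    # adapt_and_score(episode) should adapt on the episode's support set
    # and return query-set accuracy in [0, 1]; here it is a placeholder.
    accs = [adapt_and_score(ep) for ep in episodes]
    mean = statistics.mean(accs)
    # 1.96 * std / sqrt(n): the 95% confidence interval papers usually report.
    ci95 = 1.96 * statistics.stdev(accs) / len(accs) ** 0.5
    return mean, ci95

random.seed(0)
episodes = range(600)  # 600 held-out test episodes is a common choice
# Placeholder scorer: a real one would run the inner-loop adaptation.
mean, ci95 = evaluate(lambda ep: random.uniform(0.4, 0.6), episodes)
```

The key point for comparability is that every method being compared is scored on the same kind of episode (same N, K, and held-out class split), whatever task definition it used during training.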

I defined a task as a class, then adapted to only 1 class per inner loop, and got 100% accuracy every time:

```
task = t = 0
BEFORE sum(base_model.norm) = 1.4409199953079224
AFTER sum(base_model.norm) = 2.2930140495300293
qry_logits_t = tensor([[ 4.3082, -4.6038],
[ 4.4672, -4.3944],
[ 4.5366, -4.6404],
[ 4.2431, -4.3507],
[ 4.4234, -4.3042],
[ 4.6369, -4.5237],
[ 4.4182, -4.6141],
[ 4.1134, -4.0174],
[ 4.4361, -4.3444],
[ 4.9720, -4.9412],
[ 4.6677, -4.7576],
[ 5.0870, -5.1820],
[ 4.9251, -4.9549],
[ 4.5231, -4.4667],
qry_y_t = tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
BEFORE sum(base_model.norm) = 1.4409199953079224
AFTER sum(base_model.norm) = 2.2568979263305664
qry_logits_t = tensor([[-4.2340,  4.2120],
[-4.3894,  4.6440],
[-4.1050,  4.3056],
[-4.3172,  4.5236],
[-4.2520,  4.2614],
[-4.1913,  4.1912],
[-3.8990,  3.9143],
[-4.0798,  4.1540],
[-4.0239,  4.0251],
[-4.1695,  4.3652],
[-4.4205,  4.3489],
[-3.4349,  3.8891],
[-4.1637,  3.9913],
[-4.1643,  4.3026],
qry_y_t = tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
--> episode = 0
meta-loss = 0.00018704216927289963
meta-accs = 1.0
``````