*Bounty: 50*

I want to know what is the most common way to define task in meta-learning.

Chelsea Finn defines it in MAML as follows:

$$ T_i = \{ \mathcal{L}_i(x_1, a_1, \dots, x_H, a_H),\ q_i(x_1),\ q_i(x_{t+1} \mid x_t, a_t) \}$$

for supervised learning it’s just:

$$ T_i = \{ \mathcal{L}_i(x),\ q_i(x) \} $$
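To make the definition concrete, a supervised task $T_i = \{\mathcal{L}_i, q_i\}$ is just a pairing of a loss with a data distribution. A minimal sketch (the class name `SupervisedTask` and the toy regression distribution are my own illustration, not from any paper or library):

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SupervisedTask:
    """A supervised task T_i = {L_i, q_i}: a loss paired with a data distribution."""
    loss: Callable[[float, float], float]            # L_i(prediction, target)
    sample: Callable[[int], List[Tuple[float, float]]]  # draws (x, y) pairs from q_i


# Example task: q_i generates pairs (x, 2x); the loss is squared error.
task = SupervisedTask(
    loss=lambda pred, y: (pred - y) ** 2,
    sample=lambda n: [(x, 2.0 * x) for x in (random.random() for _ in range(n))],
)

batch = task.sample(5)
# Loss of a constant predictor 0 under this task's distribution:
mse = sum(task.loss(0.0, y) for _, y in batch) / len(batch)
```

Different tasks then differ only in which $(\mathcal{L}_i, q_i)$ pair they carry, which is exactly what makes the "what counts as a task" question below non-trivial.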

I usually like denoting $q_i(x)$ as $p(x, y \mid i)$.

If one reads this paragraph alongside an implementation of MAML in the higher library, one notices that the Meta-LSTM paper and MAML define "task" slightly differently (though both can be expressed with the notation introduced in this question).

MAML defines N-way, K-shot classification as follows:

Few-Shot Classification:

Classify a new type of image (thus it must be a new task) given only a support set of a few Segways, having been trained on many other types of objects.

**So a class is a task in the MAML paper.** This code reinforces that belief: https://github.com/facebookresearch/higher/blob/e45c1a059e39a16fa016d37bc15397824c65547c/examples/maml-omniglot.py#L130, as does Algorithm 2 of the original MAML paper.

Let me use the letters $S_t$ and $Q_t$ for the support and query set of 1 class.

When we sample a batch of $M$ tasks, we therefore sample $M = N$ classes and meta-train with this update rule (if using SGD as the outer optimizer):

$$ \theta^{<t+1>} := \theta^{<t>} - \eta_{outer} \nabla_{\theta^{<t>}} \sum^{N}_{t=1} L( A(\theta^{<t>}, S_t),\ Q_t) $$

where $A$ is the inner-loop adaptation; for MAML we have:

$$ \theta^{<t+1>} := \theta^{<t>} - \eta_{outer} \nabla_{\theta^{<t>}} \sum^{N}_{t=1} L\big( \theta^{<t>} - \eta_{inner} \nabla_{\theta^{<t>}} L(\theta^{<t>}, S_t),\ Q_t \big) $$
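A minimal numeric sketch of this update, under simplifying assumptions of my own (a single scalar parameter, a quadratic per-task loss, hand-derived gradients instead of autograd, and toy support/query data; this is not the higher library's implementation):

```python
# One MAML step: inner adaptation per task on the support set, then outer SGD
# on the sum of query losses, differentiating through the inner step.

def loss(theta, data):
    # L(theta, D) = mean((theta - d)^2 for d in D)
    return sum((theta - d) ** 2 for d in data) / len(data)

def grad(theta, data):
    # dL/dtheta for the quadratic loss above
    return sum(2 * (theta - d) for d in data) / len(data)

eta_inner, eta_outer = 0.1, 0.01
theta = 0.0

# Each "task" is a (support, query) pair; under MAML's definition each one
# could correspond to a single class.
tasks = [([1.0, 1.2], [0.9, 1.1]), ([-1.0, -0.8], [-1.1, -0.9])]

outer_grad = 0.0
for support, query in tasks:
    theta_adapted = theta - eta_inner * grad(theta, support)  # A(theta, S_t)
    # Chain rule through the inner step: d(theta_adapted)/d(theta) = 1 - eta_inner * L''.
    # For this quadratic loss L'' = 2; this factor is the second-order MAML term.
    outer_grad += (1 - eta_inner * 2) * grad(theta_adapted, query)

theta = theta - eta_outer * outer_grad
```

In a real implementation (e.g. with higher) autograd computes the chain-rule factor automatically; the sketch only makes the structure of the summed outer gradient explicit.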

**But the Meta-LSTM paper samples "data sets", and thus 1 data set is 1 task.** According to their algorithm, they sample only 1 task per iteration under this definition of task:

If we let $S_i, Q_i$ correspond to the sets of images for class $i$, then $D^{train}_t = \{S_i\}^N_{i=1}$ and $D^{test}_t = \{Q_i\}^N_{i=1}$, and a data set (a task in Meta-LSTM) is $D_t = (D^{train}_t, D^{test}_t)$. Note that $D^{train}_t \neq S_t$ and $D^{test}_t \neq Q_t$ in this notation.

Thus the update rule for Meta-LSTM is very different from MAML's (the subscript 1 indicates that, by MAML's counting, we sampled only 1 task, i.e. only 1 data set of the N-way, K-shot kind):

$$ \theta^{<t+1>} := \theta^{<t>} - \eta_{outer} \nabla_{\theta^{<t>}} L\big( A(\theta^{<t>}, D^{tr}_1),\ D^{test}_1 \big)$$

$$ \theta^{<t+1>} := \theta^{<t>} - \eta_{outer} \nabla_{\theta^{<t>}} L\big( A(\theta^{<t>}, \cup^N_{i=1} S_i),\ \cup^N_{i=1} Q_i \big)$$
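The sampling difference between the two readings can be sketched in a few lines (the per-class dictionaries and example strings below are made-up placeholders, not data from either paper):

```python
# MAML-as-read-above: each class i with its (S_i, Q_i) is its own task.
# Meta-LSTM: all N classes are bundled into one data set D_t = (D_train, D_test),
# and a single inner adaptation sees the union of all classes.

per_class_support = {0: [("img_a", 0)], 1: [("img_b", 1)], 2: [("img_c", 2)]}  # S_i
per_class_query = {0: [("img_d", 0)], 1: [("img_e", 1)], 2: [("img_f", 2)]}    # Q_i

# Meta-LSTM-style task: one data set containing all N classes.
D_train = [ex for S_i in per_class_support.values() for ex in S_i]  # cup_i S_i
D_test = [ex for Q_i in per_class_query.values() for ex in Q_i]     # cup_i Q_i

# One inner loop now sees every class, so the query labels are not all identical.
labels_seen = {y for _, y in D_train}
```

Under the per-class reading, the inner loop instead adapts on a single `per_class_support[i]`, whose labels are all the same, which is exactly the failure mode shown in the log further down.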

The reason they are so different, in my view, is that Meta-LSTM uses **all classes in the inner loop** while MAML uses 1 single class.

It seems that both are "correct" if one is free to define "task" however one wants. But they are not equivalent. That is what I am seeking to check.

**Are they equivalent and what is the standard way to evaluate a meta-learning algorithm for classification?**

During training anything goes, but for evaluation, comparisons are only meaningful if they use the same evaluation method.

I defined a task as a class, updated on only 1 class per inner loop, and got 100% accuracy every time:

```
task = t = 0
BEFORE sum(base_model.norm) = 1.4409199953079224
AFTER sum(base_model.norm) = 2.2930140495300293
qry_logits_t = tensor([[ 4.3082, -4.6038],
[ 4.4672, -4.3944],
[ 4.5366, -4.6404],
[ 4.2431, -4.3507],
[ 4.4234, -4.3042],
[ 4.6369, -4.5237],
[ 4.4182, -4.6141],
[ 4.1134, -4.0174],
[ 4.4361, -4.3444],
[ 4.9720, -4.9412],
[ 4.6677, -4.7576],
[ 5.0870, -5.1820],
[ 4.9251, -4.9549],
[ 4.5231, -4.4667],
[ 5.1841, -5.0357]], grad_fn=<AddmmBackward>)
qry_y_t = tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
task = t = 1
BEFORE sum(base_model.norm) = 1.4409199953079224
AFTER sum(base_model.norm) = 2.2568979263305664
qry_logits_t = tensor([[-4.2340, 4.2120],
[-4.3894, 4.6440],
[-4.1050, 4.3056],
[-4.3172, 4.5236],
[-4.2520, 4.2614],
[-4.1913, 4.1912],
[-3.8990, 3.9143],
[-4.0798, 4.1540],
[-4.0239, 4.0251],
[-4.1695, 4.3652],
[-4.4205, 4.3489],
[-3.4349, 3.8891],
[-4.1637, 3.9913],
[-4.1643, 4.3026],
[-4.3333, 4.1559]], grad_fn=<AddmmBackward>)
qry_y_t = tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
--> episode = 0
meta-loss = 0.00018704216927289963
meta-accs = 1.0
sum(base_model.grad.norm) = 0.06170148774981499
sum(base_model.norm) = 1.4409199953079224
```
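The 100% accuracy in this log seems unavoidable under the 1-class-per-task definition: every query example in an episode shares the same label (note `qry_y_t` is all 0s, then all 1s), so the adapted model only has to prefer that one label. A toy illustration using logits shaped like the log above (the values are made up, not copied from the run):

```python
# With a single-class episode, argmax over the logits trivially matches the
# one label present, so per-episode accuracy is 1.0 regardless of the model.

logits_task0 = [[4.3, -4.6], [4.5, -4.4]]  # shaped like qry_logits_t for task 0
labels_task0 = [0, 0]                      # like qry_y_t: every label identical

preds = [max(range(len(row)), key=lambda j: row[j]) for row in logits_task0]
accuracy = sum(p == y for p, y in zip(preds, labels_task0)) / len(labels_task0)
```

So the perfect meta-accuracy is a symptom of the task definition, not of the model having learned anything discriminative across classes.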

I am using the PyTorch higher library.