#StackBounty: #machine-learning #neural-networks #meta-learning What is the standard way to define "task" in meta-learning fo…

Bounty: 50

I want to know the most common way to define a "task" in meta-learning.

Chelsea Finn defines it in MAML as follows:

$$ \mathcal{T}_i = \{ \mathcal{L}_i(x_1, a_1, \ldots, x_H, a_H),\; q_i(x_1),\; q_i(x_{t+1} \mid x_t, a_t) \} $$

for supervised learning it’s just:

$$ \mathcal{T}_i = \{ \mathcal{L}_i(x),\; q_i(x) \} $$

I usually like denoting $q_i(x)$ as $p(x, y \mid i)$.

If one reads this paragraph and some of the higher library's code implementing MAML, one notices that the Meta-LSTM paper and the MAML paper define "task" slightly differently (though both can be expressed with the notation introduced in this question).

MAML defines N-way, K-shot classification as follows:

Few-shot classification:
Classify a new type of image (thus it must be a new task) given only a support set with a few examples of a Segway, having been trained on many other types of objects.

So a class is a task in the MAML paper. This code reinforces my belief: https://github.com/facebookresearch/higher/blob/e45c1a059e39a16fa016d37bc15397824c65547c/examples/maml-omniglot.py#L130 and Algorithm 2 of the original MAML paper.

Let me use $S_i$ and $Q_i$ for the support and query set of one class.

When we sample a batch of M tasks, we sample M = N classes and meta-train with this update rule (using SGD as the outer optimizer):

$$ \theta^{<t+1>} := \theta^{<t>} - \eta_{\text{outer}} \nabla_{\theta^{<t>}} \sum^{N}_{i=1} L\big( A(\theta^{<t>}, S_i),\; Q_i \big) $$

where $A$ is the inner-loop adaptation; for MAML we have:

$$ \theta^{<t+1>} := \theta^{<t>} - \eta_{\text{outer}} \nabla_{\theta^{<t>}} \sum^{N}_{i=1} L\big( \theta^{<t>} - \eta_{\text{inner}} \nabla_{\theta^{<t>}} L(\theta^{<t>}, S_i),\; Q_i \big) $$
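
To make the class-as-task reading concrete, here is a minimal runnable sketch of this outer update with the higher library. The linear model, the random `sample_task` stand-in, the learning rates, and the batch of 4 tasks are my own toy assumptions, not values from either paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import higher

# Toy stand-ins (my assumptions): a linear model and random "class" data;
# in MAML these would be a conv net and Omniglot/miniImageNet samples.
net = nn.Linear(4, 2)
meta_opt = torch.optim.SGD(net.parameters(), lr=0.1)    # eta_outer
inner_opt = torch.optim.SGD(net.parameters(), lr=0.01)  # eta_inner

def sample_task():
    # Hypothetical sampler returning (S_i, Q_i) for one task.
    return (torch.randn(5, 4), torch.randint(0, 2, (5,))), \
           (torch.randn(15, 4), torch.randint(0, 2, (15,)))

meta_opt.zero_grad()
for _ in range(4):  # the N sampled tasks of one outer step
    (x_s, y_s), (x_q, y_q) = sample_task()
    with higher.innerloop_ctx(net, inner_opt,
                              copy_initial_weights=False) as (fnet, diffopt):
        diffopt.step(F.cross_entropy(fnet(x_s), y_s))  # A(theta, S_i): one inner step
        F.cross_entropy(fnet(x_q), y_q).backward()     # accumulate grad of L(., Q_i)
meta_opt.step()  # theta <- theta - eta_outer * grad of the sum over tasks
```

With `copy_initial_weights=False`, each `backward()` flows through the differentiable inner step and accumulates into `net`'s gradients, so `meta_opt.step()` applies the sum over tasks, matching the update rule above.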

But the Meta-LSTM paper samples "data sets", so there one data set is one task. According to their algorithm, they sample only 1 task per iteration under this definition of task:

[Figure: the meta-training algorithm from the Meta-LSTM paper.]

If we let $S_i, Q_i$ be the support and query sets for class $i$, then we have $D^{\text{tr}}_t = \{ S_i \}^{N}_{i=1}$ and $D^{\text{test}}_t = \{ Q_i \}^{N}_{i=1}$, and a data set (a task in Meta-LSTM) is $D_t = (D^{\text{tr}}_t, D^{\text{test}}_t)$. Note that $D^{\text{tr}}_t \neq S_i$ and $D^{\text{test}}_t \neq Q_i$ in this notation.

Thus the update rule for Meta-LSTM is very different from MAML's (the subscript 1 indicates that only 1 task, i.e. only 1 data set for the N-way, K-shot problem, is sampled per iteration):

$$ \theta^{<t+1>} := \theta^{<t>} - \eta_{\text{outer}} \nabla_{\theta^{<t>}} L\big( A(\theta^{<t>}, D^{\text{tr}}_1),\; D^{\text{test}}_1 \big) $$

or equivalently, expanding the data set into its per-class support and query sets:

$$ \theta^{<t+1>} := \theta^{<t>} - \eta_{\text{outer}} \nabla_{\theta^{<t>}} L\big( A(\theta^{<t>}, \cup^{N}_{i=1} S_i),\; \cup^{N}_{i=1} Q_i \big) $$
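
Under the dataset-as-task reading the sketch changes shape: one sampled "task" already contains all N classes, and the inner loop adapts on their union. (The real Meta-LSTM paper learns the inner optimizer itself; I keep plain SGD here, so this only illustrates the task definition, not their method. The sampler and sizes are again toy assumptions.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import higher

N, K, Q = 5, 1, 15                                      # N-way, K-shot, Q queries/class (toy values)
net = nn.Linear(4, N)
meta_opt = torch.optim.SGD(net.parameters(), lr=0.1)    # eta_outer
inner_opt = torch.optim.SGD(net.parameters(), lr=0.01)  # plain-SGD stand-in for the learned optimizer

def sample_dataset():
    # Hypothetical sampler: D^tr_t = union of the N supports, D^test_t = union of the N queries.
    x_tr, y_tr = torch.randn(N * K, 4), torch.arange(N).repeat_interleave(K)
    x_te, y_te = torch.randn(N * Q, 4), torch.arange(N).repeat_interleave(Q)
    return (x_tr, y_tr), (x_te, y_te)

meta_opt.zero_grad()
(x_tr, y_tr), (x_te, y_te) = sample_dataset()           # only 1 "task" per outer iteration
with higher.innerloop_ctx(net, inner_opt,
                          copy_initial_weights=False) as (fnet, diffopt):
    diffopt.step(F.cross_entropy(fnet(x_tr), y_tr))     # adapt on all N classes at once
    F.cross_entropy(fnet(x_te), y_te).backward()        # outer loss on the union of query sets
meta_opt.step()
```

The structural difference from the previous sketch is exactly the point of the question: here the inner loop sees an N-way problem, whereas under class-as-task each inner loop sees a single class.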

The reason they are so different, in my view, is that Meta-LSTM uses all N classes in the inner loop, while MAML (under the class-as-task reading) uses a single class.

It seems both are "correct" if one is free to define "task" however one wants, but they are not equivalent. That is what I am seeking to check.

Are they equivalent, and what is the standard way to evaluate a meta-learning algorithm for classification?

During training anything goes, but for comparisons to be meaningful, evaluation should use the same method.
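
For what it's worth, the common protocol in the few-shot literature is, to my knowledge, to sample many N-way K-shot episodes from held-out classes, adapt on each support set, score on the query set, and report mean query accuracy with a 95% confidence interval. A sketch, where `evaluate_episode` is a hypothetical callable wrapping the adapt-then-score step:

```python
import math
import torch

def eval_protocol(evaluate_episode, num_episodes=600):
    # evaluate_episode() -> query-set accuracy of the adapted model on one freshly
    # sampled N-way K-shot episode drawn from held-out (meta-test) classes.
    accs = torch.tensor([evaluate_episode() for _ in range(num_episodes)])
    mean = accs.mean().item()
    ci95 = 1.96 * accs.std().item() / math.sqrt(num_episodes)  # 95% confidence interval
    return mean, ci95

# Usage: mean, ci95 = eval_protocol(my_episode_fn); print(f"{mean:.4f} +/- {ci95:.4f}")
```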


I defined a task as a class, then updated on only 1 class per inner loop, and got 100% accuracy every time:

task = t = 0
BEFORE sum(base_model.norm) = 1.4409199953079224
AFTER sum(base_model.norm) = 2.2930140495300293
qry_logits_t = tensor([[ 4.3082, -4.6038],
        [ 4.4672, -4.3944],
        [ 4.5366, -4.6404],
        [ 4.2431, -4.3507],
        [ 4.4234, -4.3042],
        [ 4.6369, -4.5237],
        [ 4.4182, -4.6141],
        [ 4.1134, -4.0174],
        [ 4.4361, -4.3444],
        [ 4.9720, -4.9412],
        [ 4.6677, -4.7576],
        [ 5.0870, -5.1820],
        [ 4.9251, -4.9549],
        [ 4.5231, -4.4667],
        [ 5.1841, -5.0357]], grad_fn=<AddmmBackward>)
qry_y_t = tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
task = t = 1
BEFORE sum(base_model.norm) = 1.4409199953079224
AFTER sum(base_model.norm) = 2.2568979263305664
qry_logits_t = tensor([[-4.2340,  4.2120],
        [-4.3894,  4.6440],
        [-4.1050,  4.3056],
        [-4.3172,  4.5236],
        [-4.2520,  4.2614],
        [-4.1913,  4.1912],
        [-3.8990,  3.9143],
        [-4.0798,  4.1540],
        [-4.0239,  4.0251],
        [-4.1695,  4.3652],
        [-4.4205,  4.3489],
        [-3.4349,  3.8891],
        [-4.1637,  3.9913],
        [-4.1643,  4.3026],
        [-4.3333,  4.1559]], grad_fn=<AddmmBackward>)
qry_y_t = tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
--> episode = 0
meta-loss = 0.00018704216927289963 
meta-accs = 1.0
sum(base_model.grad.norm) = 0.06170148774981499
sum(base_model.norm) = 1.4409199953079224

I am using the PyTorch higher library. (The 100% accuracy looks degenerate to me: after adapting to a single class, every query example carries that same label, as the qry_y_t tensors above show, so the adapted model only needs to predict one class.)
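
A quick way to see why this is degenerate (a toy check, not taken from my training code): when a query set contains a single label, any model biased toward that class scores perfectly:

```python
import torch

# All 15 query labels are class 1, as in the t = 1 log above.
qry_y = torch.ones(15, dtype=torch.long)
# Any logits biased toward class 1 (as the adapted model's are) score perfectly.
qry_logits = torch.tensor([[-4.0, 4.0]]).repeat(15, 1)
acc = (qry_logits.argmax(dim=1) == qry_y).float().mean()
print(acc)  # tensor(1.)
```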

