#StackBounty: #python #nlp #pytorch #huggingface-transformers #ner Error while trying to fine-tune the ReformerModelWithLMHead (google/…

Bounty: 50

I’m trying to fine-tune ReformerModelWithLMHead (google/reformer-enwik8) for NER. I padded each batch to the sequence length used in the model card’s encode method (max_length = max([len(string) for string in list_of_strings])) and passed attention masks along with the inputs. I got this error:

ValueError: If training, make sure that config.axial_pos_shape factors: (128, 512) multiply to sequence length. Got prod((128, 512)) != sequence_length: 2248. You might want to consider padding your sequence length to 65536 or changing config.axial_pos_shape.

  • When I padded the sequences to length 65536, my Colab session crashed because it could not hold inputs of that length in memory.
  • As for the second option (changing config.axial_pos_shape), I could not figure out how to change it.

I would like to know: is there any way to change config.axial_pos_shape while fine-tuning the model? Or am I missing something in how I encode the input strings for reformer-enwik8?
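
For clarity, the encoding I refer to is essentially the encode helper from the model card (reproduced here as a sketch, lightly reformatted; the card is linked again in the training code below):

import torch

# Byte-level encoding as on the google/reformer-enwik8 model card:
# pad every string to the batch max length and build an attention mask.
def encode(list_of_strings, pad_token_id=0):
    max_length = max([len(string) for string in list_of_strings])

    attention_masks = torch.zeros((len(list_of_strings), max_length), dtype=torch.long)
    input_ids = torch.full((len(list_of_strings), max_length), pad_token_id, dtype=torch.long)

    for idx, string in enumerate(list_of_strings):
        # make sure the string is in byte format
        if not isinstance(string, bytes):
            string = str.encode(string)
        input_ids[idx, :len(string)] = torch.tensor([x + 2 for x in string])
        attention_masks[idx, :len(string)] = 1

    return input_ids, attention_masks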

Thanks!

Question Update: I have tried the following methods:

  1. Passing the parameters at model instantiation:

model = transformers.ReformerModelWithLMHead.from_pretrained(
    "google/reformer-enwik8", num_labels=9, max_position_embeddings=1024,
    axial_pos_shape=[16, 64], axial_pos_embds_dim=[32, 96], hidden_size=128)

It gives me the following error:

RuntimeError: Error(s) in loading state_dict for ReformerModelWithLMHead:
size mismatch for reformer.embeddings.word_embeddings.weight: copying a param with shape torch.Size([258, 1024]) from checkpoint, the shape in current model is torch.Size([258, 128]).
size mismatch for reformer.embeddings.position_embeddings.weights.0: copying a param with shape torch.Size([128, 1, 256]) from checkpoint, the shape in current model is torch.Size([16, 1, 32]).

The full error is quite long and continues with more size mismatches of the same kind.
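
For reference, the checkpoint’s original configuration can be inspected directly; the values printed below are the ones quoted further down in this question (a small sketch, not part of my training code):

from transformers import ReformerConfig

# Inspect the pretrained config to see which shapes the checkpoint expects
cfg = ReformerConfig.from_pretrained("google/reformer-enwik8")
print(cfg.axial_pos_shape)          # [128, 512]
print(cfg.axial_pos_embds_dim)      # [256, 768]
print(cfg.hidden_size)              # 1024
print(cfg.max_position_embeddings)  # 65536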

  2. Then I tried this code to update the config:

model1 = transformers.ReformerModelWithLMHead.from_pretrained("google/reformer-enwik8", num_labels=9)

# Reshape the axial position embeddings to match the desired max sequence length

model1.reformer.embeddings.position_embeddings.weights[1] = torch.nn.Parameter(model1.reformer.embeddings.position_embeddings.weights[1][0][:128])

# Update the config to match the custom max sequence length

model1.config.axial_pos_shape = (16, 128)
model1.config.max_position_embeddings = 16 * 128  # 2048
model1.config.axial_pos_embds_dim = (32, 96)
model1.config.hidden_size = 128
output_model_path = "model"
model1.save_pretrained(output_model_path)

With this implementation, I get this error:

RuntimeError: The expanded size of the tensor (512) must match the existing size (128) at non-singleton dimension 2. Target sizes: [1, 128, 512, 768]. Tensor sizes: [128, 768]

This happens because the updated sizes/shapes don’t match the original config
parameters of the pretrained model. The original parameters are:

axial_pos_shape = 128, 512
max_position_embeddings = 128 * 512  # 65536
axial_pos_embds_dim = 256, 768
hidden_size = 1024
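
For reference, the two axial position embedding weight tensors can be inspected directly; the shapes in the comments below are my reading of the size-mismatch errors above (again a sketch, not part of the training code):

# Inspect the two axial position embedding weight tensors; the product of their
# "position" dimensions must equal the (padded) training sequence length.
pos_emb = model1.reformer.embeddings.position_embeddings
for i, w in enumerate(pos_emb.weights):
    print(i, tuple(w.shape))
# Expected for the unmodified checkpoint (inferred from the errors above):
# 0 -> (128, 1, 256)
# 1 -> (1, 512, 768)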

Is this the right way to change the config parameters, or do I have to do something else?
Is there any example where a ReformerModelWithLMHead("google/reformer-enwik8") model is fine-tuned?

My main code implementation is as follows:

import torch
import transformers

class REFORMER(torch.nn.Module):
    def __init__(self):
        super(REFORMER, self).__init__()
        # Byte-level Reformer language model used as the backbone
        self.l1 = transformers.ReformerModelWithLMHead.from_pretrained("google/reformer-enwik8", num_labels=9)

    def forward(self, input_ids, attention_masks, labels):
        output_1 = self.l1(input_ids, attention_masks, labels=labels)
        return output_1


model = REFORMER()

def train(epoch):
    model.train()
    for _, data in enumerate(training_loader, 0):
        # input_ids/attention_masks come from the encode method on the model card:
        # https://huggingface.co/google/reformer-enwik8
        ids = data['input_ids'][0]
        input_shape = ids.size()
        targets = data['tags']
        print("tags: ", targets, targets.size())

        # Pad the inputs so the sequence length matches what the checkpoint's
        # axial_pos_shape expects (128 * 512 = 65536)
        least_common_mult_chunk_length = 65536
        padding_length = least_common_mult_chunk_length - input_shape[-1] % least_common_mult_chunk_length

        # _pad_to_mult_of_chunk_length is a private helper on the underlying ReformerModel
        input_ids, inputs_embeds, attention_mask, position_ids, input_shape = model.l1.reformer._pad_to_mult_of_chunk_length(
            input_ids=ids,
            inputs_embeds=None,
            attention_mask=None,
            position_ids=None,
            input_shape=input_shape,
            padding_length=padding_length,
            padded_seq_length=None,
            device=None,
        )

        # Send the padded inputs to the forward method
        outputs = model(input_ids, attention_mask, labels=targets)
        print(outputs)
        loss = outputs.loss
        logits = outputs.logits
        if _ % 500 == 0:
            print(f'Epoch: {epoch}, Loss:  {loss}')

for epoch in range(1):
    train(epoch)
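
For completeness, here is a minimal sketch of how training_loader could be built for this loop (NERByteDataset, train_strings and train_tag_tensors are illustrative names, not part of my actual code; the tag-to-id mapping is omitted):

from torch.utils.data import Dataset, DataLoader

class NERByteDataset(Dataset):
    # Illustrative only: pairs the byte-level encode() output with
    # per-character tag ids that have already been mapped to integers.
    def __init__(self, list_of_strings, list_of_tag_tensors):
        self.strings = list_of_strings
        self.tags = list_of_tag_tensors   # one 1-D LongTensor per string

    def __len__(self):
        return len(self.strings)

    def __getitem__(self, idx):
        input_ids, attention_masks = encode([self.strings[idx]])  # shapes (1, len)
        return {"input_ids": input_ids,
                "attention_masks": attention_masks,
                "tags": self.tags[idx]}

training_loader = DataLoader(NERByteDataset(train_strings, train_tag_tensors),
                             batch_size=1, shuffle=True)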

