I know that the math underlying the Transformer places no restriction on the input length. But I still can't understand why we have to fix it in frameworks such as PyTorch. Transformer-XL was created because of this very problem.
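To illustrate what I mean by "no restriction in the math": here is a minimal sketch of plain scaled dot-product attention (written in NumPy for simplicity; the function names are mine), and nothing in it depends on a fixed sequence length:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention for a single head:
    # scores are (seq_len, seq_len), so the formula adapts
    # to whatever length the input happens to have.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

d = 8
for seq_len in (3, 17, 100):
    x = np.random.randn(seq_len, d)
    out = attention(x, x, x)
    assert out.shape == (seq_len, d)  # works for any length
```

So the attention operation itself happily accepts sequences of any length, which is exactly why the fixed-length requirement in the frameworks confuses me.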
Can you explain where exactly this restriction comes from, please?