# #StackBounty: #neural-network #reinforcement-learning Temporal difference learning with a neural network

### Bounty: 100

Suppose I want to train a value network $$v$$ via TD(0), so my TD target for time step $$t$$ equals $$R_{t+1} + \gamma v(s_{t+1})$$. If I understand correctly, I just need to minimize the mean squared error so that $$v(s_t)$$ moves closer to this target. But my network outputs values in $$(-1, 1)$$ and the rewards also lie in this interval, so the TD target lies in $$(-2, 2)$$. Should I scale the target before learning from it? What are the consequences of not doing so, i.e., of training a neural network on target values drawn from a broader interval than its output range? Can anything be said about this from a theoretical point of view?
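For concreteness, here is a minimal sketch of the setup being asked about: semi-gradient TD(0) with a tanh-bounded value function, so the output lies in $$(-1, 1)$$ while the TD target can range over $$(-2, 2)$$. The linear parameterization, step size, discount, and the toy transition are all illustrative assumptions, not part of the question.

```python
import numpy as np

# Illustrative semi-gradient TD(0) with a tanh-squashed value function.
# The value "network" here is a single linear layer for simplicity;
# alpha, gamma, and the toy transition are assumed values.

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=4)  # weights of the linear layer
b = 0.0

def v(s, w, b):
    """Value estimate squashed into (-1, 1) by tanh."""
    return np.tanh(w @ s + b)

alpha, gamma = 0.1, 0.9

# One toy transition (s, r, s_next); the reward is assumed to lie in (-1, 1).
s = rng.normal(size=4)
s_next = rng.normal(size=4)
r = 0.5

for _ in range(200):
    target = r + gamma * v(s_next, w, b)  # TD target, may fall outside (-1, 1)
    delta = target - v(s, w, b)           # TD error
    z = w @ s + b
    dtanh = 1.0 - np.tanh(z) ** 2         # derivative of tanh at z
    # Semi-gradient step: the target is treated as a constant, i.e. no
    # gradient flows through v(s_next), as TD(0) prescribes.
    w = w + alpha * delta * dtanh * s
    b = b + alpha * delta * dtanh

# If a TD target ever exceeds the tanh range, v(s_t) can only saturate
# toward +/-1, so the TD error cannot reach zero for that state and the
# pre-activation keeps growing -- one practical consequence of leaving
# the target unscaled.
```

Note that with a bounded reward in $$(-1, 1)$$ and $$\gamma < 1$$, the true discounted return is bounded by $$1/(1-\gamma)$$, not by 1, which is why the mismatch between target range and output range arises in the first place.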

