#StackBounty: #machine-learning #neural-networks #moving-average #batch-normalization Why does batch norm use exponentially weighted a…

Bounty: 50

I was watching a lecture by Andrew Ng on batch normalization. When discussing inference (prediction) on a test set, he says that an exponentially weighted average (EWA) of the batch normalization parameters (the per-batch mean and variance) is used. My question is: why use an exponentially weighted average instead of a "simple" average without any weights (or, to be precise, with equal weights)?
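
To make the comparison concrete, here is how I would write the two candidates. The notation is my own assumption, not taken from the lecture: $\mu_k$ is the mean computed on mini-batch $k$, $T$ is the number of mini-batches seen, $\beta \in (0, 1)$ is the decay rate, and the running estimate starts at $\hat{\mu}_0 = 0$.

$$
\hat{\mu}_t = \beta\,\hat{\mu}_{t-1} + (1-\beta)\,\mu_t
\quad\Longrightarrow\quad
\hat{\mu}_T = (1-\beta)\sum_{k=1}^{T} \beta^{\,T-k}\,\mu_k
$$

$$
\bar{\mu}_T = \frac{1}{T}\sum_{k=1}^{T} \mu_k
$$

In the EWA, the weight on batch $k$ shrinks geometrically with its age $T-k$, whereas the simple average gives every batch the same weight $1/T$. The same form would apply to the variance.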

I intuit that:

1. the statistics of the latest batch are computed with weights that are closest to the final ones, so we want that batch to influence the predictions at test time the most,
2. at the same time, we do not want to discard a significant part of the data used earlier in training, so we let the earlier batches influence the predictions too, but to a smaller degree (smaller weights).
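
A small sketch of the intuition in the two points above, assuming a toy setting rather than a real network: the per-batch mean of one activation drifts from 0 towards 1 as the weights converge. The drift curve, the helper names `ewa` and `simple_average`, and `beta=0.9` are all hypothetical choices made for illustration.

```python
def ewa(values, beta=0.9):
    """Exponentially weighted average of a sequence.

    The running value is initialised with the first observation
    (an assumption; frameworks differ in how they initialise it).
    """
    running = values[0]
    for v in values[1:]:
        running = beta * running + (1.0 - beta) * v
    return running


def simple_average(values):
    """Equal-weight average over all observations."""
    return sum(values) / len(values)


# Hypothetical per-batch means: they drift towards 1.0 as training
# progresses, mimicking statistics that change while weights are updated.
batch_means = [1.0 - 0.95 ** step for step in range(200)]

final_mean = batch_means[-1]          # statistic under the (near) final weights
ewa_est = ewa(batch_means, beta=0.9)  # dominated by recent batches
avg_est = simple_average(batch_means) # early batches count as much as late ones

print("final batch mean:", final_mean)
print("EWA estimate:    ", ewa_est)
print("simple average:  ", avg_est)
```

In this toy run the EWA lands much closer to the final batch mean than the equal-weight average does, because the early batches (computed under weights far from the final ones) are weighted down rather than counted in full. Whether that is the actual reason for the design choice is exactly what I am asking.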
