In this video on Deep Gaussian Processes, Neil Lawrence mentions (at 37:30) that thinking of the layers of a neural network in terms of basis functions from Gaussian processes can be viewed as a motivation for, or an explanation of, Ioffe and Szegedy’s batch normalization. Can someone explain this comment? I don’t understand what he means.

