I want to use the VGG16 model pre-trained on ImageNet and fine-tune some of its layers on my own dataset. The VGG16 paper explains its preprocessing steps, which I understand are important to replicate when fine-tuning someone else’s network:
> The only preprocessing we do is subtracting the mean RGB value, computed on the training set, from each pixel.
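For reference, here is a minimal NumPy sketch of that step as I understand it. The specific per-channel means (`[123.68, 116.779, 103.939]` in RGB order) are the commonly circulated ImageNet values, not numbers stated in the paper itself:

```python
import numpy as np

# Commonly circulated ImageNet per-channel RGB means for VGG16.
# These exact values are an assumption; the paper only says the mean
# is computed on the ImageNet training set.
IMAGENET_MEAN_RGB = np.array([123.68, 116.779, 103.939], dtype=np.float32)

def vgg_preprocess(image_rgb: np.ndarray) -> np.ndarray:
    """Subtract the per-channel mean from an H x W x 3 RGB image.

    Note: mean subtraction only, with no division by the standard
    deviation, matching the paper's description.
    """
    return image_rgb.astype(np.float32) - IMAGENET_MEAN_RGB
```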
- Why didn’t they also divide by the standard deviation? I thought this kind of standardization (i.e. zero-centering and scaling to unit variance) was good practice.
More importantly, since I am fine-tuning the network on my own dataset, I wonder:
- Should I normalize the input with statistics computed on ImageNet, on my dataset, or on both combined? (See the sketch below.)
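To make the options concrete, here is a rough sketch of what each choice would look like; `my_training_images` is a hypothetical N x H x W x 3 array of my own training data:

```python
import numpy as np

def channel_stats(images: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-channel mean and std over a batch of N x H x W x 3 RGB images."""
    return images.mean(axis=(0, 1, 2)), images.std(axis=(0, 1, 2))

# Option 1: reuse the ImageNet statistics the pre-trained weights were
# trained with (e.g. the mean subtraction shown above).
# Option 2: recompute the statistics on my own training set:
#   my_mean, my_std = channel_stats(my_training_images)
#   normalized = (image - my_mean) / my_std
# Option 3: pool ImageNet and my dataset before computing the statistics.
```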