In a simple linear regression setting, it is common to talk about a minimum number of observations per parameter (which characterises the degrees of freedom). It is also easy to see that in multiple regression there is a one-to-one correspondence between the features and the parameters (one coefficient per feature, plus the intercept). So we can directly compare the number of observations to the number of parameters.
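To make the regression side of the comparison concrete, here is a minimal sketch on synthetic data. The function name `ols_fit`, the sample size and the coefficients are all made up for illustration. It fits ordinary least squares with NumPy and shows that there is one slope per feature plus an intercept, so the residual degrees of freedom are n - (p + 1):

```python
import numpy as np

def ols_fit(X, y):
    """Hypothetical helper: fit y = b0 + X @ b by least squares.

    Returns (intercept, slopes, residual degrees of freedom).
    """
    n, p = X.shape
    # Prepend a column of ones so the intercept is estimated with the slopes.
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    # One coefficient per feature plus the intercept: p + 1 parameters.
    residual_df = n - (p + 1)
    return beta[0], beta[1:], residual_df

# Synthetic example data (illustrative only).
rng = np.random.default_rng(0)
n, p = 100, 4
X = rng.normal(size=(n, p))
true_coef = np.array([1.5, -2.0, 0.0, 0.7])
y = 3.0 + X @ true_coef + rng.normal(scale=0.1, size=n)

intercept, coef, df = ols_fit(X, y)
print(coef.shape)    # (4,)  -> one slope per feature
print(df)            # 95    -> 100 - (4 + 1)
print(n / (p + 1))   # 20.0  -> observations per parameter
```

With 100 observations and 4 features, this lands at 20 observations per parameter, inside the usual 10/1 to 30/1 rule of thumb.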
However, the VGG model, for instance, has 138M parameters and is trained on 1.2M images, giving an observations-to-parameters ratio of about 1/100. Clearly, the rule-of-thumb ratio of anywhere between 10/1 and 30/1 is not respected here.
Is my understanding of this problem correct: that most of the parameters are in the fully connected layers, that these layers share information from all the pixels of each image, and hence that there is no “1 to 1” correspondence between observations and parameters?
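The first part of that premise, where the parameters sit, can be checked by counting them layer by layer. The sketch below assumes the standard VGG-16 layout (configuration "D": thirteen 3x3 convolutions, five 2x2 max-pools, then fully connected layers of 4096, 4096 and 1000 units), 224x224 RGB inputs and a bias term in every layer. The 1.2M image count is the figure quoted above. The function and constant names are hypothetical:

```python
# Assumed VGG-16 layout: numbers are conv output channels, "M" is 2x2 max-pooling.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
FC_SIZES = [4096, 4096, 1000]

def vgg16_param_counts(in_channels=3, input_size=224, kernel=3):
    """Return (conv_params, fc_params) for the assumed VGG-16 layout."""
    conv_params = 0
    channels, spatial = in_channels, input_size
    for item in VGG16_CONV:
        if item == "M":
            spatial //= 2  # each max-pool halves height and width
        else:
            # kernel*kernel*in*out shared weights plus one bias per output channel
            conv_params += kernel * kernel * channels * item + item
            channels = item
    # Flattened feature map feeding the first FC layer (7*7*512 for 224x224 input)
    fc_in = spatial * spatial * channels
    fc_params = 0
    for out in FC_SIZES:
        fc_params += fc_in * out + out  # dense weights plus biases
        fc_in = out
    return conv_params, fc_params

conv, fc = vgg16_param_counts()
total = conv + fc
n_images = 1.2e6
print(f"conv: {conv:,}  fc: {fc:,}  total: {total:,}")
print(f"share in FC layers: {fc / total:.1%}")
print(f"observations per parameter: {n_images / total:.4f}")
```

Under these assumptions the convolutional layers hold 14,714,688 parameters and the fully connected layers 123,642,856 (about 89% of the 138,357,544 total). The first FC layer alone, which connects the 25,088 flattened features to 4,096 units, accounts for over 100M. Note that each 3x3 convolution kernel is reused at every spatial position, which is why the convolutional part stays comparatively small.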