*Bounty: 100*

I’m in the process of learning and understanding different neural networks. I now understand feed-forward neural networks and their back-propagation fairly well, and I’m now learning convolutional neural networks. I understand their forward propagation, but I’m having trouble understanding their back-propagation. There is a very good resource explaining the convolutional layer; however, I can’t understand the back-propagation from it.

Following the back-propagation algorithm for feed-forward neural networks/multi-layer perceptrons as I understand it, suppose I have the following input (with items $i$) and filter (with items $w$), giving the output (with items $o$):

$$\begin{pmatrix}i_{1}^1 & i_{2}^1 & i_{3}^1\\ i_{4}^1 & i_{5}^1 & i_{6}^1\\ i_{7}^1 & i_{8}^1 & i_{9}^1\end{pmatrix} * \begin{pmatrix}w_1^1 & w_2^1\\ w_3^1 & w_4^1\end{pmatrix} = \begin{pmatrix}o_1^1 & o_2^1\\ o_3^1 & o_4^1\end{pmatrix}$$
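A minimal Python sketch of this forward pass, assuming stride 1, no padding, and the cross-correlation convention (no kernel flip) that most deep-learning libraries use under the name "convolution". The function name `conv2d_valid` and the numeric values are hypothetical, chosen only for illustration.

```python
# Minimal sketch: stride-1, no-padding ("valid") 2-D cross-correlation on
# nested lists. Deep-learning libraries usually call this "convolution"
# even though the kernel is not flipped. Values are hypothetical.

def conv2d_valid(inp, kernel):
    """Slide `kernel` over `inp` and return the valid-mode output map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(inp) - kh + 1
    out_w = len(inp[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the kh x kw patch whose top-left corner is (r, c).
            total = 0.0
            for a in range(kh):
                for b in range(kw):
                    total += inp[r + a][c + b] * kernel[a][b]
            row.append(total)
        out.append(row)
    return out

# i_1..i_9 and w_1..w_4 laid out row-major, as in the question.
i1 = [[1.0, 2.0, 3.0],
      [4.0, 5.0, 6.0],
      [7.0, 8.0, 9.0]]
w1 = [[1.0, 2.0],
      [3.0, 4.0]]

o1 = conv2d_valid(i1, w1)
print(o1)  # [[37.0, 47.0], [67.0, 77.0]]
```

For example, $o_1^1 = i_1^1 w_1^1 + i_2^1 w_2^1 + i_4^1 w_3^1 + i_5^1 w_4^1 = 1 + 4 + 12 + 20 = 37$ with these values.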

So if we want to calculate, for example, how much $w_1^1$ affects the cost $C$, we need to know how much $w_1^1$ affects its corresponding output item $o_1^1$ and how much $o_1^1$ affects the cost $C$, which gives the following equation:

$$\frac{\partial C}{\partial w_1^1} = \frac{\partial o^1}{\partial w_1^1}\frac{\partial C}{\partial o^1}$$

To calculate $\frac{\partial o^1}{\partial w_1^1}$, I think we have to trace back how the output was computed from $w_1^1$.

To get $o_1^1$, we multiplied $w_1^1$ by $i_1^1$; to get $o_2^1$, by $i_2^1$; to get $o_3^1$, by $i_4^1$; and to get $o_4^1$, by $i_5^1$.
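The same bookkeeping can be done programmatically. This sketch assumes stride 1 and no padding; the helper `weight_usage` is hypothetical. For each filter weight it lists which output element the weight contributes to and which input element it multiplies there, using the row-major numbering $i_1 \dots i_9$, $w_1 \dots w_4$, $o_1 \dots o_4$ from the question. Each pair $(o_k, i_j)$ means $\partial o_k^1/\partial w_1^1 = i_j^1$, so there is one partial derivative per output element.

```python
# Sketch: for a stride-1, no-padding 2-D sliding filter, record which input
# element each kernel weight multiplies at every output position.
# Indices are 1-based and row-major, matching i_1..i_9, w_1..w_4, o_1..o_4.

def weight_usage(in_h, in_w, k_h, k_w):
    """Return {weight_index: [(output_index, input_index), ...]}."""
    out_w = in_w - k_w + 1
    usage = {}
    for r in range(in_h - k_h + 1):
        for c in range(out_w):
            o_idx = r * out_w + c + 1
            for a in range(k_h):
                for b in range(k_w):
                    w_idx = a * k_w + b + 1
                    i_idx = (r + a) * in_w + (c + b) + 1
                    usage.setdefault(w_idx, []).append((o_idx, i_idx))
    return usage

usage = weight_usage(3, 3, 2, 2)
print(usage[1])  # [(1, 1), (2, 2), (3, 4), (4, 5)]
```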

To calculate $\frac{\partial C}{\partial o^1}$, we need to know how the output is connected to the next layer. If the next layer is another convolutional layer, we have to work out how each output item is connected to that layer’s outputs, which is given by their connecting weights.

For example, suppose we apply a 2×2 filter to $o^1$ to get the final output $o^2$ (a single output of size 1×1):

$$\begin{pmatrix}o_1^1 & o_2^1\\ o_3^1 & o_4^1\end{pmatrix} * \begin{pmatrix}w_1^2 & w_2^2\\ w_3^2 & w_4^2\end{pmatrix} = \begin{pmatrix}o_1^2\end{pmatrix}$$
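Because the 2×2 filter fits on the 2×2 map exactly once, $o_1^2$ reduces to the sum of element-wise products $o_1^1 w_1^2 + o_2^1 w_2^2 + o_3^1 w_3^2 + o_4^1 w_4^2$. A small sketch, assuming no bias or activation; the helper name `single_position_conv` and the values are hypothetical.

```python
# Sketch: with a 2x2 filter on a 2x2 map there is only one position, so
# o_1^2 = o_1^1*w_1^2 + o_2^1*w_2^2 + o_3^1*w_3^2 + o_4^1*w_4^2.
# No bias or activation; values are hypothetical.

def single_position_conv(o1, w2):
    """Sum of element-wise products of two equally sized 2-D lists."""
    return sum(x * w for o_row, w_row in zip(o1, w2) for x, w in zip(o_row, w_row))

o1 = [[37.0, 47.0],
      [67.0, 77.0]]
w2 = [[0.5, -1.0],
      [0.25, 0.0]]

o2_1 = single_position_conv(o1, w2)
print(o2_1)  # -11.75
```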

I think the back-propagation for $w_1^2$ is:

$$\frac{\partial C}{\partial w_1^2} = \frac{\partial o^2}{\partial w_1^2}\frac{\partial C}{\partial o^2} = o_1^1 \cdot 2(o^2_1 - y_1),$$
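The factor $2(o_1^2 - y_1)$ implies a squared-error cost $C = (o_1^2 - y_1)^2$. Assuming that cost (plus no bias or activation), this gradient can be checked against a central finite difference. The helper names and values below are hypothetical.

```python
# Sketch: compare the analytic dC/dw_1^2 = o_1^1 * 2(o_1^2 - y_1) with a
# central finite difference. ASSUMES C = (o_1^2 - y_1)^2, no bias, no
# activation; o^1, w^2 and y_1 are hypothetical values.

def layer2(o1, w2):
    """o_1^2: single-position 2x2 filter on the 2x2 map o^1."""
    return sum(x * w for o_row, w_row in zip(o1, w2) for x, w in zip(o_row, w_row))

def cost(o1, w2, y1):
    return (layer2(o1, w2) - y1) ** 2

o1 = [[37.0, 47.0],
      [67.0, 77.0]]
w2 = [[0.5, -1.0],
      [0.25, 0.0]]
y1 = 1.0

# Analytic gradient for w_1^2 (top-left filter weight).
analytic = o1[0][0] * 2 * (layer2(o1, w2) - y1)

# Nudge w_1^2 up and down by eps and measure the change in C.
eps = 1e-6
w_plus = [row[:] for row in w2]
w_plus[0][0] += eps
w_minus = [row[:] for row in w2]
w_minus[0][0] -= eps
numeric = (cost(o1, w_plus, y1) - cost(o1, w_minus, y1)) / (2 * eps)

print(analytic, numeric)  # analytic is -943.5; numeric is approximately equal
```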

and the back-propagation for $w_1^1$ is:

$$\frac{\partial C}{\partial w_1^1} = \frac{\partial o^1}{\partial w_1^1}\frac{\partial C}{\partial o^1}$$

Where: $$\frac{\partial o^1}{\partial w_1^1} = (i_1^1 + i_2^1 + i_4^1 + i_5^1)$$

And: $$\frac{\partial C}{\partial o^1} = \frac{\partial o_1^2}{\partial o_1^1}\frac{\partial C}{\partial o_1^2} + \frac{\partial o_1^2}{\partial o_2^1}\frac{\partial C}{\partial o_1^2} + \frac{\partial o_1^2}{\partial o_3^1}\frac{\partial C}{\partial o_1^2} + \frac{\partial o_1^2}{\partial o_4^1}\frac{\partial C}{\partial o_1^2}$$

So: $$\frac{\partial C}{\partial o^1} = w_1^2 \cdot 2(o_1^2 - y_1) + w_2^2 \cdot 2(o_1^2 - y_1) + w_3^2 \cdot 2(o_1^2 - y_1) + w_4^2 \cdot 2(o_1^2 - y_1)$$
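One way to test these equations is a finite-difference check on $w_1^1$ through both layers. The sketch below assumes stride 1, no padding, no bias or activation, and $C = (o_1^2 - y_1)^2$; the helper names and values are hypothetical. It computes the chain rule term by term, pairing each $\partial o_k^1/\partial w_1^1$ with its own $\partial C/\partial o_k^1$, and also the product of the two sums written above, then compares both with a numeric estimate. With these values the two expressions give 12.75 and 76.5, and only the term-by-term version agrees with the finite difference.

```python
# Sketch: numerically check dC/dw_1^1 for the two-layer example. ASSUMES
# stride 1, no padding, no bias/activation, C = (o_1^2 - y_1)^2.
# All numeric values are hypothetical.

def conv2d_valid(inp, k):
    """Stride-1, no-padding 2-D cross-correlation on nested lists."""
    kh, kw = len(k), len(k[0])
    return [[sum(inp[r + a][c + b] * k[a][b] for a in range(kh) for b in range(kw))
             for c in range(len(inp[0]) - kw + 1)]
            for r in range(len(inp) - kh + 1)]

def forward(i1, w1, w2):
    o1 = conv2d_valid(i1, w1)           # 2x2 map o^1
    o2 = conv2d_valid(o1, w2)[0][0]     # scalar o_1^2
    return o1, o2

def cost(i1, w1, w2, y1):
    return (forward(i1, w1, w2)[1] - y1) ** 2

i1 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
w1 = [[1.0, 2.0], [3.0, 4.0]]
w2 = [[0.5, -1.0], [0.25, 0.0]]
y1 = 1.0

o1, o2 = forward(i1, w1, w2)
delta = 2 * (o2 - y1)  # dC/do_1^2

# dC/do_k^1 = w_k^2 * delta: one value per element of o^1.
dC_do1 = [[w2[r][c] * delta for c in range(2)] for r in range(2)]

# w_1^1 multiplies input (r, c) to produce output (r, c): i_1, i_2, i_4, i_5.
term_by_term = sum(i1[r][c] * dC_do1[r][c] for r in range(2) for c in range(2))

# Product of the two sums, as in the equations above.
product_of_sums = (i1[0][0] + i1[0][1] + i1[1][0] + i1[1][1]) * sum(
    dC_do1[r][c] for r in range(2) for c in range(2))

# Central finite difference on w_1^1.
eps = 1e-6
w_plus = [row[:] for row in w1]
w_plus[0][0] += eps
w_minus = [row[:] for row in w1]
w_minus[0][0] -= eps
numeric = (cost(i1, w_plus, w2, y1) - cost(i1, w_minus, w2, y1)) / (2 * eps)

print(term_by_term, product_of_sums, numeric)
```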

Am I right? The article above seems to describe it completely differently.