# StackBounty: #neural-network #deep-learning #pytorch #deep-network Understanding depthwise convolution vs convolution with group param…

Bounty: 50

In the MobileNet-v1 network, depthwise convolution layers are used, and I understand them as follows.

For an input feature map of shape (C_in, F_in, F_in), we take a single kernel with C_in channels, say of size (C_in, K, K), and convolve each channel of the kernel with the corresponding channel of the input, producing a (C_in, F_out, F_out) feature map. Then a pointwise convolution combines those feature maps: each of the C_out kernels of size (C_in, 1, 1) produces one (1, F_out, F_out) map, so together they give a (C_out, F_out, F_out) result. The parameter reduction ratio compared to a normal convolution is:
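This depthwise-then-pointwise pipeline can be sketched in PyTorch as follows (a minimal example; the sizes C_in=32, C_out=64, K=3, F_in=56 are arbitrary illustrative values, not taken from any particular MobileNet layer):

```python
import torch
import torch.nn as nn

C_in, C_out, K, F_in = 32, 64, 3, 56
x = torch.randn(1, C_in, F_in, F_in)

# Depthwise stage: groups=C_in gives one K x K filter per input channel,
# so each input channel is convolved independently.
depthwise = nn.Conv2d(C_in, C_in, kernel_size=K, padding=K // 2,
                      groups=C_in, bias=False)

# Pointwise stage: C_out filters of shape (C_in, 1, 1) mix the channels.
pointwise = nn.Conv2d(C_in, C_out, kernel_size=1, bias=False)

out = pointwise(depthwise(x))
print(out.shape)               # torch.Size([1, 64, 56, 56])
print(depthwise.weight.shape)  # torch.Size([32, 1, 3, 3])  -> K*K*C_in params
print(pointwise.weight.shape)  # torch.Size([64, 32, 1, 1]) -> C_in*C_out params
```

The two weight shapes match the K\*K\*C_in + C_in\*C_out parameter count in the formula below.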

(K*K*C_in+C_in*C_out)/(K*K*C_in*C_out) = 1/C_out + 1/(K*K)
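A quick numeric check of this identity (plain Python, with the same arbitrary example sizes as above):

```python
C_in, C_out, K = 32, 64, 3

# Parameters of depthwise + pointwise vs. a normal convolution.
separable = K * K * C_in + C_in * C_out
normal = K * K * C_in * C_out

# The ratio equals 1/C_out + 1/(K*K), as claimed.
print(separable / normal)           # 0.126736...
print(1 / C_out + 1 / (K * K))      # 0.126736...
```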

I also checked Conv2d (doc) in PyTorch, where it is said one can achieve depthwise convolution by setting the groups parameter equal to C_in. But as I read related articles, the logic behind groups looks different from the depthwise convolution operation that MobileNet uses. Say we have C_in=6 and C_out=18; groups=6 means you divide both the input and output channels into 6 groups. In each group, 3 kernels, each having 1 channel, are convolved with one input channel, so a total of 18 output channels is produced.
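The weight shape of such a grouped convolution can be inspected directly (a small sketch using the same C_in=6, C_out=18, groups=6 numbers from the example):

```python
import torch.nn as nn

C_in, C_out, K, groups = 6, 18, 3, 6

# Each of the 18 filters sees only C_in/groups = 1 input channel.
conv = nn.Conv2d(C_in, C_out, kernel_size=K, groups=groups, bias=False)

print(conv.weight.shape)    # torch.Size([18, 1, 3, 3])
print(conv.weight.numel())  # 18 * 1 * 3 * 3 = 162
```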

But a normal convolution uses 18\*6 kernel-channels in total: 18 kernels, each having 6 channels. So the reduction ratio is 18/(18\*6), i.e. 1/C_in = 1/groups. Leaving the pointwise convolution aside, this number differs from the 1/C_out in the conclusion above.
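The 1/groups ratio can be confirmed by comparing parameter counts of the two layers directly (same example sizes as before):

```python
import torch.nn as nn

C_in, C_out, K = 6, 18, 3

# Normal convolution: every filter has C_in channels.
normal = nn.Conv2d(C_in, C_out, kernel_size=K, bias=False)

# Grouped convolution with groups=C_in: every filter has 1 channel.
grouped = nn.Conv2d(C_in, C_out, kernel_size=K, groups=C_in, bias=False)

ratio = grouped.weight.numel() / normal.weight.numel()
print(ratio)  # 0.1666... == 1/6 == 1/C_in
```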

Can anyone explain where I am wrong? Is it because I missed something for the case C_out = factor * C_in (factor > 1)?

