*Bounty: 50*

So in the MobileNet-v1 network, depthwise conv layers are used, and I understand them as follows.

For an input feature map of shape `(C_in, F_in, F_in)`, we take only one kernel with `C_in` channels, say of size `(C_in, K, K)`, and convolve each channel of the kernel with the corresponding channel of the input to produce a `(C_in, F_out, F_out)` feature map. Then we do a pointwise conv to combine those feature maps: using `C_out` kernels of size `(C_in, 1, 1)`, each kernel gives a `(1, F_out, F_out)` result, so we get `(C_out, F_out, F_out)` in total. The kernel parameter reduction ratio compared to a normal conv is:

`(K*K*C_in + C_in*C_out) / (K*K*C_in*C_out) = 1/C_out + 1/(K*K)`
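To sanity-check this ratio, here is a quick sketch in plain Python (the channel and kernel sizes are made-up MobileNet-like values, just for illustration):

```python
# Parameter counts for a standard conv vs. a depthwise-separable conv.
# C_in, C_out, K are illustrative values, not tied to any specific layer.
C_in, C_out, K = 32, 64, 3

standard = K * K * C_in * C_out   # one (C_in, K, K) kernel per output channel
depthwise = K * K * C_in          # one (1, K, K) kernel per input channel
pointwise = C_in * C_out          # C_out kernels of size (C_in, 1, 1)
separable = depthwise + pointwise

ratio = separable / standard
print(standard, separable, ratio)  # 18432 2336 0.1267...

# Matches the closed form above: 1/C_out + 1/(K*K)
assert abs(ratio - (1 / C_out + 1 / (K * K))) < 1e-12
```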

And I also checked `Conv2d` (doc) in PyTorch; it says one can achieve depthwise convolution by setting the `groups` parameter equal to `C_in`. But as I read related articles, the logic behind `groups` looks different from the depthwise convolution operation that MobileNet uses. Say we have `C_in=6`, `C_out=18`, and `groups=6`: this means you divide both the input and output channels into `6` groups. In each group, `3` kernels, each having `1` channel, are convolved with one input channel, so a total of `18` output channels is produced.
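This grouped-conv bookkeeping can be checked directly in PyTorch (assuming `torch` is installed; the layer sizes are the ones from the example above):

```python
import torch
import torch.nn as nn

# groups=6 with C_in=6, C_out=18: 6 groups, each mapping 1 input channel
# to 18/6 = 3 output channels using single-channel kernels.
conv = nn.Conv2d(in_channels=6, out_channels=18, kernel_size=3,
                 groups=6, bias=False)

# Weight shape is (out_channels, in_channels // groups, K, K).
print(conv.weight.shape)  # torch.Size([18, 1, 3, 3])

x = torch.randn(1, 6, 8, 8)
y = conv(x)
print(y.shape)            # torch.Size([1, 18, 6, 6])
```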

But for a normal convolution, `18*6` kernel-channels are used in total (`18` kernels, each having `6` channels). So the reduction ratio is `18/(18*6)`, i.e. `1/C_in = 1/groups`. Leaving the pointwise conv out of consideration, this number is different from the `1/C_out` in the conclusion above.
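The `1/groups` ratio can also be confirmed by counting parameters in PyTorch (again assuming `torch` is available; these are just the raw counts for the two layers being compared):

```python
import torch.nn as nn

# Grouped conv vs. normal conv, both mapping 6 -> 18 channels with 3x3 kernels.
grouped = nn.Conv2d(6, 18, kernel_size=3, groups=6, bias=False)
normal = nn.Conv2d(6, 18, kernel_size=3, bias=False)

print(grouped.weight.numel())  # 18 * 1 * 3 * 3 = 162
print(normal.weight.numel())   # 18 * 6 * 3 * 3 = 972
print(grouped.weight.numel() / normal.weight.numel())  # 1/6 = 1/groups
```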

Can anyone explain where I am wrong? Is it because I missed something in the case `C_out = factor * C_in` (factor > 1)?