I want to design an end-to-end system that has components of both feedforward neural networks and recurrent neural networks. For example, the data can have different components: some sequential in nature, others not (perhaps images, or even a single number) and better processed by a feedforward network.
My main concern is that when combining components from different networks, I might be combining things that work well on one but not as well on the other. For example, one might mainly use $\tanh(x)$ as its activation while the other mainly uses $\mathrm{ReLU}(x) = \max(0, x)$. In such cases I am unsure whether my model might suffer, fail to train, or run into some other issue. Another concern is that, as far as I know, some methods like batch normalization work well for feedforward models but not as well for RNNs. I know there have been many discoveries that have made deep learning work nowadays, so I want to be able to combine them appropriately.
So my question is: when we mix components from different types of neural networks, are there any heuristics or known empirical results for how to combine these different blocks? As random examples of the ideas I have in mind right now: if we combine LSTMs with feedforward nets, perhaps we always need residual connections (which look similar to memory channels); or if we use RNNs + feedforward nets, perhaps we have to use ReLUs everywhere, or tanh everywhere. Is the trick perhaps to insert normalization layers here and there? Or SELUs here and there?
How do we combine different types of models successfully (especially recurrent nets with feedforward nets)?
An example task of this form could be one where the input X is an image plus a question (text), and the output is one of 5 options (multiple choice); or visual question answering, where we have an image and a question and must answer the question about the image (e.g. the CLEVR dataset).
There are many problems of this sort.
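To make the setup concrete, here is a minimal sketch (plain NumPy, with made-up dimensions and random untrained weights, forward pass only) of the kind of hybrid I have in mind: a ReLU feedforward branch for the image, a tanh RNN branch for the question, and a fusion head over the concatenated features.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

# Hypothetical dimensions, just for illustration.
IMG_DIM, TXT_EMB, HID, N_OPTIONS = 64, 16, 32, 5

# Feedforward (image) branch: a single ReLU layer.
W_img = rng.normal(0, 0.1, (IMG_DIM, HID))

def encode_image(x):
    return relu(x @ W_img)  # shape (HID,)

# Recurrent (question) branch: a vanilla tanh RNN; keep the last hidden state.
W_xh = rng.normal(0, 0.1, (TXT_EMB, HID))
W_hh = rng.normal(0, 0.1, (HID, HID))

def encode_question(tokens):  # tokens: shape (T, TXT_EMB)
    h = np.zeros(HID)
    for x_t in tokens:
        h = np.tanh(x_t @ W_xh + h @ W_hh)
    return h  # shape (HID,)

# Fusion head: concatenate both branch outputs, project to the 5 choices.
W_out = rng.normal(0, 0.1, (2 * HID, N_OPTIONS))

def answer_logits(image, tokens):
    fused = np.concatenate([encode_image(image), encode_question(tokens)])
    return fused @ W_out  # shape (N_OPTIONS,)

logits = answer_logits(rng.normal(size=IMG_DIM),
                       rng.normal(size=(7, TXT_EMB)))
print(logits.shape)  # (5,)
```

The concatenation point is exactly where my concern sits: the image branch feeds in ReLU activations while the RNN branch feeds in tanh activations, and I don't know whether that mismatch (or where to put normalization around it) matters in practice.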
Cross posted: https://qr.ae/TWvXri