*Bounty: 100*

Let $m\geq 1$ be an integer and $F\in \mathbb{R}[x_1, \dots, x_m]$ be a polynomial. I want to approximate $F$ on the unit hypercube $[0, 1]^m$ by a (possibly multilayer) feedforward neural network. The activation function is $\mathrm{tanh}$ for all the connections.

Let $\varepsilon>0$ be a real number. If I want the approximation to deviate from $F$ by less than $\varepsilon$ in the $L^2$ norm, what is the smallest possible number of non-zero weights?
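
To make the two quantities concrete (the $L^2$ error and the count of non-zero weights), here is a minimal sketch for the simplest case $m = 1$, $F(x) = x$. The helper names `l2_error` and `tiny_tanh_net` are hypothetical, the midpoint rule is just one assumed way to estimate the norm, and the one-neuron network is an illustration of the setup rather than an answer to the question.

```python
import math


def l2_error(f, g, n=10_000):
    """Midpoint-rule estimate of the L^2([0, 1]) distance between f and g."""
    h = 1.0 / n
    total = sum((f((i + 0.5) * h) - g((i + 0.5) * h)) ** 2 for i in range(n))
    return math.sqrt(total * h)


def tiny_tanh_net(scale):
    """One hidden tanh unit, no biases: x -> tanh(scale * x) / scale.

    Exactly two non-zero weights: `scale` (input) and `1 / scale` (output).
    """
    return lambda x: math.tanh(scale * x) / scale


def target(x):
    """F(x) = x, the simplest non-constant polynomial (assumed example)."""
    return x


for scale in (1.0, 0.3, 0.1):
    err = l2_error(target, tiny_tanh_net(scale))
    print(f"scale={scale}: 2 non-zero weights, estimated L2 error {err:.2e}")
```

Since $\tanh(sx)/s = x - s^2x^3/3 + O(s^4)$, the error here shrinks as the scale $s$ decreases while the number of non-zero weights stays at two, so the count is only meaningful together with whatever restrictions (if any) are placed on the size of the weights.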

It is kind of stupid to approximate a function that is known to be polynomial by a neural network, but I just wanted to get more quantitative insight into the universal approximation theorem (and polynomials seem to be the most accessible class of functions).