# #StackBounty: #machine-learning #neural-networks #deep-learning #optimization Parameter optimization with Neural Networks

### Bounty: 50

Consider the following problem involving neural networks. The inputs to the neural network are \$n\$ paths of a diffusion model, i.e. \$ dX(t)=\mu\, dt + \sigma\, dW(t) \$, sampled at some random time \$t\$.

\$\$ \mathrm{input} = [\, x_{1j}, x_{2j}, x_{3j}, \dots, x_{nj} \,] \$\$

And the training target for the network is the scaled average of the squared inputs:

\$\$ \mathrm{training\_data} = k_j = \frac{1}{n} \sum_{i=1}^{n} \lambda x_{ij}^2 \$\$
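Under stated assumptions (an Euler–Maruyama discretization, and illustrative values for \$\mu\$, \$\sigma\$, \$n\$ and the step size, none of which the question fixes), generating the input paths and targets can be sketched in NumPy as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values -- the question does not fix mu, sigma, n, or dt.
mu, sigma, lam = 0.1, 0.2, 0.25
n_paths, n_steps, dt = 1000, 100, 0.01

# Euler-Maruyama simulation of dX = mu*dt + sigma*dW, one column per time step.
x = np.zeros((n_paths, n_steps + 1))
for t in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    x[:, t + 1] = x[:, t] + mu * dt + sigma * dW

# Row j of `inputs` is one input vector [x_1j, ..., x_nj]; the target is
# the scaled average of its squares: k_j = (1/n) * sum_i lam * x_ij^2.
inputs = x.T
targets = lam * np.mean(inputs**2, axis=1)
```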

The loss function is the MSE of the difference between the target and the network's approximation:

\$\$ \mathrm{loss\_function} = \frac{1}{m} \sum_{j=0}^{m-1} (k_j - \bar{k}_j)^2 \$\$

Where

• \$i\$ – is the path index (or input node index)
• \$j\$ – is the time index (shuffled randomly)
• \$k_j\$ – is the squared-function average
• \$\bar{k}_j\$ – is the squared-function average approximated by the network
• \$\lambda\$ – is some constant to approximate.
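For concreteness, the loss above is just the mean squared error between the two arrays of values; a minimal NumPy sketch (the function name `mse_loss` is mine, not from the question):

```python
import numpy as np

def mse_loss(k, k_bar):
    """MSE between targets k_j and network approximations k_bar_j:
    (1/m) * sum_{j=0}^{m-1} (k_j - k_bar_j)^2."""
    k, k_bar = np.asarray(k, float), np.asarray(k_bar, float)
    return float(np.mean((k - k_bar) ** 2))

mse_loss([0.0, 1.0], [0.0, 0.0])  # -> 0.5
```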

The goal here is to generate training data with, e.g., \$\lambda=0.25\$, and see if the network can recover that value. In other words, I want the network's single output node to converge to the value of \$\lambda\$ that minimizes the loss function.

For example,

Step 1 – Create the neural network input by simulating the paths

Step 2 – Create the training data with \$lambda = 0.25\$

Step 3 – Create the neural network architecture, initialize it, and attempt to approximate the right \$\lambda\$.

Step 4 – Train the network by selecting randomized \$j\$ indexes.
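The four steps above can be sketched end to end. As a deliberately minimal stand-in for the network (my assumption, not the question's actual architecture), a single trainable scalar multiplying the mean of squares is trained by SGD over shuffled \$j\$ indexes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: simulate the paths (Euler-Maruyama; mu, sigma, and the sizes
# are illustrative assumptions, not values given in the question).
mu, sigma, lam = 0.1, 0.2, 0.25
n_paths, n_steps, dt = 1000, 100, 0.01
x = np.zeros((n_paths, n_steps + 1))
for t in range(n_steps):
    x[:, t + 1] = x[:, t] + mu * dt + sigma * rng.normal(0, np.sqrt(dt), n_paths)

# Step 2: training targets k_j with lambda = 0.25.
features = np.mean(x.T**2, axis=1)   # f_j = (1/n) * sum_i x_ij^2
targets = lam * features             # k_j

# Step 3: initialize the single parameter lambda_hat (the "network").
lam_hat = rng.normal()

# Step 4: SGD on the squared error, over randomly shuffled j indexes.
lr = 1.0
for epoch in range(200):
    for j in rng.permutation(len(targets)):
        err = lam_hat * features[j] - targets[j]
        lam_hat -= lr * 2.0 * err * features[j]  # gradient of err^2 w.r.t. lam_hat
```

With this reduced model, `lam_hat` converges to roughly 0.25, which suggests the target itself is learnable; a full network adds parameters on top of the same signal.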

However, the network does not converge to the \$\lambda\$ value from which the training data was generated.

I know there are techniques that can help convergence, such as data normalization, advanced stochastic gradient descent methods, deeper hidden layers, mini-batch or batch training, etc.

But besides the usual tricks – is there something fundamentally wrong with my problem? Can a neural network learn the average of the squared inputs?
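One quick sanity check (a sketch of the identifiability argument, not of any network): because \$k_j = \lambda f_j\$ with \$f_j = \frac{1}{n}\sum_i x_{ij}^2\$, an ordinary least-squares fit of \$k_j\$ against \$f_j\$ recovers \$\lambda\$ exactly, so the mapping is learnable in principle:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in mean-of-squares features f_j; the targets follow the
# question's definition k_j = lambda * f_j with lambda = 0.25.
lam = 0.25
f = rng.random(50)
k = lam * f
lam_hat = np.dot(f, k) / np.dot(f, f)  # OLS slope through the origin
```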