*Bounty: 50*

Consider the following problem involving neural networks. The inputs to the network are $n$ paths of a diffusion model, i.e. $dX(t) = \mu \, dt + \sigma \, dW(t)$, sampled at some random time $t$:

$$ \text{input} = [ x_{1j}, x_{2j}, x_{3j}, \dots, x_{nj} ] $$
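As a minimal sketch of how such an input could be generated, assuming an Euler discretization: the function name `simulate_paths` and every parameter value below (drift, volatility, step size, number of paths and time steps) are illustrative assumptions, not values from my actual setup.

```python
import numpy as np

def simulate_paths(n, m, mu=0.1, sigma=0.2, dt=0.01, x0=0.0, seed=0):
    """Simulate n paths of dX = mu dt + sigma dW over m time steps.

    Returns an array of shape (n, m): row i is a path, column j a time index.
    All default parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Brownian increments dW ~ N(0, dt), one per path and time step
    dW = rng.normal(0.0, np.sqrt(dt), size=(n, m))
    # With constant mu and sigma, X is a cumulative sum of the increments
    return x0 + np.cumsum(mu * dt + sigma * dW, axis=1)

X = simulate_paths(n=100, m=500)
x_j = X[:, 42]  # input vector for time index j = 42: [x_1j, ..., x_nj]
```

Each column of `X` is then one input vector, so picking a random column corresponds to picking a random time index $j$.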

The training data (target) for the network is the average of the squared inputs, scaled by a constant $\lambda$:

$$ \text{training data} = k_j = \frac{1}{n} \sum_{i=1}^{n} \lambda x_{ij}^2 $$
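To make the target definition concrete, here is a small sketch that computes $k_j$ for every time index at once. The helper name `make_targets` and the tiny input array are made up for illustration; the array layout (paths in rows, time indices in columns) is an assumption.

```python
import numpy as np

def make_targets(X, lam=0.25):
    """k_j = (1/n) * sum_i lam * x_ij**2 for every time index j.

    X has shape (n, m); the result has shape (m,).
    """
    return lam * np.mean(X ** 2, axis=0)

# Tiny hand-checkable example: n = 2 paths, m = 2 time indices
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
k = make_targets(X, lam=0.25)
# column 0: 0.25 * (1 + 9) / 2 = 1.25 ; column 1: 0.25 * (4 + 16) / 2 = 2.5
```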

The loss function is the mean squared error (MSE) between the target and the network output:

$$ \text{loss function} = \frac{1}{m} \sum_{j=0}^{m-1} (k_j - \bar k_j)^2 $$

Where

- $i$ – is the path index (or input node index)
- $j$ – is the time index (shuffled randomly)
- $k_j$ – is the squared function average
- $\bar k_j$ – is the squared function average approximated by the network
- $\lambda$ – is some constant to approximate.

The goal here is to generate training data with, e.g., $\lambda = 0.25$, and see if the network can recover that value. In other words, I want the network's single output node to converge to the value of $\lambda$ that minimizes the loss function.

For example,

**Step 1** – Create the neural network input by simulating the paths

**Step 2** – Create the training data with $\lambda = 0.25$

**Step 3** – Create the neural network architecture and initialize it, in an attempt to approximate the right $\lambda$.

**Step 4** – Train the network by selecting randomized $j$ indexes.
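For concreteness, the four steps can be sketched as below. This is a deliberately simplified stand-in, not my actual architecture: it assumes the "network" is a single trainable scalar `lam_hat` multiplying the mean of the squared inputs, so that $\bar k_j = \hat\lambda \cdot \frac{1}{n}\sum_i x_{ij}^2$. The path parameters, learning rate, epoch count and the rescaling by `scale` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: inputs, n paths by m time indices (assumed parameter values)
n, m, mu, sigma, dt = 50, 400, 0.1, 0.2, 0.01
X = np.cumsum(mu * dt + sigma * rng.normal(0.0, np.sqrt(dt), (n, m)), axis=1)

# Step 2: targets generated with the true lambda
true_lam = 0.25
s = np.mean(X ** 2, axis=0)   # s_j = (1/n) * sum_i x_ij^2
k = true_lam * s              # k_j

# Step 3: a single trainable weight, randomly initialized (an assumption)
lam_hat = rng.normal()

# Step 4: SGD on the squared error, visiting the indices j in random order.
# Features and targets are divided by the RMS of s so that the step size
# does not depend on the overall magnitude of the inputs.
scale = np.sqrt(np.mean(s ** 2))
lr = 0.05
for epoch in range(20):
    for j in rng.permutation(m):
        f = s[j] / scale
        err = lam_hat * f - k[j] / scale   # prediction minus target, rescaled
        lam_hat -= lr * 2.0 * err * f      # gradient of err**2 w.r.t. lam_hat
```

In this simplified setting the loss is quadratic in the single weight, so plain SGD would be expected to recover the value used in Step 2. A network that is fed the raw $x_{ij}$ would additionally have to learn the squaring itself, which this sketch does not cover.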

However, the network is not converging to the $\lambda$ value which the training data was generated from.

I know there are things that can help convergence such as data normalization, advanced stochastic gradient descent methods, deep hidden layers, or mini-batch and batch training, etc.

But besides the usual tricks, is there something fundamentally wrong with my problem? Can a neural network learn the average of the squared inputs?

I will add any additional information if necessary.

Any help is truly appreciated as this problem is very important to me.