*Bounty: 100*

This is my first neural network: a multilayer feedforward network trained with back-propagation, which I plan to use for a multitude of projects. I started with the XOR function and am now moving on to OCR. As far as I know, this network can also be used for deep learning. I chose Swift because it’s the language I’m most comfortable with, though I may convert it to C++ for learning purposes.

Does anyone have suggestions on how I could make this code more bulletproof, more robust and versatile, and improve its quality and performance?

Also, is this a good style to use? That is, is it common to make everything a class function? I feel like this reduces, if not eliminates, side effects and makes the code more thread-safe. I could easily add instance methods that call the class methods and store the weights, activations, derivatives, etc.
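To make the instance-method idea concrete, here is a minimal sketch of a value type that stores the weights and delegates to a pure function. Every name in it (`StoredNetwork`, `forward`, `predict`) is hypothetical and not part of JPSNeuralNetwork; the feed-forward function is injected as a closure only so the sketch compiles on its own.

```swift
// Hypothetical wrapper; none of these names exist in JPSNeuralNetwork.
struct StoredNetwork {
    let topology: [Int]
    let weights: [[Float]]
    // The pure feed-forward function to delegate to, injected so the sketch stays self-contained.
    let forward: (_ topology: [Int], _ inputs: [Float], _ weights: [[Float]]) -> [Float]

    // Stores no mutable state, so repeated calls cannot interfere with each other.
    func predict(_ inputs: [Float]) -> [Float] {
        return forward(topology, inputs, weights)
    }
}

// Stand-in for the real class method: it sums the inputs and ignores the weights.
let network = StoredNetwork(topology: [2, 1], weights: [[0, 0, 0]], forward: { _, inputs, _ in
    return [inputs.reduce(0, +)]
})
print(network.predict([1, 2])) // prints [3.0]
```

In real use the closure would presumably forward to `JPSNeuralNetwork.feedForward(topology:activationFunction:inputs:weights:)`, keeping the class functions pure while the struct only holds data.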

It seems pretty fast compared to the other neural networks I’ve downloaded (only two, in all honesty). However, as this is my first ANN, I’m not sure what counts as “fast”. Training and testing it with the MNIST dataset on a late 2015 iMac (4 GHz Intel Core i7, 8 GB DDR3) gave the following results:

## Topology

**Number Of Layers:** 3

**Number of Input Neurons:** 784

**Number of Hidden Neurons:** 20

**Number of Output Neurons:** 10
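For a sense of scale, the number of trainable weights in this topology can be worked out with a short sketch. It assumes one bias weight per neuron, as `weights(forTopology:)` in the code does; the function name `weightCount(forTopology:)` is my own and is not part of the library.

```swift
/// Counts the weights of a fully connected network, including one bias weight per neuron.
func weightCount(forTopology topology: [Int]) -> Int {
    var total = 0
    // Pair each layer's size with the size of the layer that follows it.
    for (inputCount, neuronCount) in zip(topology, topology.dropFirst()) {
        total += neuronCount * (inputCount + 1)
    }
    return total
}

// (784 + 1) * 20 + (20 + 1) * 10 = 15,700 + 210
print(weightCount(forTopology: [784, 20, 10])) // prints 15910
```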

## Training

**Epochs:** 100

**Final Cost:** 0.00529893

**Cost Function:** Mean Squared

**Activation Function:** Sigmoid

**Number Of Training Examples:** 60,000

**Elapsed Time:** 24 minutes and 6 seconds

## Testing

**Number of Testing Examples:** 10,000

**Number of Correct Predictions:** 9,297

**Ratio:** 92.97%
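For clarity on how a figure like this can be computed, here is a minimal sketch. It assumes a prediction counts as correct when the index of the largest output activation equals the label; the helper names `predictedDigit` and `accuracy` are hypothetical and do not appear in the code below.

```swift
/// Returns the index of the largest output activation, or nil for an empty vector.
func predictedDigit(_ outputs: [Float]) -> Int? {
    guard let maxValue = outputs.max() else { return nil }
    return outputs.firstIndex(of: maxValue)
}

/// Fraction of examples whose predicted digit matches the label.
func accuracy(outputs: [[Float]], labels: [Int]) -> Float {
    guard !labels.isEmpty else { return 0 }
    let correct = zip(outputs, labels).filter { pair in
        return predictedDigit(pair.0) == pair.1
    }.count
    return Float(correct) / Float(labels.count)
}

// Two examples: the first is predicted as 1 (correct), the second as 0 (wrong).
print(accuracy(outputs: [[0.1, 0.9], [0.8, 0.2]], labels: [1, 1])) // prints 0.5
```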

*Quick side note: I’m new to optimization as well, but it’s definitely something I would like to get better at!*

I would greatly appreciate any suggestions! Please be as harsh as you deem necessary! Also, if any more information is needed feel free to let me know.

**EDIT 1:**

I doubt this is applicable here, but I would like to implement some type of “live” testing where the user could draw a number on the screen, feed it forward, and get a prediction.

Do I need to normalize the image the same way as the MNIST data? (e.g. normalize to fit in a 20×20 image while preserving the aspect ratio, center in a 28×28 image, and compute the center of mass. Then, translate the image so as to position this point at the center of the 28×28 field.) The only problem with that is I don’t know the anti-aliasing technique used by the normalization algorithm to get the grey scale levels (hopefully I could just email Yann LeCun and find out though).
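The last step described above, translating the image so that its center of mass sits at the center of the field, might look roughly like the following sketch. It covers only that step, not the scaling or anti-aliasing. It assumes a square, row-major grayscale image, whole-pixel shifts, and a field center at `(size - 1) / 2`. These are my own simplifications and not necessarily what the MNIST preprocessing did, and the function names are hypothetical.

```swift
/// Intensity-weighted center of mass (row, column) of a square, row-major grayscale image.
func centerOfMass(of pixels: [Float], size: Int) -> (row: Float, column: Float)? {
    precondition(pixels.count == size * size, "pixels must hold size * size values")
    var total: Float = 0
    var rowSum: Float = 0
    var columnSum: Float = 0
    for row in 0..<size {
        for column in 0..<size {
            let value = pixels[row * size + column]
            total += value
            rowSum += Float(row) * value
            columnSum += Float(column) * value
        }
    }
    // A blank image has no center of mass.
    guard total > 0 else { return nil }
    return (rowSum / total, columnSum / total)
}

/// Shifts the image by whole pixels so its center of mass lands on the center of the field.
/// Pixels pushed outside the field are dropped.
func centeredByMass(_ pixels: [Float], size: Int) -> [Float] {
    guard let center = centerOfMass(of: pixels, size: size) else { return pixels }
    let middle = Float(size - 1) / 2 // assumed definition of the field's center
    let rowShift = Int((middle - center.row).rounded())
    let columnShift = Int((middle - center.column).rounded())
    var result = [Float](repeating: 0, count: size * size)
    for row in 0..<size {
        for column in 0..<size {
            let targetRow = row + rowShift
            let targetColumn = column + columnShift
            guard (0..<size).contains(targetRow), (0..<size).contains(targetColumn) else { continue }
            result[targetRow * size + targetColumn] = pixels[row * size + column]
        }
    }
    return result
}
```

For the 28×28 case this would be called as `centeredByMass(image, size: 28)` after the drawing has been scaled into the 20×20 box.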

Here’s the GitHub repository in case anyone finds that easier to read: JPSNeuralNetwork

**JPSNeuralNetwork.swift**

```
//
// JPSNeuralNetwork.swift
//
// Created by Jonathan Sullivan on 4/4/17.
//
import Foundation
import Accelerate
public protocol JPSNeuralNetworkDelegate
{
func network(costDidChange cost: Float)
func network(progressDidChange progress: Float)
func network(overallProgressDidChange progress: Float)
}
public class JPSNeuralNetwork
{
private typealias FeedForwardResult = (inputs: Matrix, activations: Matrix, activationRates: Matrix)
private class func cost(costFunction: JPSNeuralNetworkCostFunction, activations: Matrix, targetOutputs: Matrix) -> Scalar
{
var cost: Scalar = 0
for (activation, targetOutput) in zip(activations, targetOutputs) {
cost += costFunction.cost(forOutputs: activation, targetOutputs: targetOutput)
}
cost /= Scalar(targetOutputs.count)
return cost
}
private class func weights(forTopology topology: [Int]) -> Matrix
{
var weights = Matrix()
var previousNumberOfInputs = topology[0]
for neuronCount in topology[1..<topology.count]
{
// Plus one for the bias weight.
let neuronWeights = JPSNeuralNetworkLayer.randomWeights(neuronCount: neuronCount, inputCount: previousNumberOfInputs + 1)
weights.append(neuronWeights)
previousNumberOfInputs = neuronCount
}
return weights
}
public class func feedForward(topology: [Int], activationFunction: JPSNeuralNetworkActivationFunction, inputs: Vector, weights: Matrix) -> Vector {
return JPSNeuralNetwork.feedForward(topology: topology, activationFunction: activationFunction, inputs: inputs, weights: weights).activations.last!
}
private class func feedForward(topology: [Int], activationFunction: JPSNeuralNetworkActivationFunction, inputs: Vector, weights: Matrix) -> FeedForwardResult
{
var previousActivations = inputs
var networkInputs = Matrix()
var networkActivations = Matrix()
var networkActivationRates = Matrix()
// Ignore the input layer as it's just a placeholder.
for (neuronCount, layerWeights) in zip(topology[1..<topology.count], weights)
{
// Append one for the bias input.
var layerInputs = previousActivations
layerInputs.append(1)
networkInputs.append(layerInputs)
let feedForward = JPSNeuralNetworkLayer.feedForward(neuronCount: neuronCount, activationFunction: activationFunction, inputs: layerInputs, weights: layerWeights)
previousActivations = feedForward.activations
networkActivations.append(previousActivations)
networkActivationRates.append(feedForward.activationRates)
}
return (networkInputs, networkActivations, networkActivationRates)
}
private class func outputGradientFor(costFunction: JPSNeuralNetworkCostFunction, activations: Vector, activationRates: Vector, targetOutputs: Vector) -> Vector
{
var gradient = Vector()
for (activationRate, (activation, targetOutput)) in zip(activationRates, zip(activations, targetOutputs))
{
let costRate = costFunction.derivative(OfOutput: activation, targetOutput: targetOutput)
let error = (costRate * activationRate)
gradient.append(error)
}
return gradient
}
private class func gradientFor(costFunction: JPSNeuralNetworkCostFunction, activations: Matrix, activationRates: Matrix, weights: Matrix, targetOutputs: Vector) -> Matrix
{
let reversedWeights = weights.reversed()
var reversedActivations = (activations.reversed() as Matrix)
var reversedActivationRates = (activationRates.reversed() as Matrix)
let outputLayerActivations = reversedActivations.removeFirst()
let outputLayerActivationRates = reversedActivationRates.removeFirst()
var previousGradient = JPSNeuralNetwork.outputGradientFor(costFunction: costFunction, activations: outputLayerActivations, activationRates: outputLayerActivationRates, targetOutputs: targetOutputs)
var gradient = Matrix()
gradient.append(previousGradient)
for (layerActivationRates, (layerActivations, layerWeights)) in zip(reversedActivationRates, zip(reversedActivations, reversedWeights))
{
previousGradient = JPSNeuralNetworkLayer.gradientFor(activations: layerActivations, activationRates: layerActivationRates, weights: layerWeights, gradient: previousGradient)
gradient.append(previousGradient)
}
return gradient.reversed()
}
private class func updateWeights(learningRate: Float, inputs: Matrix, weights: Matrix, gradient: Matrix) -> Matrix
{
var newWeights = Matrix()
for ((layerInputs, layerWeights), layerGradient) in zip(zip(inputs, weights), gradient)
{
let newLayerWeights = JPSNeuralNetworkLayer.updateWeights(learningRate: learningRate, inputs: layerInputs, weights: layerWeights, gradient: layerGradient)
newWeights.append(newLayerWeights)
}
return newWeights
}
private class func backpropagate(learningRate: Float, costFunction: JPSNeuralNetworkCostFunction, inputs: Matrix, weights: Matrix, activations: Matrix, activationRates: Matrix, targetOutput: Vector) -> Matrix
{
let gradient = JPSNeuralNetwork.gradientFor(costFunction: costFunction, activations: activations, activationRates: activationRates, weights: weights, targetOutputs: targetOutput)
return JPSNeuralNetwork.updateWeights(learningRate: learningRate, inputs: inputs, weights: weights, gradient: gradient)
}
public class func train(delegate: JPSNeuralNetworkDelegate?, topology: [Int], epochs: Int, learningRate: Float, activationFunction: JPSNeuralNetworkActivationFunction, costFunction: JPSNeuralNetworkCostFunction, trainingInputs: Matrix, targetOutputs: Matrix) -> Matrix
{
var weights = JPSNeuralNetwork.weights(forTopology: topology)
for epoch in 0..<epochs
{
var activations = Matrix()
for (index, (inputs, targetOutput)) in zip(trainingInputs, targetOutputs).enumerated()
{
let progress = (Float(index + 1) / Float(targetOutputs.count))
delegate?.network(progressDidChange: progress)
let overallProgress = ((Float(epoch) + progress) / Float(epochs))
delegate?.network(overallProgressDidChange: overallProgress)
let feedForward: FeedForwardResult = JPSNeuralNetwork.feedForward(topology: topology, activationFunction: activationFunction, inputs: inputs, weights: weights)
activations.append(feedForward.activations.last!)
weights = JPSNeuralNetwork.backpropagate(learningRate: learningRate, costFunction: costFunction, inputs: feedForward.inputs, weights: weights, activations: feedForward.activations, activationRates: feedForward.activationRates, targetOutput: targetOutput)
}
let cost = JPSNeuralNetwork.cost(costFunction: costFunction, activations: activations, targetOutputs: targetOutputs)
delegate?.network(costDidChange: cost)
}
return weights
}
}
```

**JPSNeuralNetworkLayer.swift**

```
//
// JPSNeuralNetworkLayer.swift
//
// Created by Jonathan Sullivan on 4/4/17.
//
import Foundation
import Accelerate
public class JPSNeuralNetworkLayer
{
/**
Generates random weights for all neurons in the layer, stored as one flat vector (one row of inputCount weights per neuron).
*/
public class func randomWeights(neuronCount: Int, inputCount: Int) -> Vector
{
var layerWeights = Vector()
for _ in 0..<neuronCount
{
let neuronWeights = JPSNeuralNetworkNeuron.randomWeights(inputCount: inputCount)
layerWeights.append(contentsOf: neuronWeights)
}
return layerWeights
}
/**
Feeds the inputs forward through the layer and calculates each neuron's weighted input and activation.
Also precalculates the activation rate (the derivative) so it can be reused during back-propagation.
weightedInput[j] = sum(x[i] * w[j][i])
activation[j] = sigma(weightedInput[j])
activationRate[j] = sigma'(weightedInput[j]), computed from activation[j]
*/
public class func feedForward(neuronCount: Int, activationFunction: JPSNeuralNetworkActivationFunction, inputs: Vector, weights: Vector) -> (activations: Vector, activationRates: Vector)
{
var activations = Vector(repeating: 0, count: neuronCount)
vDSP_mmul(weights, 1,
inputs, 1,
&activations, 1,
vDSP_Length(neuronCount), 1,
vDSP_Length(inputs.count))
activations = activations.map({
return activationFunction.activation($0)
})
let activationRates = activations.map({
return activationFunction.derivative($0)
})
return (activations, activationRates)
}
/**
Used to calculate the error gradient for each neuron.
*/
public class func gradientFor(activations: Vector, activationRates: Vector, weights: Vector, gradient: Vector) -> Vector
{
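// Each row of `weights` ends with a bias weight, so a row holds
// activations.count + 1 values; the product must use that row length.
let inputCount = activations.count + 1
var weightedGradient = Vector(repeating: 0, count: inputCount)
vDSP_mmul(gradient, 1,
weights, 1,
&weightedGradient, 1,
1, vDSP_Length(inputCount),
vDSP_Length(gradient.count))
// The bias input is constant, so no error is propagated back through it.
weightedGradient.removeLast()
// Write into a separate buffer rather than reading and writing the same array.
var layerGradient = Vector(repeating: 0, count: activations.count)
vDSP_vmul(weightedGradient, 1,
activationRates, 1,
&layerGradient, 1,
vDSP_Length(layerGradient.count))
return layerGradient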
}
/**
Updates each neuron's weights from that neuron's error gradient and the layer inputs.
*/
public class func updateWeights(learningRate: Float, inputs: Vector, weights: Vector, gradient: Vector) -> Vector
{
var negativeLearningRate = -learningRate
var scaledGradient = Vector(repeating: 0, count: gradient.count)
vDSP_vsmul(gradient, 1,
&negativeLearningRate,
&scaledGradient, 1,
vDSP_Length(gradient.count))
var scaledInputs = Vector(repeating: 0, count: weights.count)
vDSP_mmul(scaledGradient, 1,
inputs, 1,
&scaledInputs, 1,
vDSP_Length(scaledGradient.count), vDSP_Length(inputs.count),
1)
var layerWeights = Vector(repeating: 0, count: weights.count)
vDSP_vadd(weights, 1,
scaledInputs, 1,
&layerWeights, 1,
vDSP_Length(weights.count))
return layerWeights
}
}
```
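As a cross-check for the `vDSP_mmul` call in `feedForward`, the same weighted-input product can be written in plain Swift. This is only a sketch with a hypothetical function name. It assumes the layout the layer code uses: a flat, row-major weight vector with one row per neuron, and a bias input of 1 already appended to `inputs`.

```swift
/// Computes weightedInput[j] = sum(x[i] * w[j][i]) for a flat, row-major weight vector.
func weightedInputs(weights: [Float], inputs: [Float], neuronCount: Int) -> [Float] {
    precondition(weights.count == neuronCount * inputs.count, "weights must hold one row per neuron")
    return (0..<neuronCount).map { neuron -> Float in
        // Row `neuron` starts at this offset in the flat vector.
        let rowStart = neuron * inputs.count
        var sum: Float = 0
        for (offset, input) in inputs.enumerated() {
            sum += weights[rowStart + offset] * input
        }
        return sum
    }
}

// Two neurons, two real inputs plus the bias input (the trailing 1).
print(weightedInputs(weights: [1, 2, 3, 4, 5, 6], inputs: [1, 1, 1], neuronCount: 2)) // prints [6.0, 15.0]
```

Comparing its output with the vDSP result on a tiny layer is a cheap way to confirm that the matrix dimensions passed to `vDSP_mmul` match the storage layout.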

**JPSNeuralNetworkNeuron.swift**

```
//
// JPSNeuralNetworkNeuron.swift
//
// Created by Jonathan Sullivan on 4/4/17.
//
import Foundation
public typealias Scalar = Float
public typealias Vector = [Scalar]
public typealias Matrix = [Vector]
public enum JPSNeuralNetworkCostFunction: Int
{
case meanSquared = 0
case crossEntropy = 1
func derivative(OfOutput output: Scalar, targetOutput: Scalar) -> Scalar
{
switch self
{
case .crossEntropy:
return (output - targetOutput) / ((1 - output) * output)
case .meanSquared:
fallthrough
default:
return (output - targetOutput)
}
}
func cost(forOutputs outputs: Vector, targetOutputs: Vector) -> Scalar
{
switch self
{
case .crossEntropy:
return -zip(outputs, targetOutputs).reduce(0, { (sum, pair) -> Scalar in
let temp = pair.1 * log(pair.0)
return sum + temp + (1 - pair.1) * log(1 - pair.0)
})
case .meanSquared:
fallthrough
default:
return 0.5 * zip(outputs, targetOutputs).reduce(0, { (sum, pair) -> Scalar in
// Accumulate the squared error; returning only the last term would discard the running sum.
return sum + pow(pair.1 - pair.0, 2)
})
}
}
}
public enum JPSNeuralNetworkActivationFunction: Int
{
case sigmoid = 0
case hyperbolicTangent = 1
func derivative(_ activation: Scalar) -> Scalar
{
switch self
{
case .hyperbolicTangent:
return (1 - pow(activation, 2))
case .sigmoid:
fallthrough
default:
return (activation * (1 - activation))
}
}
func activation(_ weightedInput: Scalar) -> Scalar
{
switch self
{
case .hyperbolicTangent:
return tanh(weightedInput)
case .sigmoid:
fallthrough
default:
return (1 / (1 + exp(-weightedInput)))
}
}
}
public class JPSNeuralNetworkNeuron
{
/**
Used to generate a single random weight.
*/
private class func randomWeight(inputCount: Int) -> Scalar
{
let range = (1 / sqrt(Scalar(inputCount)))
let rangeInt = UInt32(2000000 * range)
// Uniform value in roughly [-range, range), built from an integer in [0, rangeInt).
let randomValue = Scalar(arc4random_uniform(rangeInt)) - Scalar(rangeInt / 2)
return (randomValue / 1000000)
}
/**
Used to generate a vector of random weights.
*/
public class func randomWeights(inputCount: Int) -> Vector
{
var weights = Vector()
for _ in 0..<inputCount
{
let weight = JPSNeuralNetworkNeuron.randomWeight(inputCount: inputCount)
weights.append(weight)
}
return weights
}
}
```
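For comparison, the same uniform initialization in the range ±1/√inputCount can be sketched with the standard library's `Float.random(in:)`, which is available from Swift 4.2. The function below is a standalone illustration under that assumption, not a drop-in replacement that has been tested against the network.

```swift
/// Generates `inputCount` weights drawn uniformly from [-1/sqrt(inputCount), 1/sqrt(inputCount)).
func randomWeights(inputCount: Int) -> [Float] {
    precondition(inputCount > 0, "a neuron needs at least one input")
    let bound = 1 / Float(inputCount).squareRoot()
    return (0..<inputCount).map { _ in Float.random(in: -bound..<bound) }
}

// Four inputs give a bound of 0.5, so every weight lies in [-0.5, 0.5).
let sampleWeights = randomWeights(inputCount: 4)
print(sampleWeights.count) // prints 4
```

This avoids the integer scaling by 1,000,000 and does not depend on `arc4random_uniform`.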