## #StackBounty: #swift #ai #neural-network Neural Network in Swift

### Bounty: 100

This is my first neural network, specifically a multilayer feedforward network trained with backpropagation, and I plan on using it for a multitude of projects. I started with the XOR function and now I’m moving to OCR. This network, as far as I know, can also be used for deep learning. I chose Swift because it’s the language I’m most comfortable with. However, I may convert it to C++ for learning purposes.

Does anyone have any suggestions on how I could make this code more bulletproof, robust, and versatile, or improve its quality and performance?

Also, is this a good style to use? That is, is it common to use only class functions? I feel like this would reduce, if not eliminate, side effects and make the code more thread safe. I can easily whip up some instance methods that call the class methods and in turn store the weights, activations, derivatives, etc.

It seems like it’s pretty fast compared to other neural networks that I’ve downloaded and compared it to (only 2 in all honesty). However, as this is my first ANN, I’m not sure what “fast” is defined as. When attempting to train and test it with the MNIST dataset on a late 2015 iMac 4GHz Intel Core i7 Processor with 8GB DDR3 the results are as follows:

## Topology

Number Of Layers: 3
Number of Input Neurons: 784
Number of Hidden Neurons: 20
Number of Output Neurons: 10

## Training

Epochs: 100
Final Cost: 0.00529893
Cost Function: Mean Squared
Activation Function: Sigmoid
Number Of Training Examples: 60,000
Elapsed Time: 24 minutes and 6 seconds

## Testing

Number of Testing Examples: 10,000
Number of Correct Predictions: 9,297
Ratio: 92.97%
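The post does not say how a prediction was scored as correct. A minimal sketch, assuming the usual convention that the index of the largest of the 10 output activations is taken as the predicted digit (the function names `predictedLabel` and `accuracy` are hypothetical, not part of JPSNeuralNetwork):

```swift
/// Index of the largest output activation, read as the predicted digit.
/// Returns nil for an empty output vector.
func predictedLabel(from outputs: [Float]) -> Int? {
    guard let best = outputs.max() else { return nil }
    return outputs.firstIndex(of: best)
}

/// Fraction of examples whose predicted label matches the expected label.
func accuracy(outputs: [[Float]], labels: [Int]) -> Float {
    precondition(outputs.count == labels.count, "one label per output vector")
    guard !labels.isEmpty else { return 0 }
    var correct = 0
    for (output, label) in zip(outputs, labels) {
        if predictedLabel(from: output) == label {
            correct += 1
        }
    }
    return Float(correct) / Float(labels.count)
}
```

With this convention, 9,297 matches out of 10,000 test examples gives the 92.97% ratio reported above.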

Quick side note: I’m new to optimization as well, but it’s definitely something I would like to be better at!

I would greatly appreciate any suggestions! Please be as harsh as you deem necessary! Also, if any more information is needed feel free to let me know.

EDIT 1:

I doubt this is applicable here, but I would like to implement some type of “live” testing where the user could draw a number on the screen, feed it forward, and get a prediction.

Do I need to normalize the image the same way as the MNIST data? (e.g. normalize to fit in a 20×20 image while preserving the aspect ratio, center in a 28×28 image, and compute the center of mass. Then, translate the image so as to position this point at the center of the 28×28 field.) The only problem with that is I don’t know the anti-aliasing technique used by the normalization algorithm to get the grey scale levels (hopefully I could just email Yann LeCun and find out though).
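The translation step described above (compute the center of mass, then shift it to the middle of the field) can be sketched as follows. This is only an illustration under assumptions: the `GrayImage` type is hypothetical, pixels are assumed to be row-major intensities in 0...1, and the shift is by whole pixels. It does not attempt the 20×20 rescaling or the anti-aliasing, which the question itself notes is unknown.

```swift
/// Hypothetical row-major grayscale image with intensities in 0...1.
struct GrayImage {
    let width: Int
    let height: Int
    var pixels: [Float]

    init(width: Int, height: Int, pixels: [Float]) {
        precondition(pixels.count == width * height, "pixel count must match dimensions")
        self.width = width
        self.height = height
        self.pixels = pixels
    }

    /// Intensity-weighted centroid, or nil when the image is entirely black.
    func centerOfMass() -> (x: Float, y: Float)? {
        var total: Float = 0
        var sumX: Float = 0
        var sumY: Float = 0
        for y in 0..<height {
            for x in 0..<width {
                let value = pixels[y * width + x]
                total += value
                sumX += value * Float(x)
                sumY += value * Float(y)
            }
        }
        guard total > 0 else { return nil }
        return (x: sumX / total, y: sumY / total)
    }

    /// Copy of the image translated by whole pixels so that the centroid
    /// lands on the geometric center. Pixels shifted off the edge are dropped.
    func centeredOnMass() -> GrayImage {
        guard let center = centerOfMass() else { return self }
        let dx = Int((Float(width - 1) / 2 - center.x).rounded())
        let dy = Int((Float(height - 1) / 2 - center.y).rounded())
        var shifted = [Float](repeating: 0, count: pixels.count)
        for y in 0..<height {
            for x in 0..<width {
                let newX = x + dx
                let newY = y + dy
                guard (0..<width).contains(newX), (0..<height).contains(newY) else { continue }
                shifted[newY * width + newX] = pixels[y * width + x]
            }
        }
        return GrayImage(width: width, height: height, pixels: shifted)
    }
}
```

For a drawn digit, one would presumably rasterize the strokes into a 28×28 `GrayImage` first and call `centeredOnMass()` before feeding the pixels forward.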

Here’s the GitHub repository in case anyone finds that easier to read: JPSNeuralNetwork

JPSNeuralNetwork.swift

```swift
//
//  JPSNeuralNetwork.swift
//
//  Created by Jonathan Sullivan on 4/4/17.
//

import Foundation
import Accelerate

public protocol JPSNeuralNetworkDelegate
{
    func network(costDidChange cost: Float)
    func network(progressDidChange progress: Float)
    func network(overallProgressDidChange progress: Float)
}

public class JPSNeuralNetwork
{
    private typealias FeedForwardResult = (inputs: Matrix, activations: Matrix, activationRates: Matrix)

    /// Average cost over all training examples.
    private class func cost(costFunction: JPSNeuralNetworkCostFunction, activations: Matrix, targetOutputs: Matrix) -> Scalar
    {
        var cost: Scalar = 0

        for (activation, targetOutput) in zip(activations, targetOutputs) {
            cost += costFunction.cost(forOutputs: activation, targetOutputs: targetOutput)
        }

        cost /= Scalar(targetOutputs.count)

        return cost
    }

    /// One flat weight vector per non-input layer.
    private class func weights(forTopology topology: [Int]) -> Matrix
    {
        var weights = Matrix()

        var previousNumberOfInputs = topology[0]

        for neuronCount in topology[1..<topology.count]
        {
            // Plus one for the bias weight.

            let neuronWeights = JPSNeuralNetworkLayer.randomWeights(neuronCount: neuronCount, inputCount: previousNumberOfInputs + 1)
            weights.append(neuronWeights)

            previousNumberOfInputs = neuronCount
        }

        return weights
    }

    public class func feedForward(topology: [Int], activationFunction: JPSNeuralNetworkActivationFunction, inputs: Vector, weights: Matrix) -> Vector {
        // The type annotation selects the private overload that returns every layer's values.
        let result: FeedForwardResult = JPSNeuralNetwork.feedForward(topology: topology, activationFunction: activationFunction, inputs: inputs, weights: weights)
        return result.activations.last!
    }

    private class func feedForward(topology: [Int], activationFunction: JPSNeuralNetworkActivationFunction, inputs: Vector, weights: Matrix) -> FeedForwardResult
    {
        var previousActivations = inputs

        var networkInputs = Matrix()
        var networkActivations = Matrix()
        var networkActivationRates = Matrix()

        // Ignore the input layer as it's just a place holder.

        for (neuronCount, layerWeights) in zip(topology[1..<topology.count], weights)
        {
            // Append one for the bias input.

            var layerInputs = previousActivations
            layerInputs.append(1)
            networkInputs.append(layerInputs)

            let feedForward = JPSNeuralNetworkLayer.feedForward(neuronCount: neuronCount, activationFunction: activationFunction, inputs: layerInputs, weights: layerWeights)

            previousActivations = feedForward.activations

            networkActivations.append(previousActivations)
            networkActivationRates.append(feedForward.activationRates)
        }

        return (networkInputs, networkActivations, networkActivationRates)
    }

    /// Error gradient of the output layer: cost derivative times activation derivative.
    private class func outputGradientFor(costFunction: JPSNeuralNetworkCostFunction, activations: Vector, activationRates: Vector, targetOutputs: Vector) -> Vector
    {
        var gradient = Vector()

        for (activationRate, (activation, targetOutput)) in zip(activationRates, zip(activations, targetOutputs))
        {
            let costRate = costFunction.derivative(OfOutput: activation, targetOutput: targetOutput)
            let error = (costRate * activationRate)
            gradient.append(error)
        }

        return gradient
    }

    /// Error gradient of every layer, ordered from the first hidden layer to the output layer.
    private class func gradientFor(costFunction: JPSNeuralNetworkCostFunction, activations: Matrix, activationRates: Matrix, weights: Matrix, targetOutputs: Vector) -> Matrix
    {
        let reversedWeights = weights.reversed()
        var reversedActivations = (activations.reversed() as Matrix)
        var reversedActivationRates = (activationRates.reversed() as Matrix)

        let outputLayerActivations = reversedActivations.removeFirst()
        let outputLayerActivationRates = reversedActivationRates.removeFirst()

        // The lines from here to the end of this function were missing from this
        // copy of the post; they are reconstructed from the helper signatures.
        var previousGradient = JPSNeuralNetwork.outputGradientFor(costFunction: costFunction, activations: outputLayerActivations, activationRates: outputLayerActivationRates, targetOutputs: targetOutputs)

        var gradient = Matrix()
        gradient.append(previousGradient)

        // Walk the remaining layers back to front, pairing each layer with the
        // weights of the layer that follows it.
        for (layerActivationRates, (layerActivations, layerWeights)) in zip(reversedActivationRates, zip(reversedActivations, reversedWeights))
        {
            previousGradient = JPSNeuralNetworkLayer.gradientFor(activations: layerActivations, activationRates: layerActivationRates, weights: layerWeights, gradient: previousGradient)
            gradient.append(previousGradient)
        }

        return gradient.reversed()
    }

    private class func updateWeights(learningRate: Float, inputs: Matrix, weights: Matrix, gradient: Matrix) -> Matrix
    {
        var newWeights = Matrix()

        // Loop header and layer call reconstructed; only the append survived in this copy.
        for ((layerInputs, layerWeights), layerGradient) in zip(zip(inputs, weights), gradient)
        {
            let newLayerWeights = JPSNeuralNetworkLayer.updateWeights(learningRate: learningRate, inputs: layerInputs, weights: layerWeights, gradient: layerGradient)
            newWeights.append(newLayerWeights)
        }

        return newWeights
    }

    private class func backpropagate(learningRate: Float, costFunction: JPSNeuralNetworkCostFunction, inputs: Matrix, weights: Matrix, activations: Matrix, activationRates: Matrix, targetOutput: Vector) -> Matrix
    {
        let gradient = JPSNeuralNetwork.gradientFor(costFunction: costFunction, activations: activations, activationRates: activationRates, weights: weights, targetOutputs: targetOutput)

        // Return statement reconstructed: the function must yield the updated weights.
        return JPSNeuralNetwork.updateWeights(learningRate: learningRate, inputs: inputs, weights: weights, gradient: gradient)
    }

    public class func train(delegate: JPSNeuralNetworkDelegate?, topology: [Int], epochs: Int, learningRate: Float, activationFunction: JPSNeuralNetworkActivationFunction, costFunction: JPSNeuralNetworkCostFunction, trainingInputs: Matrix, targetOutputs: Matrix) -> Matrix
    {
        var weights = JPSNeuralNetwork.weights(forTopology: topology)

        for epoch in 0..<epochs
        {
            var activations = Matrix()

            for (index, (inputs, targetOutput)) in zip(trainingInputs, targetOutputs).enumerated()
            {
                let progress = (Float(index + 1) / Float(targetOutputs.count))
                delegate?.network(progressDidChange: progress)

                let overallProgress = ((Float(epoch) + progress) / Float(epochs))
                delegate?.network(overallProgressDidChange: overallProgress)

                let feedForward: FeedForwardResult = JPSNeuralNetwork.feedForward(topology: topology, activationFunction: activationFunction, inputs: inputs, weights: weights)
                activations.append(feedForward.activations.last!)

                // Weights are updated after every example (online gradient descent).
                weights = JPSNeuralNetwork.backpropagate(learningRate: learningRate, costFunction: costFunction, inputs: feedForward.inputs, weights: weights, activations: feedForward.activations, activationRates: feedForward.activationRates, targetOutput: targetOutput)
            }

            let cost = JPSNeuralNetwork.cost(costFunction: costFunction, activations: activations, targetOutputs: targetOutputs)
            delegate?.network(costDidChange: cost)
        }

        return weights
    }
}
```


JPSNeuralNetworkLayer.swift

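```swift
//
//  JPSNeuralNetworkLayer.swift
//
//  Created by Jonathan Sullivan on 4/4/17.
//

import Foundation
import Accelerate

public class JPSNeuralNetworkLayer
{
    /**
     Generates random weights for all neurons in the layer, stored row-major:
     one run of `inputCount` weights per neuron.
     */
    public class func randomWeights(neuronCount: Int, inputCount: Int) -> Vector
    {
        var layerWeights = Vector()

        for _ in 0..<neuronCount
        {
            let neuronWeights = JPSNeuralNetworkNeuron.randomWeights(inputCount: inputCount)
            layerWeights.append(contentsOf: neuronWeights)
        }

        return layerWeights
    }

    /**
     Feeds the inputs and weights forward and calculates the weighted input and activation.
     The activation rate is precalculated here so it does not have to be recomputed
     during backpropagation.

     weightedInput[j] = sum(x[i] * w[j][i])
     activation[j] = sigma(weightedInput[j])
     activationRate[j] = sigma'(weightedInput[j]), expressed in terms of activation[j]
     */
    public class func feedForward(neuronCount: Int, activationFunction: JPSNeuralNetworkActivationFunction, inputs: Vector, weights: Vector) -> (activations: Vector, activationRates: Vector)
    {
        var activations = Vector(repeating: 0, count: neuronCount)

        // weights (neuronCount x inputs.count) * inputs (inputs.count x 1)
        vDSP_mmul(weights, 1,
                  inputs, 1,
                  &activations, 1,
                  vDSP_Length(neuronCount), 1,
                  vDSP_Length(inputs.count))

        activations = activations.map { activationFunction.activation($0) }
        let activationRates = activations.map { activationFunction.derivative($0) }

        return (activations, activationRates)
    }

    /**
     Calculates the error gradient for each neuron of a layer from the gradient and
     weights of the layer that follows it.

     NOTE: the vDSP calls below were lost from this copy of the post and are
     reconstructed from the surviving argument lines; treat them as a sketch.
     */
    public class func gradientFor(activations: Vector, activationRates: Vector, weights: Vector, gradient: Vector) -> Vector
    {
        // Each row of `weights` holds activations.count weights plus one trailing
        // bias weight, so the product must use the full row length. The surviving
        // fragment used activations.count as the column count, which would misalign the rows.
        let columnCount = activations.count + 1
        var weightedGradient = Vector(repeating: 0, count: columnCount)

        // gradient (1 x gradient.count) * weights (gradient.count x columnCount)
        vDSP_mmul(gradient, 1,
                  weights, 1,
                  &weightedGradient, 1,
                  1, vDSP_Length(columnCount),
                  vDSP_Length(gradient.count))

        // The bias input has no neuron behind it, so its column is dropped.
        weightedGradient.removeLast()

        // A separate output buffer avoids overlapping access to one array.
        var layerGradient = Vector(repeating: 0, count: activations.count)

        vDSP_vmul(weightedGradient, 1,
                  activationRates, 1,
                  &layerGradient, 1,
                  vDSP_Length(activations.count))

        return layerGradient
    }

    /**
     Updates each neuron's weights from its error gradient and the layer inputs:
     newWeight[j][i] = weight[j][i] - learningRate * gradient[j] * input[i]

     NOTE: the vDSP function names and first arguments are reconstructed from
     the surviving argument lines; treat them as a sketch.
     */
    public class func updateWeights(learningRate: Float, inputs: Vector, weights: Vector, gradient: Vector) -> Vector
    {
        var negativeLearningRate = -learningRate
        var scaledGradient = Vector(repeating: 0, count: gradient.count)

        vDSP_vsmul(gradient, 1,
                   &negativeLearningRate,
                   &scaledGradient, 1,
                   vDSP_Length(gradient.count))

        var scaledInputs = Vector(repeating: 0, count: weights.count)

        // Outer product: scaledGradient (gradient.count x 1) * inputs (1 x inputs.count)
        vDSP_mmul(scaledGradient, 1,
                  inputs, 1,
                  &scaledInputs, 1,
                  vDSP_Length(scaledGradient.count), vDSP_Length(inputs.count),
                  1)

        var layerWeights = Vector(repeating: 0, count: weights.count)

        vDSP_vadd(weights, 1,
                  scaledInputs, 1,
                  &layerWeights, 1,
                  vDSP_Length(weights.count))

        return layerWeights
    }
}
```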
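Because the Accelerate calls work on flat arrays, it is easy to get a row length or stride wrong without any crash. One way to guard against that is a plain-loop reference implementation to compare against on small inputs. The sketch below assumes the layout that `randomWeights` and `feedForward` imply (row-major, one row per neuron, bias weight last in each row); the function names are hypothetical and not part of the posted code.

```swift
/// Weighted input of every neuron in a layer, computed with plain loops.
/// `weights` is row-major: neuronCount rows of (inputs.count + 1) values,
/// where the last value in each row is the bias weight.
func referenceWeightedInputs(inputs: [Float], weights: [Float], neuronCount: Int) -> [Float] {
    let rowLength = inputs.count + 1
    precondition(weights.count == neuronCount * rowLength, "unexpected weight count")
    let inputsWithBias = inputs + [1]
    var result = [Float](repeating: 0, count: neuronCount)
    for neuron in 0..<neuronCount {
        var sum: Float = 0
        for index in 0..<rowLength {
            sum += weights[neuron * rowLength + index] * inputsWithBias[index]
        }
        result[neuron] = sum
    }
    return result
}

/// Gradient of a hidden layer from the gradient and weights of the next layer.
/// The bias column of `nextWeights` is skipped because no neuron feeds it.
func referenceHiddenGradient(activationRates: [Float], nextWeights: [Float], nextGradient: [Float]) -> [Float] {
    let rowLength = activationRates.count + 1
    precondition(nextWeights.count == nextGradient.count * rowLength, "unexpected weight count")
    var result = [Float](repeating: 0, count: activationRates.count)
    for hidden in 0..<activationRates.count {
        var sum: Float = 0
        for next in 0..<nextGradient.count {
            sum += nextGradient[next] * nextWeights[next * rowLength + hidden]
        }
        result[hidden] = sum * activationRates[hidden]
    }
    return result
}
```

Comparing these results with the vDSP versions on a 2 or 3 neuron layer would catch a mismatched row length early.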


JPSNeuralNetworkNeuron.swift

```swift
//
//  JPSNeuralNetworkNeuron.swift
//
//  Created by Jonathan Sullivan on 4/4/17.
//

import Foundation

public typealias Scalar = Float
public typealias Vector = [Scalar]
public typealias Matrix = [Vector]

public enum JPSNeuralNetworkCostFunction: Int
{
    case meanSquared = 0
    case crossEntropy = 1

    /// Derivative of the cost with respect to a single output activation.
    func derivative(OfOutput output: Scalar, targetOutput: Scalar) -> Scalar
    {
        switch self
        {
        case .crossEntropy:
            return (output - targetOutput) / ((1 - output) * output)

        case .meanSquared:
            return (output - targetOutput)
        }
    }

    /// Cost of one example, summed over all output neurons.
    func cost(forOutputs outputs: Vector, targetOutputs: Vector) -> Scalar
    {
        switch self
        {
        case .crossEntropy:
            return -zip(outputs, targetOutputs).reduce(0, { (sum, pair) -> Scalar in
                let temp = pair.1 * log(pair.0)
                return sum + temp + (1 - pair.1) * log(1 - pair.0)
            })

        case .meanSquared:
            return 0.5 * zip(outputs, targetOutputs).reduce(0, { (sum, pair) -> Scalar in
                // Accumulate into the running sum; the posted version returned only
                // the last squared error.
                return sum + pow(pair.1 - pair.0, 2)
            })
        }
    }
}

public enum JPSNeuralNetworkActivationFunction: Int
{
    case sigmoid = 0
    case hyperbolicTangent = 1

    /// Derivative of the activation function, expressed in terms of the
    /// activation (the function's output), not the weighted input.
    func derivative(_ activation: Scalar) -> Scalar
    {
        switch self
        {
        case .hyperbolicTangent:
            return (1 - pow(activation, 2))

        case .sigmoid:
            return (activation * (1 - activation))
        }
    }

    func activation(_ weightedInput: Scalar) -> Scalar
    {
        switch self
        {
        case .hyperbolicTangent:
            return tanh(weightedInput)

        case .sigmoid:
            return (1 / (1 + exp(-weightedInput)))
        }
    }
}

public class JPSNeuralNetworkNeuron
{
    /**
     Generates a single random weight, uniform in roughly
     -1/sqrt(inputCount) ..< 1/sqrt(inputCount).
     */
    private class func randomWeight(inputCount: Int) -> Scalar
    {
        let range = (1 / sqrt(Scalar(inputCount)))
        let rangeInt = UInt32(2000000 * range)
        let randomValue = Scalar(arc4random_uniform(rangeInt)) - Scalar(rangeInt / 2)
        return (randomValue / 1000000)
    }

    /**
     Generates a vector of random weights, one per input.
     */
    public class func randomWeights(inputCount: Int) -> Vector
    {
        var weights = Vector()

        for _ in 0..<inputCount
        {
            let weight = JPSNeuralNetworkNeuron.randomWeight(inputCount: inputCount)
            weights.append(weight)
        }

        return weights
    }
}
```
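The integer arithmetic in `randomWeight` amounts to drawing a value uniformly from about -1/sqrt(inputCount) to 1/sqrt(inputCount). A shorter sketch of the same idea, assuming Swift 4.2 or later where `Float.random(in:)` is available (the name `uniformWeights` is hypothetical, and this version uses a closed range rather than the half-open one above):

```swift
/// Draws `inputCount` weights uniformly from -1/sqrt(inputCount)...1/sqrt(inputCount).
func uniformWeights(inputCount: Int) -> [Float] {
    precondition(inputCount > 0, "a neuron needs at least one input")
    let limit = 1 / Float(inputCount).squareRoot()
    return (0..<inputCount).map { _ in Float.random(in: -limit...limit) }
}
```

This avoids the fixed-point scaling by 1,000,000 and also runs on platforms without `arc4random_uniform`.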



## What is the difference between linear regression on y with x and x with y?

The Pearson correlation coefficient of x and y is the same, whether you compute pearson(x, y) or pearson(y, x). This suggests that doing a linear regression of y given x or x given y should be the same, but that’s not the case.

The best way to think about this is to imagine a scatter plot of points with y on the vertical axis and x represented by the horizontal axis. Given this framework, you see a cloud of points, which may be vaguely circular, or may be elongated into an ellipse. What you are trying to do in regression is find what might be called the ‘line of best fit’. However, while this seems straightforward, we need to figure out what we mean by ‘best’, and that means we must define what it would be for a line to be good, or for one line to be better than another, etc. Specifically, we must stipulate a loss function. A loss function gives us a way to say how ‘bad’ something is, and thus, when we minimize that, we make our line as ‘good’ as possible, or find the ‘best’ line.

Traditionally, when we conduct a regression analysis, we find estimates of the slope and intercept so as to minimize the sum of squared errors. These are defined as follows:

$$SSE = \sum_{i=1}^{N} \left(y_i - (\hat\beta_0 + \hat\beta_1 x_i)\right)^2$$

In terms of our scatter plot, this means we are minimizing the sum of the squared vertical distances between the observed data points and the line.

On the other hand, it is perfectly reasonable to regress x onto y, but in that case, we would put x on the vertical axis, and so on. If we kept our plot as is (with x on the horizontal axis), regressing x onto y (again, using a slightly adapted version of the above equation with x and y switched) means that we would be minimizing the sum of the squared horizontal distances between the observed data points and the line. This sounds very similar, but is not quite the same thing. (The way to recognize this is to do it both ways, and then algebraically convert one set of parameter estimates into the terms of the other. Comparing the first model with the rearranged version of the second model, it becomes easy to see that they are not the same.)

Note that neither way would produce the same line we would intuitively draw if someone handed us a piece of graph paper with points plotted on it. In that case, we would draw a line straight through the center, but minimizing the vertical distance yields a line that is slightly flatter (i.e., with a shallower slope), whereas minimizing the horizontal distance yields a line that is slightly steeper.

A correlation is symmetrical: x is as correlated with y as y is with x. The Pearson product-moment correlation can be understood within a regression context, however. The correlation coefficient, r, is the slope of the regression line when both variables have been standardized first. That is, you first subtract the mean from each observation, and then divide the differences by the standard deviation. The cloud of data points will now be centered on the origin, and the slope will be the same whether you regress y onto x, or x onto y.
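The standardization claim can be checked numerically. The sketch below uses a small made-up data set and hypothetical helper names; it assumes the sample (n − 1) covariance and the usual least-squares slope, covariance of predictor and response divided by the variance of the predictor.

```swift
func mean(_ values: [Double]) -> Double {
    return values.reduce(0, +) / Double(values.count)
}

/// Sample covariance (n - 1 in the denominator); covariance(v, v) is the variance.
func covariance(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count && a.count > 1, "need two equally sized samples")
    let meanA = mean(a)
    let meanB = mean(b)
    var total = 0.0
    for (valueA, valueB) in zip(a, b) {
        total += (valueA - meanA) * (valueB - meanB)
    }
    return total / Double(a.count - 1)
}

/// Subtract the mean, then divide by the standard deviation.
func standardized(_ values: [Double]) -> [Double] {
    let center = mean(values)
    let deviation = covariance(values, values).squareRoot()
    return values.map { ($0 - center) / deviation }
}

/// Least-squares slope when regressing `response` on `predictor`.
func slope(regressing response: [Double], on predictor: [Double]) -> Double {
    return covariance(predictor, response) / covariance(predictor, predictor)
}

// Made-up illustration data.
let x: [Double] = [1, 2, 3, 4, 5]
let y: [Double] = [2, 1, 4, 3, 6]

let zx = standardized(x)
let zy = standardized(y)
let r = covariance(x, y) / (covariance(x, x) * covariance(y, y)).squareRoot()

// After standardizing, both slopes agree with r (up to rounding).
print(slope(regressing: zy, on: zx), slope(regressing: zx, on: zy), r)
```

On the raw, unstandardized data the two slopes differ, which is the asymmetry the rest of the answer discusses.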

Now, why does this matter? Using our traditional loss function, we are saying that all of the error is in only one of the variables (viz., y). That is, we are saying that x is measured without error and constitutes the set of values we care about, but that y has sampling error. This is very different from saying the converse. This was important in an interesting historical episode: In the late 70’s and early 80’s in the US, the case was made that there was discrimination against women in the workplace, and this was backed up with regression analyses showing that women with equal backgrounds (e.g., qualifications, experience, etc.) were paid, on average, less than men. Critics (or just people who were extra thorough) reasoned that if this was true, women who were paid equally with men would have to be more highly qualified, but when this was checked, it was found that although the results were ‘significant’ when assessed the one way, they were not ‘significant’ when checked the other way, which threw everyone involved into a tizzy. See here for a famous paper that tried to clear the issue up.

The formula for the slope of a simple regression line is a consequence of the loss function that has been adopted. If you are using the standard Ordinary Least Squares loss function (noted above), you can derive the formula for the slope that you see in every intro textbook. This formula can be presented in various forms; one of which I call the ‘intuitive’ formula for the slope. Consider this form for both the situation where you are regressing y on x, and where you are regressing x on y:

$$\underbrace{\hat\beta_1 = \frac{\text{Cov}(x, y)}{\text{Var}(x)}}_{y \text{ on } x} \qquad\qquad \underbrace{\hat\beta_1 = \frac{\text{Cov}(y, x)}{\text{Var}(y)}}_{x \text{ on } y}$$

Now, I hope it’s obvious that these would not be the same unless Var(x) equals Var(y). If the variances are equal (e.g., because you standardized the variables first), then so are the standard deviations, and thus the variances would both also equal SD(x)SD(y). In this case, $\hat\beta_1$ would equal Pearson’s r, which is the same either way by virtue of the principle of commutativity:

$$r = \frac{\text{Cov}(x, y)}{\text{SD}(x)\,\text{SD}(y)} = \frac{\text{Cov}(y, x)}{\text{SD}(y)\,\text{SD}(x)}$$
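To make the asymmetry concrete, the sketch below computes both slopes with the Cov/Var form on a small made-up data set (helper names are hypothetical). It also re-expresses the x-on-y line on the original axes, where its slope is the reciprocal of the x-on-y coefficient, to show that it is steeper than the y-on-x line.

```swift
func sampleMean(_ values: [Double]) -> Double {
    return values.reduce(0, +) / Double(values.count)
}

/// Sample covariance (n - 1 in the denominator); sampleCovariance(v, v) is the variance.
func sampleCovariance(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count && a.count > 1, "need two equally sized samples")
    let meanA = sampleMean(a)
    let meanB = sampleMean(b)
    var total = 0.0
    for (valueA, valueB) in zip(a, b) {
        total += (valueA - meanA) * (valueB - meanB)
    }
    return total / Double(a.count - 1)
}

// Made-up illustration data.
let xs: [Double] = [1, 2, 3, 4, 5]
let ys: [Double] = [2, 1, 4, 3, 6]

let slopeYOnX = sampleCovariance(xs, ys) / sampleCovariance(xs, xs)  // Cov(x, y) / Var(x)
let slopeXOnY = sampleCovariance(ys, xs) / sampleCovariance(ys, ys)  // Cov(y, x) / Var(y)
let pearsonR = sampleCovariance(xs, ys)
    / (sampleCovariance(xs, xs) * sampleCovariance(ys, ys)).squareRoot()

// Slope of the x-on-y line when drawn with x on the horizontal axis.
let xOnYSlopeOnOriginalAxes = 1 / slopeXOnY

print("y on x:", slopeYOnX)
print("x on y, redrawn on the same axes:", xOnYSlopeOnOriginalAxes)
print("product of the two coefficients vs r squared:", slopeYOnX * slopeXOnY, pearsonR * pearsonR)
```

The product of the two coefficients equals r squared, so the two fitted lines coincide only when the correlation is perfect.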

Source

## HackerRank: BotClean Partially Observable

Problem Statement

The game Bot Clean took place in a fully observable environment, i.e., the state of every cell was visible to the bot at all times. Let us consider a variation of it where the environment is partially observable. The bot has the same actuators and sensors, but the sensors’ visibility is confined to its 8 adjacent cells.

Input Format
The first line contains two space-separated integers which indicate the current position of the bot. The board is indexed using matrix convention (row first, then column).

5 lines follow, representing the grid. Each cell in the grid is represented by any of the following 4 characters:
‘b’ (ASCII value 98) indicates the bot’s current position,
‘d’ (ASCII value 100) indicates a dirty cell,
‘-’ (ASCII value 45) indicates a clean cell in the grid, and
‘o’ (ASCII value 111) indicates a cell that is currently not visible.

Output Format
Output is the action that is taken by the bot in the current step. It can either be any of the movements in 4 directions or the action of cleaning the cell in which it is currently located. Hence the output formats are LEFT, RIGHT, UP, DOWN or CLEAN.

Sample Input

0 0
b-ooo
-dooo
ooooo
ooooo
ooooo


Sample Output

RIGHT
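One possible strategy, sketched under stated assumptions rather than offered as a reference solution: clean the current cell if it is dirty, otherwise step toward the nearest visible dirty cell, and if none is visible, step toward the nearest unobserved cell. The function name `nextMove` is hypothetical. The sketch assumes that a dirty cell under the bot is shown as ‘d’ rather than ‘b’, and it keeps no memory between turns, so a real submission would likely need to persist what it has already seen.

```swift
/// Chooses one action for the current turn. Rows grow downward and columns
/// grow to the right (matrix convention), so UP means row - 1.
func nextMove(row: Int, col: Int, grid: [[Character]]) -> String {
    // Assumption: a dirty cell under the bot is reported as "d".
    if grid[row][col] == "d" { return "CLEAN" }

    // Nearest cell holding `target`, by Manhattan distance; ties keep the
    // first cell found in row-major order.
    func nearest(_ target: Character) -> (row: Int, col: Int)? {
        var best: (row: Int, col: Int)? = nil
        var bestDistance = Int.max
        for r in grid.indices {
            for c in grid[r].indices where grid[r][c] == target {
                let distance = abs(r - row) + abs(c - col)
                if distance < bestDistance {
                    bestDistance = distance
                    best = (row: r, col: c)
                }
            }
        }
        return best
    }

    // Prefer visible dirt; otherwise explore toward an unobserved cell.
    guard let goal = nearest("d") ?? nearest("o") else {
        // Nothing dirty and nothing hidden: any action is a no-op here.
        return "CLEAN"
    }

    // Close the horizontal gap first, then the vertical one.
    if goal.col > col { return "RIGHT" }
    if goal.col < col { return "LEFT" }
    if goal.row > row { return "DOWN" }
    return "UP"
}

// The sample input from the problem statement.
let sampleGrid = ["b-ooo", "-dooo", "ooooo", "ooooo", "ooooo"].map { Array($0) }
print(nextMove(row: 0, col: 0, grid: sampleGrid)) // prints "RIGHT"
```

On the sample, the only visible dirty cell is at row 1, column 1, and closing the column gap first produces RIGHT, which matches the sample output.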