The basic unit of work in a neural network is the Artificial Neuron. An Artificial Neuron has an associated potential to emit a signal. For convenience the value of the potential is kept between 0 and 1. The potential of an Artificial Neuron is computed as a function of its activation values and weight values. The implementation of Artificial Neurons comes from a branch of mathematics called Tensor Analysis. Consider the implementation of an Artificial Neuron:
- We define tensors $\hat{A} = a_{1...n}$ and $\hat{W} = w_{1...n}$.
- Multiply tensors $\hat{A}$ and $\hat{W}$ element by element, i.e. $\hat{C} = \hat{A} \cdot \hat{W}$.
- The tensor product will be $\hat{C} = [a_{1}w_{1}, a_{2}w_{2}, a_{3}w_{3}, ..., a_{n}w_{n}]$.
- Reduce $\hat{C}$ to a scalar value by adding its components.
- The sum of the $\hat{C}$ components is $S = a_{1}w_{1} + a_{2}w_{2} + a_{3}w_{3} + ... + a_{n-1}w_{n-1} + a_{n}w_{n}$.
- $S$ is called a weighted sum and is represented by $\sum_{n=1}^{k} a_{n}w_{n}$ where $k$ is the number of elements in $\hat{C}$.
- $S$ determines the strength of the signal emitted by the Artificial Neuron.
- Capping $S$ adds additional control over signal emission and is done by subtracting a bias $b$ from the sum.
- It is possible for $S - b$ to have a value outside the desired signal strength $0 \leq p \leq 1$. For this reason an activation function is used to bring $p$ into the desired range.
- One of the commonly used activation functions is the sigmoid $\sigma (x) = \frac{1}{1 + e^{-x}}$.

In conclusion, the implementation of an Artificial Neuron is the function $p = \sigma\left(\sum_{n=1}^{k} a_{n}w_{n} - b\right)$.
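
The steps listed for a single Artificial Neuron can be sketched in a few lines of JavaScript. This is a minimal illustration, not the Brain.JS implementation; the names `sigmoid` and `neuronPotential` are hypothetical and chosen only for this example.

```javascript
// Sigmoid activation: maps any real number to a value between 0 and 1.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Hypothetical helper: computes the potential p of one Artificial Neuron.
// activations and weights are arrays of equal length, bias is a number.
function neuronPotential(activations, weights, bias) {
  if (activations.length !== weights.length) {
    throw new Error("activations and weights must have the same length");
  }
  // Multiply element by element and reduce to the weighted sum S.
  const weightedSum = activations.reduce(
    (sum, a, n) => sum + a * weights[n],
    0
  );
  // Subtract the bias, then bring the result into the desired range.
  return sigmoid(weightedSum - bias);
}

// With S = 1 and b = 1 the sigmoid receives 0 and returns 0.5.
const p = neuronPotential([1, 1], [0.5, 0.5], 1);
```

Keeping the weighted sum, the bias, and the activation function as separate steps mirrors the list above, so each step can be inspected on its own.
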
A neural network is a computational graph of Artificial Neurons. Neural networks are composed of neural network layers. A neural network layer is a tensor of Artificial Neurons. The Artificial Neurons in a neural network layer are connected to each other because they are components of a tensor. We can define layer $n$ as $\hat{L}_{n} = [P_{1}, P_{2}, P_{3}, ..., P_{n}]$. Neural networks have three layer types: input, hidden, and output. A neural network may have multiple hidden layers but only one input layer and one output layer. Consider a neural network consisting of the following layers:
Neural networks themselves are tensors. In this case the neural network is $\hat{N} = [\hat{L}_{i}, \hat{L}_{1h}, \hat{L}_{o}]$. Artificial Neurons in a neural network are associated to each other via function composition. Consider $P_{1i}$: it has an internal tensor of activation values $\hat{A}_{i1} = [a_{1i}]$. The number of components in $\hat{A}_{i1}$ is one. The output of $P_{1i}$ is a potential. How is the number of activation values in $\hat{L}_{1h}$ associated to the number of activation values in $\hat{L}_{i}$? Here is where the magic happens: the number of activation values in a layer's Artificial Neurons is determined by the number of Artificial Neurons in the previous layer.
For completeness let's consider the output layer Artificial Neuron. Its activation values are $\hat{A}_{o1} = [a_{1o}, a_{2o}]$ because $\hat{L}_{1h}$ has two components, $P_{1h}$ and $P_{2h}$.
We define a neural network algorithm as a function that produces an output $\hat{L}_{o}$ in response to an input $\hat{L}_{i}$ and $n$ hidden layers $\hat{L}_{1h..nh}$, i.e. $N(\hat{L}_{i}, \hat{L}_{1h..nh}) = \hat{L}_{o}$. A neural network is a system defined by the following tensor: $\hat{N} = [\hat{L}_{i}, \hat{L}_{1h}, \hat{L}_{2h}, \hat{L}_{3h}, ..., \hat{L}_{nh}, \hat{L}_{o}]$.
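
The idea that layers are chained by function composition can be sketched as follows. This is an illustrative sketch under stated assumptions: a neuron is modeled as a plain object `{ weights, bias }`, a layer as an array of neurons, and the raw input array stands in for the signals of the input layer. The names `layerOutput` and `forward` and the sample weights are hypothetical.

```javascript
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Signals emitted by one layer, given the signals of the previous layer.
// Every neuron holds one weight per neuron in the previous layer.
function layerOutput(layer, activations) {
  return layer.map(({ weights, bias }) => {
    const weightedSum = weights.reduce(
      (sum, w, n) => sum + w * activations[n],
      0
    );
    return sigmoid(weightedSum - bias);
  });
}

// Forward pass: the output of each layer becomes the input of the next one.
function forward(layers, input) {
  return layers.reduce((signals, layer) => layerOutput(layer, signals), input);
}

// Hypothetical network: one input value, two hidden neurons, one output neuron.
const hiddenLayer = [
  { weights: [0.5], bias: 0 },
  { weights: [-0.5], bias: 0 },
];
// The output neuron needs two weights because the hidden layer has two neurons.
const outputLayer = [{ weights: [1, 1], bias: 1 }];

const output = forward([hiddenLayer, outputLayer], [0.8]);
```

The number of weights in each neuron matches the number of neurons in the previous layer, which is the relationship between layer sizes described above.
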
In our daily experience we go through time and we have a state at each moment in time. Our reality is a series of moments in time. At each moment we can assess our state, map any number of metrics to an exact moment in time, and persist the resulting information representing our state. Our memories are our state and we derive knowledge from them. Compared to you or me, the state of a neural network is determined by its input and by evaluating the output of every Artificial Neuron.
Back propagation is one of the most widely used algorithms for training neural networks. The algorithm's objective is to find the optimal values for every weight $w$ and bias $b$ in the network:

- Initialize the Artificial Neurons ${P_{n}}$ in $L_{n}$ by assigning random values to every $w$ and $b$ in the range $-1 \leq value \leq 1$.
- Iterate over the training dataset.
- For each item in the dataset:
  - Forward propagate by invoking the activation function on each Artificial Neuron $P$ from $L_{1h...nh}$ (all hidden layers) to $L_{o}$ (the output layer), using the input value of the item as the input. The signals of the Artificial Neurons in the previous layer become the activation values $\hat{A}_{n}$ that are weighted by $\hat{W}_{n}$ in the current layer.
  - Backwards propagate the error by iterating over the layers in reverse order and calculating the error between the current output $p_{o}$ and the expected output $p'_{o}$, the output for the corresponding input in the dataset. One of the most commonly used error, cost, or loss functions to compare $p_{o}$ vs. $p'_{o}$ is the Mean Squared Error function $E(p'_{n}, p_{n}, n) = \frac{1}{n} \sum_{i = 1}^{n}(p'_{i} - p_{i})^2$. The error indicates how close the signal $p$ is to $p'$.
  - Compute the rate of change in the cost function. The rate of change of a single variable function with one scalar output is called a derivative, i.e. $\frac{\mathrm{d} y}{\mathrm{d} x}$. The rate of change of a multi variable function with one scalar output is called the gradient, i.e. $\nabla (E)$. The gradient indicates the direction and magnitude of greatest increase for the error function. In this case $\nabla (E)$ needs to be computed since we are dealing with multi variable tensors.
  - $\nabla (E)$ needs to be negated because the objective is to advance towards lower error or cost, i.e. $g = -\nabla (E)$. Define the learning rate $l_{r}$, a number $0 \leq l_{r} \leq 1$, used as a factor that determines the magnitude of $\Delta w$ in conjunction with $\nabla (E)$. Define the momentum $m$, a number $0 \leq m \leq 1$, used as a factor that determines the magnitude of $\Delta w$ in conjunction with the previous $\Delta w$. The magnitude of $l_{r}$ will determine how big of a step we take in our search to minimize the error or cost $E$. The magnitude of $m$ will determine how much of an influence the previous values of $\Delta w$ have in our search to minimize the error or cost $E$. Compute the scalar values $\Delta w_{n} = l_{r}g + m\Delta w_{n-1}$ by which $w$ needs to change in order to decrease the error, i.e. bring $p_{o}$ closer to $p'_{o}$. Follow the same procedure to fine tune the bias $b$.
- After iterating over the complete training data set, verify that the current error is less than or equal to the error threshold $E_{t}$ or that the maximum number of iterations $I_{max}$ was reached; if true stop training, else continue. Each complete iteration over all items in a training set is called an epoch.

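
The training loop described above can be sketched for the simplest possible case, a single sigmoid neuron, where the gradient can be written by hand. This is a simplified illustration and not how Brain.JS is implemented. The names `trainNeuron` and `predict`, the OR-gate data set, and the option values are all assumptions made for this example.

```javascript
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Mean Squared Error between expected and actual outputs.
function meanSquaredError(expected, actual) {
  const total = expected.reduce((sum, e, i) => sum + (e - actual[i]) ** 2, 0);
  return total / expected.length;
}

// Signal of one neuron: sigmoid of the weighted sum minus the bias.
function predict({ weights, bias }, input) {
  const s = input.reduce((sum, a, n) => sum + a * weights[n], 0);
  return sigmoid(s - bias);
}

function trainNeuron(dataset, { learningRate, momentum, errorThreshold, maxEpochs }) {
  const inputCount = dataset[0].input.length;
  // Random initial weights and bias between -1 and 1.
  const neuron = {
    weights: Array.from({ length: inputCount }, () => Math.random() * 2 - 1),
    bias: Math.random() * 2 - 1,
  };
  const previousDeltaW = new Array(inputCount).fill(0);
  let previousDeltaB = 0;
  let error = Infinity;
  let epochs = 0;

  while (epochs < maxEpochs && error > errorThreshold) {
    const outputs = [];
    for (const { input, expected } of dataset) {
      const p = predict(neuron, input); // forward propagation
      outputs.push(p);
      // g: negative gradient of (expected - p)^2 with respect to (S - b).
      const g = 2 * (expected - p) * p * (1 - p);
      input.forEach((a, n) => {
        const deltaW = learningRate * g * a + momentum * previousDeltaW[n];
        neuron.weights[n] += deltaW;
        previousDeltaW[n] = deltaW;
      });
      // The bias is subtracted from S, so its gradient has the opposite sign.
      const deltaB = learningRate * -g + momentum * previousDeltaB;
      neuron.bias += deltaB;
      previousDeltaB = deltaB;
    }
    // Error over the signals produced during this epoch.
    error = meanSquaredError(dataset.map((item) => item.expected), outputs);
    epochs += 1;
  }
  return { neuron, error, epochs };
}

// Hypothetical labeled data set: the logical OR of two inputs.
const orGate = [
  { input: [0, 0], expected: 0 },
  { input: [0, 1], expected: 1 },
  { input: [1, 0], expected: 1 },
  { input: [1, 1], expected: 1 },
];
const trained = trainNeuron(orGate, {
  learningRate: 0.5,
  momentum: 0.1,
  errorThreshold: 0.01,
  maxEpochs: 20000,
});
```

A multi-layer network follows the same loop, but the gradient for each hidden layer has to be derived from the layer after it, which is the "backwards" part of back propagation.
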
It is crucial to understand that back propagation
while forward propagation
. This means that properly labeled data is essential for training and how well
- $E_{t}$: error threshold.
- $I_{max}$: expected number of epochs.
- $l_{r}$: learning rate.
- $m$: momentum.

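
As a sketch, these four parameters could be collected in a plain options object. The property names below are assumed to match the training options accepted by Brain.JS's `train` method, and the values are only illustrative; check the framework's documentation for the version you use.

```javascript
// Assumed mapping of the symbols above onto Brain.JS training option names.
const trainingOptions = {
  errorThresh: 0.005, // E_t: stop once the error is at or below this value
  iterations: 20000, // I_max: upper bound on the number of training iterations
  learningRate: 0.3, // l_r: size of each step
  momentum: 0.1, // m: influence of the previous weight change
};

// Hypothetical usage, assuming `net` is a network instance and
// `trainingData` is a labeled data set:
// net.train(trainingData, trainingOptions);
```
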
The process of preparing training data sets is challenging. The key to the process is proper vectorization and labeling of training data. Neural networks can be applied to all kinds of problems involving regression, classification, or prediction. The way data is prepared for training requires careful consideration of the domain and the goals one intends to achieve.
Imagine we have a set of data representing the horse power min-max feature scaling
is
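
Min-max feature scaling maps each value $x$ to $\frac{x - min}{max - min}$, so the smallest value becomes 0 and the largest becomes 1. The sketch below illustrates this; the function name `minMaxScale` and the sample horsepower figures are made up for the example.

```javascript
// Rescales an array of numbers into the range 0 to 1.
function minMaxScale(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  // Assumption: when all values are equal there is no spread, so return zeros.
  if (max === min) {
    return values.map(() => 0);
  }
  return values.map((value) => (value - min) / (max - min));
}

// Hypothetical horsepower values.
const horsepower = [100, 150, 300];
const scaled = minMaxScale(horsepower); // [0, 0.25, 1]
```

Scaling inputs this way keeps them in the same range as the signals the network emits.
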
Neural networks are computational graphs used to universally model functions. The majority of relationships represented by functions are not linear; for this reason logistic functions like
In the examples directory you can find several examples using the Brain.JS framework to create and train neural networks.