# Learning rule demonstration

### Hypothesis

This page derives the learning rule for updating the weights of a single-layer artificial neural network.
Since the learning rule is the same for every perceptron, we focus on a single one.
We assume the weights are updated with the gradient descent algorithm.

### Transfer function

Let's consider a perceptron with \(N\) inputs \(x_i\), weights \(w_i\) and activation function \(f\).

The transfer function is given by:
\begin{equation}
y = f(w_1.x_1 + w_2.x_2 + \dots + w_N.x_N) = f\left(\sum\limits_{i=1}^N w_i.x_i\right)
\label{eq:transfert-function}
\end{equation}
Let's define the sum \( S \):
\begin{equation}
S(w_i,x_i)= \sum\limits_{i=1}^N w_i.x_i
\label{eq:sum}
\end{equation}
Let's rewrite \(y\) as a function of \( S \) by merging equations \eqref{eq:sum} and \eqref{eq:transfert-function}:
$$ y(S)= f(\sum\limits_{i=1}^N w_i.x_i)=f(S(w_i,x_i)) $$
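
As a minimal sketch, the transfer function can be written in a few lines of Python. The sigmoid activation and the names `weighted_sum`, `sigmoid` and `perceptron_output` are assumptions made for illustration; the derivation on this page holds for any differentiable \(f\).

```python
import math

def weighted_sum(weights, inputs):
    """S = sum over i of w_i * x_i."""
    return sum(w * x for w, x in zip(weights, inputs))

def sigmoid(s):
    """Assumed activation f(S) = 1 / (1 + exp(-S)); any differentiable f would do."""
    return 1.0 / (1.0 + math.exp(-s))

def perceptron_output(weights, inputs, f=sigmoid):
    """y = f(S(w, x))."""
    return f(weighted_sum(weights, inputs))
```

Passing the activation as the parameter `f` mirrors the notation \(y = f(S)\): the sum and the activation stay separate, which is what the chain rule relies on later.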

### Error (or loss)

The error we want to minimize is the squared difference between the expected output and the actual output:
$$ E=(y'-y)^2 $$
with:

- \(E\) the error
- \(y'\) the expected output (from the training data set)
- \(y\) the actual output of the network

In practice, to simplify the maths, this error is divided by two so that the factor 2 produced by differentiating the square cancels out:
$$ E=\frac{1}{2}(y'-y)^2 $$
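
For illustration, the halved squared error could be computed as follows. The function name `squared_error` is hypothetical and chosen only for this sketch.

```python
def squared_error(expected, actual):
    """E = 1/2 * (y' - y)^2, with y' the expected and y the actual output."""
    return 0.5 * (expected - actual) ** 2
```

The error is zero only when the actual output equals the expected one, and it is symmetric: overshooting and undershooting by the same amount give the same error.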

### Gradient descent

The gradient descent algorithm used to train the network (i.e. to update the weights) is given by:
\begin{equation}
w_i'=w_i-\eta.\frac{dE}{dw_i}
\label{eq:gradient-descent}
\end{equation}
where:

- \(w_i\) the weight before update
- \(w_i'\) the weight after update
- \(\eta\) the learning rate
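
A single gradient descent step can be sketched as below, assuming the derivatives \(\frac{dE}{dw_i}\) are already known. The helper name `gradient_descent_step` is an assumption made for this example.

```python
def gradient_descent_step(weights, gradients, learning_rate):
    """Return the updated weights w_i' = w_i - eta * dE/dw_i."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]
```

Each weight moves against the sign of its derivative: a positive derivative decreases the weight, a negative one increases it, and \(\eta\) scales the size of the move.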

### Differentiating the error

Let's differentiate the error with respect to \(w_i\):
\begin{equation}
\frac{dE}{dw_i} = \frac{1}{2}\frac{d}{dw_i}(y'-y)^2
\label{eq:error}
\end{equation}
Thanks to the chain rule
$$ (f \circ g)'=(f' \circ g).g' $$
and since the expected output \(y'\) does not depend on \(w_i\), equation \eqref{eq:error} can be rewritten:
$$ \frac{dE}{dw_i} = \frac{2}{2}(y'-y)\frac{d}{dw_i} (y'-y) = -(y'-y)\frac{dy}{dw_i} $$
Let's now calculate the derivative of \(y\):
\begin{equation}
\frac{dy}{dw_i} = \frac{df(S(w_i,x_i))}{dw_i}
\label{eq:dy-dwi}
\end{equation}
Once again, we use the chain rule to rewrite equation \eqref{eq:dy-dwi}:
$$ \frac{df(S)}{dw_i} = \frac{df(S)}{dS}\frac{dS}{dw_i} = x_i\frac{df(S)}{dS} $$
The derivative of the error becomes:
\begin{equation}
\frac{dE}{dw_i} = -x_i(y'-y)\frac{df(S)}{dS}
\label{eq:derror}
\end{equation}
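
One way to gain confidence in equation \eqref{eq:derror} is to compare it with a finite-difference estimate of the derivative. The sketch below assumes a sigmoid activation, for which \(\frac{df(S)}{dS} = f(S)(1-f(S))\); the function names and the step size `h` are illustrative choices, not part of the derivation.

```python
import math

def sigmoid(s):
    # Assumed activation function f(S)
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_prime(s):
    # df/dS for the sigmoid: f(S) * (1 - f(S))
    y = sigmoid(s)
    return y * (1.0 - y)

def error(weights, inputs, expected):
    # E = 1/2 * (y' - y)^2 with y = f(sum of w_i * x_i)
    s = sum(w * x for w, x in zip(weights, inputs))
    return 0.5 * (expected - sigmoid(s)) ** 2

def analytic_gradient(weights, inputs, expected):
    # dE/dw_i = -x_i * (y' - y) * df(S)/dS
    s = sum(w * x for w, x in zip(weights, inputs))
    y = sigmoid(s)
    return [-x * (expected - y) * sigmoid_prime(s) for x in inputs]

def numeric_gradient(weights, inputs, expected, h=1e-6):
    # Central finite difference, perturbing one weight at a time
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += h
        down[i] -= h
        diff = error(up, inputs, expected) - error(down, inputs, expected)
        grads.append(diff / (2 * h))
    return grads
```

For any choice of weights, inputs and expected output, the two gradients should agree up to the finite-difference approximation error.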

### Updating the weights

By combining equations \eqref{eq:gradient-descent} and \eqref{eq:derror}, the weights can be updated with the following formula:
$$ w_i'=w_i-\eta.\frac{dE}{dw_i} = w_i + \eta. x_i.(y'-y).\frac{df(S)}{dS} $$
In conclusion:
$$ w_i'= w_i + \eta.x_i.(y'-y).\frac{df(S)}{dS} $$
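
The final rule can be turned into a small training loop. This is a sketch under stated assumptions rather than a reference implementation: the activation is taken to be the sigmoid, so that \(\frac{df(S)}{dS} = y(1-y)\), the weights are updated after every sample, and the names `update_weights` and `train` as well as the default learning rate and number of epochs are hypothetical.

```python
import math

def sigmoid(s):
    # Assumed activation function f(S)
    return 1.0 / (1.0 + math.exp(-s))

def update_weights(weights, inputs, expected, learning_rate):
    """Apply w_i' = w_i + eta * x_i * (y' - y) * df(S)/dS once."""
    s = sum(w * x for w, x in zip(weights, inputs))
    y = sigmoid(s)
    df_ds = y * (1.0 - y)  # derivative of the sigmoid, expressed with y
    return [w + learning_rate * x * (expected - y) * df_ds
            for w, x in zip(weights, inputs)]

def train(weights, samples, learning_rate=0.5, epochs=100):
    """Repeat the update over all (inputs, expected) pairs for several epochs."""
    for _ in range(epochs):
        for inputs, expected in samples:
            weights = update_weights(weights, inputs, expected, learning_rate)
    return weights
```

For example, `train([0.0, 0.0], [([1.0, 1.0], 1.0)])` starts from an output of 0.5 and moves it towards the expected value 1 at each step, because every term of the update is positive for this sample.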