How the backpropagation algorithm works in neural networks

July 06, 2016 by wacoder. Tags: Machine Learning

The backpropagation algorithm was originally introduced in the 1970s, but its importance wasn't fully appreciated until a famous 1986 paper by David Rumelhart et al. Today, backpropagation is the workhorse of neural network training. This post explains the backpropagation algorithm mathematically.

Let's begin with some notation that lets us refer to the neural network in an unambiguous way. Our neural network is shown in the figure below:

We will use \(\theta\) to denote the weights of the neural network: \(\theta_{jk}^{(l)}\) denotes the weight for the connection from the \(k^{th}\) neuron in layer \(l\) to the \(j^{th}\) neuron in layer \(l+1\). We use \(a_j^{(l)}\) for the activation of the \(j^{th}\) neuron in the \(l^{th}\) layer and \(z_j^{(l)}\) for its weighted input, so that \(a_j^{(l)} = g(z_j^{(l)})\), where \(g(\cdot)\) is the sigmoid function.
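
To make the notation concrete, here is a minimal forward-pass sketch in Python with NumPy. It assumes a three-layer network whose weight matrices theta1 and theta2 include bias weights, with a bias unit of 1 prepended to each layer's activations; the function and variable names are illustrative assumptions, not taken from the post.

    import numpy as np

    def sigmoid(z):
        # the sigmoid activation g(z)
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, theta1, theta2):
        # theta1: (hidden, n_in + 1) weights from layer 1 to layer 2
        # theta2: (K, hidden + 1) weights from layer 2 to layer 3
        a1 = np.concatenate(([1.0], x))            # layer-1 activations plus bias unit
        z2 = theta1 @ a1                           # weighted input to layer 2
        a2 = np.concatenate(([1.0], sigmoid(z2)))  # layer-2 activations plus bias unit
        z3 = theta2 @ a2                           # weighted input to the output layer
        a3 = sigmoid(z3)                           # network output h_theta(x)
        return a1, z2, a2, z3, a3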

The cost function for the neural network (without regularization) is

\begin{equation}\label{Eqn:costfunction} J(\theta) = \frac{1}{m}\sum^m_{i=1}\sum^K_{k=1}\left[-y^{(i)}_k\log\left((h_{\theta}(x^{(i)}))_k\right)-(1-y^{(i)}_k)\log\left(1-(h_{\theta}(x^{(i)}))_k\right)\right] \end{equation}
where \(K\) is the number of possible labels and \(m\) is the number of training examples.
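
For illustration, here is a minimal sketch of this cost in Python with NumPy, assuming H is the \(m \times K\) matrix of network outputs \((h_{\theta}(x^{(i)}))_k\) and Y the matching matrix of one-hot labels; both names are assumptions made for this example.

    import numpy as np

    def cost(H, Y):
        # unregularized cross-entropy cost J(theta)
        # H: (m, K) network outputs, Y: (m, K) one-hot labels
        m = Y.shape[0]
        return np.sum(-Y * np.log(H) - (1.0 - Y) * np.log(1.0 - H)) / m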

Backpropagation is about understanding how changing the weights in the network changes the cost function. Ultimately, this means computing the partial derivatives \(\partial J/\partial \theta_{jk}^{(l)}\). We define the error \(\delta_j^{(l)}\) of neuron \(j\) in layer \(l\) by

\begin{equation}\label{Eqn:error} \delta_j^{(l)} = \frac{\partial J(\theta)}{\partial z_j^{(l)}} \end{equation}
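
To see why the output-layer error takes the simple form used below, consider a single training example (dropping the \(1/m\) factor). Applying the chain rule at the output layer, together with the sigmoid identity \(g'(z) = g(z)(1-g(z))\) and the cost (\ref{Eqn:costfunction}), gives

\begin{equation} \delta_k^{(3)} = \frac{\partial J}{\partial a_k^{(3)}}\frac{\partial a_k^{(3)}}{\partial z_k^{(3)}} = \frac{a_k^{(3)}-y_k}{a_k^{(3)}(1-a_k^{(3)})}\, a_k^{(3)}(1-a_k^{(3)}) = a_k^{(3)} - y_k \end{equation}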

Note that everything in (\ref{Eqn:error}) is easily computed. The exact form of \(\partial J/\partial \theta_{jk}^{(l)}\) will, of course, depend on the form of the cost function; however, provided the cost function (\ref{Eqn:costfunction}) is known, there should be little trouble computing it. For example, for each output unit \(k\) in layer 3 (the output layer), \(\delta_k^{(3)} = a_k^{(3)} - y_k\); for the hidden layer 2, \(\delta^{(2)} = ((\theta^{(2)})^T\delta^{(3)}) .* g'(z^{(2)})\), where \(.*\) denotes element-wise multiplication. Therefore, the partial derivative \(\partial J/\partial \theta^{(l)}\) is

\begin{equation}\label{Eqn:derivative} \frac{\partial J}{\partial \theta^{(l)}} = \delta^{(l+1)}(a^{(l)})^T \end{equation}
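
Putting these pieces together, here is a minimal backpropagation sketch in Python with NumPy for the three-layer network above. It reuses the sigmoid and forward helpers sketched earlier and accumulates the gradients (\ref{Eqn:derivative}) over the training set; the names and the convention of skipping the bias column of \(\theta^{(2)}\) when propagating the error back are assumptions for this sketch, not details from the post.

    def sigmoid_grad(z):
        # g'(z) = g(z) * (1 - g(z))
        s = sigmoid(z)
        return s * (1.0 - s)

    def backprop(X, Y, theta1, theta2):
        # X: (m, n) inputs, Y: (m, K) one-hot labels
        m = X.shape[0]
        grad1 = np.zeros_like(theta1)
        grad2 = np.zeros_like(theta2)
        for i in range(m):
            a1, z2, a2, z3, a3 = forward(X[i], theta1, theta2)
            delta3 = a3 - Y[i]                                      # output-layer error
            delta2 = (theta2[:, 1:].T @ delta3) * sigmoid_grad(z2)  # hidden-layer error
            grad2 += np.outer(delta3, a2)                           # dJ/dtheta2 = delta3 (a2)^T
            grad1 += np.outer(delta2, a1)                           # dJ/dtheta1 = delta2 (a1)^T
        return grad1 / m, grad2 / m

The accumulated gradients can then drive a gradient-descent update such as \(\theta^{(l)} \leftarrow \theta^{(l)} - \alpha\,\partial J/\partial \theta^{(l)}\).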
