How the backpropagation algorithm works in neural networks

July 06, 2016 by wacoder. Tags: Machine Learning

The backpropagation algorithm was originally introduced in the 1970s, but its importance wasn't fully appreciated until a famous 1986 paper by David Rumelhart et al. Today, backpropagation is the workhorse of neural network training. This post explains the backpropagation algorithm mathematically.

Let's begin with some notation that lets us refer to the neural network in an unambiguous way. Our neural network is shown in the figure below:

We will use \(\theta\) to denote the weights of the neural network: \(\theta_{jk}^{(l)}\) denotes the weight for the connection from the \(k^{th}\) neuron in layer \(l\) to the \(j^{th}\) neuron in layer \(l+1\). We use \(a_j^{(l)}\) for the activation of the \(j^{th}\) neuron in the \(l^{th}\) layer and \(z_j^{(l)}\) for its weighted input, so that \(a_j^{(l)} = g(z_j^{(l)})\), where \(g(\cdot)\) is the sigmoid function.
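
To make the notation concrete, here is a minimal forward-pass sketch in Python with NumPy. It assumes a three-layer network whose weight matrices theta1 and theta2 include bias weights, with a bias unit of 1 prepended to each layer's activations; the function and variable names are illustrative assumptions, not taken from the post.

    import numpy as np

    def sigmoid(z):
        # the sigmoid activation g(z)
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, theta1, theta2):
        # theta1: (hidden, n_in + 1) weights from layer 1 to layer 2
        # theta2: (K, hidden + 1) weights from layer 2 to layer 3
        a1 = np.concatenate(([1.0], x))            # layer-1 activations plus bias unit
        z2 = theta1 @ a1                           # weighted input to layer 2
        a2 = np.concatenate(([1.0], sigmoid(z2)))  # layer-2 activations plus bias unit
        z3 = theta2 @ a2                           # weighted input to the output layer
        a3 = sigmoid(z3)                           # network output h_theta(x)
        return a1, z2, a2, z3, a3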

The cost function for the neural network (without regularization) is

\begin{equation}\label{Eqn:costfunction} J(\theta) = \frac{1}{m}\sum^m_{i=1}\sum^K_{k=1}\left[-y^{(i)}_k\log\left((h_{\theta}(x^{(i)}))_k\right)-(1-y^{(i)}_k)\log\left(1-(h_{\theta}(x^{(i)}))_k\right)\right] \end{equation}
where \(K\) is the number of possible labels and \(m\) is the number of training examples.
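
For illustration, here is a minimal sketch of this cost in Python with NumPy, assuming H is the \(m \times K\) matrix of network outputs \((h_{\theta}(x^{(i)}))_k\) and Y the matching matrix of one-hot labels; both names are assumptions made for this example.

    import numpy as np

    def cost(H, Y):
        # unregularized cross-entropy cost J(theta)
        # H: (m, K) network outputs, Y: (m, K) one-hot labels
        m = Y.shape[0]
        return np.sum(-Y * np.log(H) - (1.0 - Y) * np.log(1.0 - H)) / m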

Backpropagation is about understanding how changing the weights in the network changes the cost function. Ultimately, this means computing the partial derivatives \(\partial J/\partial \theta_{jk}^{(l)}\). We define the error \(\delta_j^{(l)}\) of neuron \(j\) in layer \(l\) by

\begin{equation}\label{Eqn:error} \delta_j^{(l)} = \frac{\partial J(\theta)}{\partial z_j^{(l)}} \end{equation}
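
To see why the output-layer error takes the simple form used below, consider a single training example (dropping the \(1/m\) factor). Applying the chain rule at the output layer, together with the sigmoid identity \(g'(z) = g(z)(1-g(z))\) and the cost (\ref{Eqn:costfunction}), gives

\begin{equation} \delta_k^{(3)} = \frac{\partial J}{\partial a_k^{(3)}}\frac{\partial a_k^{(3)}}{\partial z_k^{(3)}} = \frac{a_k^{(3)}-y_k}{a_k^{(3)}(1-a_k^{(3)})}\, a_k^{(3)}(1-a_k^{(3)}) = a_k^{(3)} - y_k \end{equation}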

Note that everything in (\ref{Eqn:error}) is easily computed. The exact form of \(\partial J/\partial \theta_{jk}^{(l)}\) will, of course, depend on the form of the cost function; however, provided the cost function (\ref{Eqn:costfunction}) is known, there should be little trouble computing it. For example, for each output unit \(k\) in layer 3 (the output layer), \(\delta_k^{(3)} = a_k^{(3)} - y_k\); for the hidden layer 2, \(\delta^{(2)} = ((\theta^{(2)})^T\delta^{(3)}) .* g'(z^{(2)})\), where \(.*\) denotes element-wise multiplication. Therefore, the partial derivative \(\partial J/\partial \theta^{(l)}\) is

\begin{equation}\label{Eqn:derivative} \frac{\partial J}{\partial \theta^{(l)}} = \delta^{(l+1)}(a^{(l)})^T \end{equation}
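
Putting these pieces together, here is a minimal backpropagation sketch in Python with NumPy for the three-layer network above. It reuses the sigmoid and forward helpers sketched earlier and accumulates the gradients (\ref{Eqn:derivative}) over the training set; the names and the convention of skipping the bias column of \(\theta^{(2)}\) when propagating the error back are assumptions for this sketch, not details from the post.

    def sigmoid_grad(z):
        # g'(z) = g(z) * (1 - g(z))
        s = sigmoid(z)
        return s * (1.0 - s)

    def backprop(X, Y, theta1, theta2):
        # X: (m, n) inputs, Y: (m, K) one-hot labels
        m = X.shape[0]
        grad1 = np.zeros_like(theta1)
        grad2 = np.zeros_like(theta2)
        for i in range(m):
            a1, z2, a2, z3, a3 = forward(X[i], theta1, theta2)
            delta3 = a3 - Y[i]                                      # output-layer error
            delta2 = (theta2[:, 1:].T @ delta3) * sigmoid_grad(z2)  # hidden-layer error
            grad2 += np.outer(delta3, a2)                           # dJ/dtheta2 = delta3 (a2)^T
            grad1 += np.outer(delta2, a1)                           # dJ/dtheta1 = delta2 (a1)^T
        return grad1 / m, grad2 / m

The accumulated gradients can then drive a gradient-descent update such as \(\theta^{(l)} \leftarrow \theta^{(l)} - \alpha\,\partial J/\partial \theta^{(l)}\).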
