This is complementary to the work of

Gang Chen: A Gentle Tutorial of Recurrent Neural Network with Error Backpropagation

 

Forward pass

In supervised learning, for a single input \(X\) we have the following two-layer neural net, where * corresponds to the correct output.

 

\[H = \frac{1}{1+\exp\left(-\left( W^1 X + W^h H^{t-1} + b\right)\right)}\]
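A minimal NumPy sketch of this forward step (the shapes, variable names, and initialization below are assumptions for illustration, not taken from the original post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_hidden(X, H_prev, W1, Wh, b):
    """One recurrent forward step: H = sigmoid(W^1 X + W^h H^{t-1} + b)."""
    return sigmoid(W1 @ X + Wh @ H_prev + b)

# Example usage with small random parameters (assumed dimensions)
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
W1 = 0.01 * rng.standard_normal((hidden_dim, input_dim))
Wh = 0.01 * rng.standard_normal((hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

X = rng.standard_normal(input_dim)
H_prev = np.zeros(hidden_dim)          # initial hidden state
H = forward_hidden(X, H_prev, W1, Wh, b)
```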


The softmax function gives the normalized probability of the correct class, \(p^* = \frac{e^{f^*}}{ \sum_j e^{f_j} }\); it is part of the cross-entropy, which is used as the loss function. For the \(i\)-th training example the loss is:

\[L_i = -\log\left(\frac{e^{f_{y_i}}}{ \sum_j e^{f_j} }\right)\]

The loss for a training batch of \(N\) examples is:

\[L = \underbrace{\frac{1}{N} \sum_i L_i}_{\text{data loss}} + \underbrace{\frac{1}{2} \lambda \sum_k\sum_l W_{k,l}^2}_{\text{regularization loss}}\]
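A short NumPy sketch of this batch loss (the variable names, array shapes, and the choice to regularize a single weight matrix \(W\) are assumptions for illustration):

```python
import numpy as np

def batch_loss(scores, y, W, lam=1e-3):
    """Average cross-entropy data loss plus L2 regularization.

    scores: (N, C) class scores f for N examples
    y:      (N,) indices of the correct class for each example
    W:      weight matrix being regularized
    lam:    regularization strength (lambda)
    """
    # Softmax probabilities, shifted by the row max for numerical stability
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)

    N = scores.shape[0]
    data_loss = -np.log(probs[np.arange(N), y]).mean()  # (1/N) sum_i L_i
    reg_loss = 0.5 * lam * np.sum(W * W)                # (1/2) lambda sum_{k,l} W_{k,l}^2
    return data_loss + reg_loss
```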

 

For the correct class (marked *, here the second of three outputs) the loss is:

\[Loss = -\log{(p^*)} = -\log\left(\frac{e^{\hat y^*}}{ e^{\hat y^1}+e^{\hat y^*}+e^{\hat y^3} }\right)\]
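As a purely hypothetical numeric illustration (these score values are made up), if \(\hat y^1 = 1.0\), \(\hat y^* = 2.0\) and \(\hat y^3 = 0.5\), then:

\[Loss = -\log\left(\frac{e^{2.0}}{e^{1.0}+e^{2.0}+e^{0.5}}\right) = -\log\left(\frac{7.39}{2.72+7.39+1.65}\right) \approx -\log(0.63) \approx 0.46\]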

 

Backpropagation

For the last weights and biases we have:
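A minimal sketch of this chain rule, assuming the output scores are computed as \(\hat y = W^2 H + b^2\) (the symbols \(W^2\) and \(b^2\) are assumptions introduced here for illustration) and using the standard softmax/cross-entropy derivative \(\frac{\partial{Loss}}{\partial{\hat y^k}} = p^k - \mathbb{1}(k = *)\) from [1]:

\[\frac{\partial{Loss}}{ \partial{W^2_{2,1}}} =\frac{\partial{Loss}}{ \partial{\hat y^2}}\times \frac{\partial{\hat y^2}}{ \partial{W^2_{2,1}}} = \left(p^2 - \mathbb{1}(2 = *)\right)\times H^1\]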

and for the first weights and biases we have:

\[\frac{\partial{Loss}}{ \partial{W^1_{2,1}}} =\frac{\partial{Loss}}{ \partial{H^1}}\times \frac{\partial{H^1}}{ \partial{W^1_{2,1}}}\]
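A compact NumPy sketch of these backward-pass gradients for a single example (the variable names and the output-layer parameters W2, b2 are assumptions carried over from the sketches above; the gradient flowing back through \(H^{t-1}\) to earlier time steps is omitted to keep it short):

```python
import numpy as np

def backward(X, H_prev, H, probs, y, W2):
    """Gradients of the cross-entropy loss for a single example.

    Forward pass assumed: H = sigmoid(W1 @ X + Wh @ H_prev + b),
    scores = W2 @ H + b2, probs = softmax(scores), y = correct class index (*).
    """
    # dLoss/dscores: softmax/cross-entropy derivative, p_k - 1(k == y)
    dscores = probs.copy()
    dscores[y] -= 1.0

    # Last weights and biases
    dW2 = np.outer(dscores, H)
    db2 = dscores

    # Backprop into the hidden layer, then through the sigmoid: dH/dz = H * (1 - H)
    dH = W2.T @ dscores
    dz = dH * H * (1.0 - H)

    # First (input-to-hidden) and recurrent weights, and the hidden bias
    dW1 = np.outer(dz, X)
    dWh = np.outer(dz, H_prev)
    db = dz
    return dW1, dWh, db, dW2, db2
```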

References:

[1] http://cs231n.github.io/

[2] http://karpathy.github.io/2015/05/21/rnn-effectiveness/