On the difficulty of training recurrent neural networks - 'gradient norm clipping strategy' explained
Introduction to LSTM and GRU - reasonable introduction
Deep Recurrent Q-Learning for Partially Observable MDPs - DQN with LSTM
[Visualizing Higher-Layer Features of a Deep Network](http://www.iro.umontreal.ca/~lisa/publications2/index.php/publications/show/247)
Insight into training neural network - covers gradient descent methods, among other topics