- Linear Algebra and Calculus
- Intro to Probability
- Entropy and information
- Loss Functions
- Maximum Likelihood Estimate
- Xavier/Glorot Link
- He init Link
- Softmax CrossEntropy Backprop Link
- RNN Backprop
- Attention Backprop
- Activations and their derivatives
- Unsupervised Pre-training, Fine-tuning Link
- Convolutional Neural Networks Link
- LSTMs, GRUs and RNNs
- Autoencoder, PCA
- GANs Link Link
- Transformer and attention Link
- Why a deeper network? Link Link2
- Dropout Link
- The core concept of Srivastava et al. (2014) is that “each hidden unit in a neural network trained with dropout must learn to work with a randomly chosen sample of other units. This should make each hidden unit more robust and drive it towards creating useful features on its own without relying on other hidden units to correct its mistakes.” Further: “In a standard neural network, the derivative received by each parameter tells it how it should change so the final loss function is reduced, given what all other units are doing. Therefore, units may change in a way that they fix up the mistakes of the other units. This may lead to complex co-adaptations. This in turn leads to overfitting because these co-adaptations do not generalize to unseen data.” Srivastava et al. (2014) hypothesize that by making the presence of other hidden units unreliable, dropout prevents co-adaptation of each hidden unit (see the sketch below).
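To make the mechanism concrete, here is a minimal sketch of inverted dropout in NumPy. The function name, keep probability, and toy data are illustrative assumptions, not code from the paper.

```python
import numpy as np

def dropout_forward(h, p_keep=0.8, train=True, rng=np.random.default_rng(0)):
    """Inverted dropout on a hidden-activation matrix h of shape (batch, units).

    During training, each unit is zeroed independently with probability
    1 - p_keep, so every unit must learn features that stay useful when a
    random subset of its co-units is missing. Scaling the surviving units
    by 1 / p_keep keeps the expected activation unchanged, so nothing
    needs to be rescaled at test time.
    """
    if not train:
        return h  # test time: keep all units, no scaling
    mask = (rng.random(h.shape) < p_keep) / p_keep
    return h * mask

# toy usage
h = np.ones((4, 5))
print(dropout_forward(h, p_keep=0.5))
```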
- L1
- L2
- Best practices (Andrej Karpathy)
- Vanishing Gradients Link Link2
- Exploding Gradients Link
- Batch Normalization Link
- Covariate shift in inputs Link
- Faster convergence: if the input features (or layer activations) are on very different scales, then during gradient descent, in order to “move the needle” for the loss, the network has to make a much larger update to one weight than to another. This can cause the gradient descent trajectory to oscillate back and forth along one dimension, taking more steps to reach the minimum; normalizing each feature removes this imbalance (see the sketch below).
- Vanishing/exploding gradients
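A minimal sketch of the batch-norm forward pass (training-time statistics only, no running averages); the function name and toy data are assumptions for illustration.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then rescale and shift.

    x: (batch, features). After normalization every feature has roughly zero
    mean and unit variance, so no single weight needs a far larger update
    than the others and the loss surface is better conditioned. gamma and
    beta are learned so the layer can still represent the identity mapping.
    """
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# toy usage: two features on wildly different scales
x = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]])
print(batch_norm_forward(x, gamma=np.ones(2), beta=np.zeros(2)))
```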
- Skip Connections, ResNets Link ResNet
- Skip connections allow the gradient to reach the early-layer weights with greater magnitude, because the identity shortcut bypasses some layers in between (a minimal residual block is sketched below).
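A minimal residual-block sketch in NumPy, assuming a simple two-layer transform F(x); the weight shapes and names are illustrative, not taken from the ResNet code.

```python
import numpy as np

def residual_block(x, W1, W2):
    """Minimal residual block: y = x + F(x), with F(x) = W2 @ relu(W1 @ x).

    The identity path (x + ...) contributes a direct term to dL/dx, so the
    gradient reaches earlier weights without being multiplied down by every
    intermediate weight matrix, which mitigates vanishing gradients.
    """
    f = W2 @ np.maximum(W1 @ x, 0.0)   # F(x): two-layer transform
    return x + f                        # skip connection adds the input back

# toy usage (input and output dimensions must match for the addition)
rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1, W2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
print(residual_block(x, W1, W2))
```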
- Mode Collapse in GANs Link
- Data Augmentation (Stanford)