Deep Learning Basics [Quick reads]

Prerequisites

  • Linear Algebra and Calculus
  • Intro to Probability
  • Entropy and information
  • Loss Functions
  • Maximum Likelihood Estimate

Gradient Descent and Optimization Link1 Link2

  • Taylor Series Approximation Link
  • Momentum Link

Backpropagation and Initializations

  • Xavier/Glorot Link
  • He init Link
  • Softmax CrossEntropy Backprop Link
  • RNN Backprop
  • Attention Backprop
  • Activations and their derivatives

Unsupervised Pre-training, Fine-tuning Link

Types of Networks:

  • Convolutional Neural Networks Link
  • LSTMs, GRUs and RNNs
  • Autoencoder, PCA
  • GANs Link Link
  • Transformer and attention Link
  • Why a deeper network? Link Link2

Regularizations

  • Dropouts Link

    • The core concept of Srivastava et al. (2014) is that “each hidden unit in a neural network trained with dropout must learn to work with a randomly chosen sample of other units. This should make each hidden unit more robust and drive it towards creating useful features on its own without relying on other hidden units to correct its mistakes.” “In a standard neural network, the derivative received by each parameter tells it how it should change so the final loss function is reduced, given what all other units are doing. Therefore, units may change in a way that they fix up the mistakes of the other units. This may lead to complex co-adaptations. This in turn leads to overfitting because these co-adaptations do not generalize to unseen data.” Srivastava et al. (2014) hypothesize that by making the presence of other hidden units unreliable, dropout prevents co-adaptation of hidden units (a minimal sketch of inverted dropout follows this list).
  • L1

  • L2
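
A minimal NumPy sketch of inverted dropout, to make the co-adaptation argument above concrete. The function name `dropout_forward` and the rescale-at-train-time convention are illustrative choices, not taken from the linked references:

```python
import numpy as np

def dropout_forward(x, p_drop=0.5, train=True, rng=None):
    """Inverted dropout: zero out units at random during training and
    rescale the survivors by 1/(1 - p_drop) so the expected activation
    is unchanged and no rescaling is needed at test time."""
    if not train or p_drop == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask

# Each forward pass silences a different random subset of hidden units,
# so no unit can rely on a fixed set of co-adapted partners.
h = np.ones((4, 8))
print(dropout_forward(h, p_drop=0.5))
```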

Common Problems and Intuitions

  • Best practice Andrej
  • Vanishing Gradients Link Link2
  • Exploding Gradients Link
  • Batch Normalization Link
    • Covariate shift in inputs Link
    • Faster convergence: when input features (or intermediate activations) sit on very different scales, the loss surface is elongated, so in order to “move the needle” for the loss the network has to make a much larger update to one weight than to another. Gradient descent then tends to oscillate back and forth along one dimension and takes more steps to reach the minimum; normalizing keeps the scales comparable (see the sketch after this list).
    • Vanishing/exploding gradients
  • Skip Connections, ResNets Link ResNet
    • Skip connections allow the gradient to reach the early-layer weights with greater magnitude by skipping some of the layers in between (see the residual-block sketch after this list).
  • Mode Collapse in GANs Link
  • Data Augment Stanford
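
A minimal NumPy sketch of the batch-normalization forward pass described above. The function name `batchnorm_forward` and the toy data are illustrative assumptions; a real layer would also track running statistics for inference:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch to zero mean and unit variance,
    then let the learnable gamma/beta restore whatever scale and shift help.
    Comparable feature scales make the loss surface less elongated, so
    gradient descent oscillates less."""
    mu = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized activations
    return gamma * x_hat + beta

# Two features on wildly different scales end up comparable after the layer.
x = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0]])
out = batchnorm_forward(x, gamma=np.ones(2), beta=np.zeros(2))
print(out.mean(axis=0), out.std(axis=0))     # roughly 0 mean, ~1 std per feature
```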
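
And a small sketch of a residual (skip) connection; `residual_block` and the random weights are hypothetical, chosen only so that x + F(x) has matching shapes:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """Compute y = relu(x + F(x)). The identity path lets gradients flow
    straight back to earlier layers, because dy/dx contains an identity
    term in addition to the derivative of the residual branch F."""
    f = relu(x @ W1) @ W2     # residual branch F(x)
    return relu(x + f)        # skip connection adds the input back

rng = np.random.default_rng(0)
d = 16
x = rng.standard_normal((4, d))
W1 = rng.standard_normal((d, d)) * 0.1
W2 = rng.standard_normal((d, d)) * 0.1
print(residual_block(x, W1, W2).shape)   # (4, 16)
```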

Deep Learning General

Practical Deep Learning Link

Deep Learning Optimization Link

Structuring Your Tensorflow Models Link

Autodiff Link JAX

Self-supervised Learning Link