Applied Deep Learning (YouTube Playlist)

Course Objectives & Prerequisites:

This is a two-semester course designed primarily for graduate students. Undergraduate students with demonstrably strong backgrounds in probability, statistics (e.g., linear and logistic regression), numerical linear algebra, and optimization are also welcome to register. The objective is to familiarize students with the state-of-the-art deep learning techniques employed in industry. Deep learning is a field that witnesses a mini-revolution every few months, so it is very important that students registering for this course are eager to learn new concepts. Much of deep learning is software engineering; consequently, students should be able to write clean code in their assignments. Python is the programming language used in this course. Familiarity with TensorFlow and PyTorch is a plus but not a requirement. However, students must be willing to do the hard work of learning and using these two frameworks as the course progresses.

Part I Topics (Fall Semester)

Training Deep Neural Networks (Lecture Notes) (YouTube Playlist)
Computer Vision
- Image Classification
  - Large Networks (Lecture Notes) (YouTube Playlist)
  - Small Networks (Lecture Notes) (YouTube Playlist)
  - AutoML (Lecture Notes) (YouTube Playlist)
  - Robustness (Lecture Notes) (YouTube Playlist)
  - Visualizing & Understanding (Lecture Notes) (YouTube Playlist)
  - Transfer Learning (Lecture Notes) (YouTube Playlist)
  - Domain Adaptation (Lecture Notes)
  - Few Shot Learning (Lecture Notes)
  - Federated Learning (Lecture Notes)
  - Self-training & Contrastive Learning (Lecture Notes)
- Image Transformation
  - Semantic Segmentation (Lecture Notes) (YouTube Playlist)
  - Super-Resolution, Denoising, and Colorization (Lecture Notes) (YouTube Playlist)
  - Pose Estimation (Lecture Notes)
  - Optical Flow and Depth Estimation (Lecture Notes)
- Object Detection
  - Two Stage Detectors (Lecture Notes) (YouTube Playlist)
  - One Stage Detectors (Lecture Notes) (YouTube Playlist)
- Face Recognition and Detection (Lecture Notes)
- Video (Lecture Notes) (YouTube Playlist)
- 3D (Lecture Notes) (YouTube Playlist)

Part II Topics (Spring Semester)

Natural Language Processing
- Word Representations (Lecture Notes) (YouTube Playlist)
- Text Classification (Lecture Notes) (YouTube Playlist)
- Neural Machine Translation (Lecture Notes) (YouTube Playlist)
- Language Modeling (Lecture Notes) (YouTube Playlist)
Multimodal Learning (Lecture Notes) (YouTube Playlist)
Generative Networks (Lecture Notes) (YouTube Playlist)
Speech & Music (Lecture Notes) (YouTube Playlist)
Reinforcement Learning (Lecture Notes) (YouTube Playlist)
Graph Neural Networks (Lecture Notes) (YouTube Playlist)
Recommender Systems (Lecture Notes)

References

Training Deep Neural Networks

An overview of gradient descent optimization algorithms

Computer Vision; Image Classification; Large Networks

Multi-column Deep Neural Networks for Image Classification
ImageNet Classification with Deep Convolutional Neural Networks (code)
Dropout: A Simple Way to Prevent Neural Networks from Overfitting (code)
Network In Network
Very Deep Convolutional Networks for Large-Scale Image Recognition (code)
Going Deeper with Convolutions
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Rethinking the Inception Architecture for Computer Vision
Training Very Deep Networks
Deep Residual Learning for Image Recognition (code)
Identity Mappings in Deep Residual Networks (code)
Wide Residual Networks (code)
Aggregated Residual Transformations for Deep Neural Networks (code)
Densely Connected Convolutional Networks (code)
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
mixup: Beyond Empirical Risk Minimization (code)
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (code)
Squeeze-and-Excitation Networks (code)
CBAM: Convolutional Block Attention Module (code)
Random Erasing Data Augmentation (code)
Spatial Transformer Networks
Dynamic Routing Between Capsules
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (code)
MLP-Mixer: An all-MLP Architecture for Vision (code)
High-Performance Large-Scale Image Recognition Without Normalization (code)

Computer Vision; Image Classification; Small Networks

Distilling the Knowledge in a Neural Network
Learning both Weights and Connections for Efficient Neural Networks
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (code)
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (code)
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks (code)
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (code)
Xception: Deep Learning with Depthwise Separable Convolutions (code)
MobileNetV2: Inverted Residuals and Linear Bottlenecks (code)
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (code)

Computer Vision; Image Classification; AutoML

Neural Architecture Search With Reinforcement Learning (code)
Learning Transferable Architectures for Scalable Image Recognition
Regularized Evolution for Image Classifier Architecture Search (code)
Evolving Deep Neural Networks
Efficient Neural Architecture Search via Parameter Sharing (code)
DARTS: Differentiable Architecture Search (code)
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (code)
MnasNet: Platform-Aware Neural Architecture Search for Mobile (code)

Computer Vision; Image Classification; Robustness

Intriguing properties of neural networks
Explaining and harnessing adversarial examples
Adversarial Examples in the Physical World
The Limitations of Deep Learning in Adversarial Settings
Practical Black-Box Attacks against Machine Learning
Towards Evaluating the Robustness of Neural Networks (code)
Towards Deep Learning Models Resistant to Adversarial Attacks (code)
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples (code)
One Pixel Attack for Fooling Deep Neural Networks

Computer Vision; Image Classification; Visualizing & Understanding

Visualizing and Understanding Convolutional Networks
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Striving for Simplicity: The All Convolutional Net
“Why Should I Trust You?”: Explaining the Predictions of Any Classifier (code)
Learning Deep Features for Discriminative Localization (code)
Understanding Deep Learning Requires Rethinking Generalization
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (code)
A Unified Approach to Interpreting Model Predictions (code)

Computer Vision; Image Classification; Transfer Learning

How transferable are features in deep neural networks? (code)
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition (code)
CNN Features off-the-shelf: an Astounding Baseline for Recognition
Return of the Devil in the Details: Delving Deep into Convolutional Nets (code)
Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks (code)

Computer Vision; Image Classification; Domain Adaptation

Domain-Adversarial Training of Neural Networks (code)
Adversarial Discriminative Domain Adaptation

Computer Vision; Image Classification; Few Shot Learning

Matching Networks for One Shot Learning
Prototypical Networks for Few-shot Learning (code)
Learning to Compare: Relation Network for Few-Shot Learning

Computer Vision; Image Classification; Federated Learning

Communication-Efficient Learning of Deep Networks from Decentralized Data

Computer Vision; Image Classification; Self-training & Contrastive Learning

Self-training with Noisy Student improves ImageNet classification (code)
A Simple Framework for Contrastive Learning of Visual Representations (code)
Momentum Contrast for Unsupervised Visual Representation Learning (code)

Computer Vision; Image Transformation; Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation (code)
Learning Deconvolution Network for Semantic Segmentation (code)
U-Net: Convolutional Networks for Biomedical Image Segmentation (code)
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs (code)
Multi-scale Context Aggregation by Dilated Convolutions (code)
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Pyramid Scene Parsing Network (code)
Rethinking Atrous Convolution for Semantic Image Segmentation
What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation (code)
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (code)
Dual Attention Network for Scene Segmentation (code)

Computer Vision; Image Transformation; Super-Resolution, Denoising, and Colorization

Learning a Deep Convolutional Network for Image Super-Resolution (code)
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
Image Style Transfer Using Convolutional Neural Networks (code)
Accurate Image Super-Resolution Using Very Deep Convolutional Networks (code)
Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network
Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising (code)
Enhanced Deep Residual Networks for Single Image Super-Resolution (code)
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric (code)

Computer Vision; Image Transformation; Pose Estimation

Convolutional Pose Machines (code)
Stacked Hourglass Networks for Human Pose Estimation (code)
Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields (code)

Computer Vision; Image Transformation; Optical Flow and Depth Estimation

FlowNet: Learning Optical Flow with Convolutional Networks
FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks (code)

Computer Vision; Object Detection; Two Stage Detectors

A Survey on Performance Metrics for Object-Detection Algorithms (code)
Rich feature hierarchies for accurate object detection and semantic segmentation (code)
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
Fast R-CNN (code)
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (code)
R-FCN: Object Detection via Region-based Fully Convolutional Networks (code)
Feature Pyramid Networks for Object Detection
Deformable Convolutional Networks (code)
Mask R-CNN (code)

Computer Vision; Object Detection; One Stage Detectors

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks (code)
You Only Look Once: Unified, Real-Time Object Detection (code)
SSD: Single Shot MultiBox Detector (code)
YOLO9000: Better, Faster, Stronger (code)
Focal Loss for Dense Object Detection
Speed/Accuracy Trade-Offs For Modern Convolutional Object Detectors
YOLOv3: An Incremental Improvement (code)
End-to-End Object Detection with Transformers (code)

Computer Vision; Face Recognition and Detection

DeepFace: Closing the Gap to Human-Level Performance in Face Verification
FaceNet: A Unified Embedding for Face Recognition and Clustering
Deep Face Recognition
Deep Learning Face Attributes in the Wild
Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks (code)
A Discriminative Feature Learning Approach for Deep Face Recognition
ArcFace: Additive Angular Margin Loss for Deep Face Recognition (code)

Computer Vision; Video

3D Convolutional Neural Networks for Human Action Recognition
Large-scale Video Classification with Convolutional Neural Networks (code)
Two-Stream Convolutional Networks for Action Recognition in Videos
Learning Spatiotemporal Features with 3D Convolutional Networks (code)
Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors (code)
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition (code)
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset (code)
Non-local Neural Networks (code)
Group Normalization (code)
Fully-Convolutional Siamese Networks for Object Tracking (code)
Robust Consistent Video Depth Estimation (code)

Computer Vision; 3D

V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation (code)
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (code)
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space (code)
Dynamic Graph CNN for Learning on Point Clouds (code)

Natural Language Processing; Word Representations

Linguistic Regularities in Continuous Space Word Representations
Distributed Representations of Words and Phrases and their Compositionality
Efficient Estimation of Word Representations in Vector Space (code)
GloVe: Global Vectors for Word Representation (code)
Enriching Word Vectors with Subword Information (code)

Natural Language Processing; Text Classification

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank (code)
Convolutional Neural Networks for Sentence Classification (code)
Distributed Representations of Sentences and Documents
Effective Use of Word Order for Text Categorization with Convolutional Neural Networks (code)
A Convolutional Neural Network for Modelling Sentences
A Sensitivity Analysis Of (And Practitioners' Guide To) Convolutional Neural Networks For Sentence Classification
Character-level Convolutional Networks for Text Classification (code)
Bag Of Tricks For Efficient Text Classification (code)
Hierarchical Attention Networks for Document Classification
Neural Architectures For Named Entity Recognition (code) (code)
Universal Language Model Fine-tuning for Text Classification (code)

Natural Language Processing; Neural Machine Translation

Neural Machine Translation by Jointly Learning to Align and Translate
Sequence to Sequence Learning with Neural Networks
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
On the Properties of Neural Machine Translation: Encoder–Decoder Approaches
Effective Approaches to Attention-based Neural Machine Translation (code)
Neural Machine Translation Of Rare Words With Subword Units (code)
Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Convolutional Sequence to Sequence Learning (code)
Attention Is All You Need (code)
Reformer: The Efficient Transformer (code)
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (code)

Natural Language Processing; Language Modeling

Deep contextualized word representations (code)
Improving Language Understanding by Generative Pre-Training (code)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (code)
Language Models are Unsupervised Multitask Learners (code)
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (code)
RoBERTa: A Robustly Optimized BERT Pretraining Approach (code)
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (code)
XLNet: Generalized Autoregressive Pretraining for Language Understanding (code)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (code)
Cross-lingual Language Model Pretraining (code)
Language Models are Few-Shot Learners (code)
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (code)
Pay Attention to MLPs

Multimodal Learning

Long-term Recurrent Convolutional Networks for Visual Recognition and Description
Show and Tell: A Neural Image Caption Generator
Deep Visual-Semantic Alignments for Generating Image Descriptions
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (code)
Layer Normalization
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering (code)
Zero-Shot Text-to-Image Generation (code)

Generative Networks

Auto-Encoding Variational Bayes
Stochastic Backpropagation and Approximate Inference in Deep Generative Models
Generative Adversarial Nets (code)
Conditional Generative Adversarial Nets
Unsupervised representation learning with deep convolutional generative adversarial networks (code)
Improved Techniques for Training GANs (code)
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets (code)
Context Encoders: Feature Learning by Inpainting (code)
Least Squares Generative Adversarial Networks (code)
Image-to-Image Translation with Conditional Adversarial Networks (code)
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (code)
Wasserstein GAN (code)
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
Improved Training of Wasserstein GANs (code)
Progressive growing of GANs for improved quality, stability, and variation (code)
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (code)
Spectral Normalization for Generative Adversarial Networks (code)
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs (code)
Large Scale GAN Training for High Fidelity Natural Image Synthesis (code)
A Style-Based Generator Architecture for Generative Adversarial Networks (code)
Self-Attention Generative Adversarial Networks (code)
StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation (code)
Analyzing and Improving the Image Quality of StyleGAN (code)

Speech & Music

Mel-Spectrogram and Mel-Frequency Cepstral Coefficients (MFCCs)
Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks
Speech Recognition with Deep Recurrent Neural Networks
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling (code)
Towards End-to-End Speech Recognition with Recurrent Neural Networks
Deep Speech: Scaling up end-to-end speech recognition
WaveNet: A Generative Model for Raw Audio
LSTM: A Search Space Odyssey
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
Jasper: An End-to-End Convolutional Neural Acoustic Model (code)
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (code)

Reinforcement Learning

Playing Atari with Deep Reinforcement Learning
Human-level Control through Deep Reinforcement Learning
Continuous Control with Deep Reinforcement Learning
Trust Region Policy Optimization (code)
Conjugate Gradient Method
Mastering the game of Go with deep neural networks and tree search
Asynchronous Methods for Deep Reinforcement Learning
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (code)
Deep Reinforcement Learning with Double Q-Learning
End to End Learning for Self-Driving Cars
End-To-End Training Of Deep Visuomotor Policies
Mastering the game of Go without human knowledge
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
Proximal Policy Optimization Algorithms
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (code) (code)
Overcoming catastrophic forgetting in neural networks
Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (code)

Graph Neural Networks

DeepWalk: Online Learning of Social Representations (code)
LINE: Large-scale Information Network Embedding (code)
node2vec: Scalable Feature Learning for Networks (code)
Semi-Supervised Classification with Graph Convolutional Networks (code)
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (code)
Inductive Representation Learning on Large Graphs (code)
Graph Attention Networks (code)
How Powerful Are Graph Neural Networks? (code)

Recommender Systems

Session-based Recommendations with Recurrent Neural Networks (code)
AutoRec: Autoencoders Meet Collaborative Filtering
Wide & Deep Learning for Recommender Systems
Neural Collaborative Filtering (code)
Neural Factorization Machines for Sparse Predictive Analytics (code)
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks (code)
Variational Autoencoders for Collaborative Filtering (code)
Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding (code)
Deep Learning Recommendation Model for Personalization and Recommendation Systems (code)

Andrea-MariaDB/Applied-Deep-Learning