Vachan V Y VachanVY

Hi, I'm Vachan!

Deep Learning and Systems Programming

Projects:

NeuroForge:

Implemented a neural network (forward and backward propagation), batch normalization, layer normalization, and dropout from scratch, using only basic tensor operations
Neural Networks => nn.ipynb
- Logistic Regression
- MLP
Batch-Normalization and Layer-Normalization: Why When Where & How? => batchnorm.ipynb, layernorm.ipynb
Dropout: Why When Where & How? => dropout.ipynb, dropout_scale.ipynb
- Comparison before and after scaling the model => dropout_scale.ipynb, nn_scale.ipynb
Adam and AdamW
- Adam
- AdamW
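
The difference between the Adam and AdamW items above can be sketched for a single scalar parameter. This is a minimal illustration, not the notebook code: the function name `adam_step` and the `decoupled` flag are made up for this example, and the hyperparameter defaults are the commonly used ones.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=True):
    """One optimizer step for a scalar parameter. Returns (param, m, v).

    decoupled=True  -> AdamW: weight decay shrinks the parameter directly.
    decoupled=False -> Adam with L2: decay is added to the gradient, so it
                       passes through the moment estimates and is rescaled.
    """
    if weight_decay and decoupled:
        param = param * (1.0 - lr * weight_decay)   # AdamW-style decay
    if weight_decay and not decoupled:
        grad = grad + weight_decay * param          # L2 penalty in the gradient

    m = b1 * m + (1.0 - b1) * grad                  # first moment (mean)
    v = b2 * v + (1.0 - b2) * grad * grad           # second moment (uncentered)
    m_hat = m / (1.0 - b1 ** t)                     # bias correction, t starts at 1
    v_hat = v / (1.0 - b2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# First step from param=1.0 with gradient 1.0
p_adam, _, _ = adam_step(1.0, 1.0, 0.0, 0.0, t=1)
p_adamw, _, _ = adam_step(1.0, 1.0, 0.0, 0.0, t=1, weight_decay=0.1)
p_l2, _, _ = adam_step(1.0, 1.0, 0.0, 0.0, t=1, weight_decay=0.1, decoupled=False)
```

On the first step the L2 variant moves the parameter by the same amount as plain Adam, because the extra gradient term is normalized away by the second-moment estimate; the decoupled variant still applies its decay.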

Transformers

graph TD;
    Transformers -->|Text| GPT;
    Transformers -->|Images| Vision_Transformers["Vision Transformers"];
    Transformers -->|Audio| MAGNeT["MAGNeT"];
    Transformers --> |Video| Video_Vision_Transformers["Video Vision Transformers"];
    Transformers -->|Diffusion| Diffusion_Transformers["Diffusion Transformers"];

    GPT --> Multi_Modal_Transformers["Multi-Modal Transformer (Transfusion)"];
    Vision_Transformers --> Multi_Modal_Transformers;
    MAGNeT --> Multi_Modal_Transformers;
    Video_Vision_Transformers --> Multi_Modal_Transformers;
    Diffusion_Transformers --> Multi_Modal_Transformers;

    Multi_Modal_Transformers --> LLMs["Large Language Models (LLMs)"];
    RLHF["Reinforcement Learning from Human Feedback (RLHF)"] --> LLMs;

    Reinforcement_Learning --> RLHF;

    LLMs --> Agentic_LLMs["Agentic LLMs"];
    Reinforcement_Learning --> Agentic_LLMs;

gpt.jax:

A GPT written in JAX, trained on the Tiny Shakespeare dataset (1.1 MB of text) and then scaled up on the TinyStories dataset (~2 GB of text)

Model-Params	`d_model`	`n_heads`	`maximum_context_length`	`num_layers`	`vocab_size`	Estimated Validation Loss on tiny stories dataset
280K	64	8	512	5	512	1.33
15M	288	6	256	6	32000	1.19
45M	512	8	1024	8	32000	TODO
110M	768	12	2048	12	32000	TODO
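
The model-size labels in the table can be roughly reproduced from `d_model`, `num_layers`, and `vocab_size`. The sketch below is a back-of-the-envelope estimate under assumptions that are not stated in the table: a 4x MLP expansion, a token embedding shared with the output head, and no positional embeddings, biases, or normalization parameters. The helper name `estimate_params` is made up for this example.

```python
def estimate_params(d_model, num_layers, vocab_size):
    """Rough transformer parameter count (assumptions listed above)."""
    attn = 4 * d_model * d_model    # Q, K, V and output projections
    mlp = 8 * d_model * d_model     # two d_model x (4 * d_model) matrices
    embed = vocab_size * d_model    # token embedding, assumed tied with the head
    return num_layers * (attn + mlp) + embed

# (d_model, num_layers, vocab_size) taken from the table rows
configs = {
    "280K": (64, 5, 512),
    "15M": (288, 6, 32000),
    "45M": (512, 8, 32000),
    "110M": (768, 12, 32000),
}

estimates = {name: estimate_params(*cfg) for name, cfg in configs.items()}
for name, count in estimates.items():
    print(f"{name}: ~{count / 1e6:.2f}M parameters")
```

Under these assumptions the estimates land close to the labels for the 280K, 15M, and 110M rows and somewhat under for the 45M row, which suggests the real models include terms this estimate ignores.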

Model: 15M | Prompt: Once upon a time, | Sampling Technique: Greedy sampling

Once upon a time, there was a little girl named Lily. She loved to play with her toys and eat yummy food. One day, she found a big, round thing in her room. It was a microscope. Lily was very curious about it.
Lily wanted to see what was inside the microscope. She tried to open it, but it was very hard. She tried and tried, but she could not open it. Lily felt sad and wanted to find a way to open the microscope.
Then, Lily had an idea. She asked her mom for help. Her mom showed her how to open the microscope. Lily was so happy! She looked through the microscope and saw many tiny things. She was so excited to see the tiny things. Lily and her mom had a fun day together.

Prompt: Once upon a time, in a big forest, there was a fearful little dog named Spot | Sampling Technique: Greedy sampling

Once upon a time, in a big forest, there was a fearful little dog named Spot. Spot was scared of many things. One day, Spot saw a big tree with a hole in it. He thought, "I want to see what is inside the hole."
Spot went to the tree and looked inside the hole. He saw a little bird with a hurt wing. Spot said, "I will help you, little bird." He used his paw to gently lift the bird out of the hole. The bird was very happy and said, "Thank you, Spot!"
Spot and the bird became good friends. They played together in the forest every day. Spot learned that it is good to help others, even if they are scared of something. And they lived happily ever after.
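
The greedy sampling used for the samples above picks the highest-scoring token at every step, so the same prompt always yields the same text. A minimal sketch follows; `greedy_decode` and `toy_logits` are hypothetical names, and `toy_logits` is a stand-in for a real model's next-token scores, not the gpt.jax API.

```python
def greedy_decode(next_logits, prompt_ids, max_new_tokens, eos_id=None):
    """Repeatedly append the argmax token of next_logits(ids)."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_logits(ids)                      # one score per vocab entry
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:                          # optional early stop
            break
    return ids

def toy_logits(ids, vocab_size=5):
    """Hypothetical 'model': always prefers (last token + 1) mod vocab_size."""
    target = (ids[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]

generated = greedy_decode(toy_logits, [0], max_new_tokens=4)
stopped_early = greedy_decode(toy_logits, [0], max_new_tokens=10, eos_id=3)
```

Swapping the `max(...)` line for a draw from the softmax of `logits` would turn this into temperature-style sampling, at the cost of determinism.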

Diffusion Transformers

CelebA
- Generated-images <====== See the Model Generated Images here
- Training-insights
MNIST-experiment
- Training on MNIST
Diffusion-Transformers Paper Summary
Some generated images:

Reinforcement-Learning

The links below don't go anywhere yet; the code still needs refactoring before links are added. For now, go to the repo directly 👆

Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto

Dynamic Programming
- Policy Iteration - Policy Evaluation & Policy Iteration
- Value Iteration
Monte-Carlo Methods
- Monte Carlo Exploring Starts
Temporal-Difference (Tabular)
n-step Bootstrapping
Planning and Learning with Tabular Methods
On-policy Prediction with Approximation
- Covered in the Paper Implementations section, where function approximators such as neural networks are used for RL
On-policy Control with Approximation
Off-policy Methods with Approximation
Eligibility Traces
Policy Gradient Methods
- Monte-Carlo Policy-Gradient
- REINFORCE with Baseline
- One-Step Actor-Critic
- Policy Gradient on Continuous Actions
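
As an illustration of the Value Iteration item under Dynamic Programming above, here is a minimal tabular sketch. The 4-state chain environment (`chain_step`) is a hypothetical toy made up for this example and is not taken from the repo or the book.

```python
def chain_step(state, action):
    """Hypothetical deterministic chain: states 0..3, state 3 is terminal.
    Actions are -1 (left) and +1 (right); reaching state 3 pays +1."""
    if state == 3:
        return state, 0.0                       # absorbing terminal state
    next_state = min(max(state + action, 0), 3)
    return next_state, (1.0 if next_state == 3 else 0.0)

def value_iteration(n_states, actions, step, gamma=0.9, tol=1e-8):
    """Sweep V(s) <- max_a [r + gamma * V(s')] until the largest change < tol."""
    values = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            new_v = max(r + gamma * values[s2]
                        for s2, r in (step(s, a) for a in actions))
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v                   # in-place (Gauss-Seidel) update
        if delta < tol:
            return values

def greedy_policy(values, n_states, actions, step, gamma=0.9):
    """Pick the action with the best one-step lookahead in each state."""
    def lookahead(s, a):
        s2, r = step(s, a)
        return r + gamma * values[s2]
    return [max(actions, key=lambda a: lookahead(s, a)) for s in range(n_states)]

actions = (-1, +1)
values = value_iteration(4, actions, chain_step)
policy = greedy_policy(values, 4, actions, chain_step)
```

Policy Iteration differs in that it alternates a full policy evaluation with a greedy improvement step, instead of folding the `max` into every sweep.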

Reinforcement Learning: Paper Implementations

2013: Playing Atari with Deep Reinforcement Learning
Prioritized DDQN || 2015: Deep Reinforcement Learning with Double Q-learning + 2016 Prioritized Experience Replay || (TODO)
2017: Proximal Policy Optimization (PPO)
2014: Deterministic Policy Gradient
2018: Soft Actor-Critic
AlphaGo, AlphaZero, AlphaFold, etc:
- 2017: Mastering the game of Go without human knowledge
- 2017: AlphaZero
- 2020: Mastering Atari, Go, chess and shogi by planning with a learned model
- 20xx: AlphaFold
(many more to be added...)
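
For the PPO entry in the list above, the core of the method is the clipped surrogate objective. The sketch below computes it for a small batch in plain Python; the function name and the `clip_eps=0.2` default are assumptions for illustration, and the value-function and entropy terms of a full implementation are left out.

```python
import math

def ppo_clip_loss(new_logps, old_logps, advantages, clip_eps=0.2):
    """Negative mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)               # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)        # pessimistic bound
    return -total / len(advantages)                     # loss to minimize

# Ratio of 2 with a positive advantage: the gain is capped at 1 + clip_eps
loss_capped = ppo_clip_loss([math.log(2.0)], [0.0], [1.0])
# Ratio of 2 with a negative advantage: no cap, the full penalty applies
loss_uncapped = ppo_clip_loss([math.log(2.0)], [0.0], [-1.0])
```

The asymmetry in the two calls is the point of the `min`: the policy gets no extra credit for moving far in a good direction, but is still fully penalized for moving far in a bad one.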

Transfusion (A Multi-Modal Transformer)

Transfusion is a multi-modal transformer: it can generate text like a GPT and images like a diffusion model, in one go rather than separately!
It can easily switch between text and image modalities during generation, and there is nothing complicated about it: it is a single transformer with a few modality-specific components!
It can easily be extended to other modalities such as video and audio, but for now it only takes images and text as input
TODO: Train on a large Multi-Modal Dataset (something like tiny stories dataset with images in between illustrating the story...?)
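
One way a single transformer can handle both modalities is through its attention mask. The sketch below follows the scheme described in the Transfusion paper as I understand it: causal attention over the whole sequence, plus bidirectional attention among patches of the same image. Whether this repo does exactly that is an assumption, and `build_attention_mask` is a hypothetical helper name.

```python
def build_attention_mask(image_ids):
    """image_ids[i] is None for a text token, or an integer image index
    for an image-patch token. Returns mask[i][j] == True when position i
    may attend to position j."""
    n = len(image_ids)
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            causal = j <= i                                  # past and self
            same_image = (image_ids[i] is not None
                          and image_ids[i] == image_ids[j])  # within one image
            row.append(causal or same_image)
        mask.append(row)
    return mask

# text, two patches of image 0, text
mask = build_attention_mask([None, 0, 0, None])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Text positions keep the usual causal pattern, while the first patch of an image can already see the later patches of that same image.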
