VachanVY/README.md

Hi, I'm Vachan. I like building and training Deep Neural Networks from the ground up.

Projects:

Transformers

graph TD;
    Transformers -->|Text| GPT;
    Transformers -->|Images| Vision_Transformers["Vision Transformers"];
    Transformers -->|Audio| MAGNeT["MAGNeT"];
    Transformers -->|Video| Video_Vision_Transformers["Video Vision Transformers"];
    Transformers -->|Diffusion| Diffusion_Transformers["Diffusion Transformers"];

    GPT --> Multi_Modal_Transformers["Transfusion (Multi-Modal Transformer)"];
    Vision_Transformers --> Multi_Modal_Transformers;
    MAGNeT --> Multi_Modal_Transformers;
    Video_Vision_Transformers --> Multi_Modal_Transformers;
    Diffusion_Transformers --> Multi_Modal_Transformers;

  • GPT written in JAX, trained on the tiny shakespeare dataset (1.1 MB of text) and then scaled up on the tiny stories dataset (~2 GB of text)
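    | Model-Params | d_model | n_heads | maximum_context_length | num_layers | vocab_size | Estimated Validation Loss on tiny stories dataset |
    |--------------|---------|---------|------------------------|------------|------------|---------------------------------------------------|
    | 280K         | 64      | 8       | 512                    | 5          | 512        | 1.33                                              |
    | 15M          | 288     | 6       | 256                    | 6          | 32000      | 1.19                                              |
    | 45M          | 512     | 8       | 1024                   | 8          | 32000      | TODO                                              |
    | 110M         | 768     | 12      | 2048                   | 12         | 32000      | TODO                                              |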
  • Model: 15M | Prompt: Once upon a time, | Sampling Technique: Greedy sampling
    Once upon a time, there was a little girl named Lily. She loved to play with her toys and eat yummy food. One day, she found a big, round thing in her room. It was a microscope. Lily was very curious about it.
    Lily wanted to see what was inside the microscope. She tried to open it, but it was very hard. She tried and tried, but she could not open it. Lily felt sad and wanted to find a way to open the microscope.
    Then, Lily had an idea. She asked her mom for help. Her mom showed her how to open the microscope. Lily was so happy! She looked through the microscope and saw many tiny things. She was so excited to see the tiny things. Lily and her mom had a fun day together.
    
  • Prompt: Once upon a time, in a big forest, there was a fearful little dog named Spot | Sampling Technique: Greedy sampling
    Once upon a time, in a big forest, there was a fearful little dog named Spot. Spot was scared of many things. One day, Spot saw a big tree with a hole in it. He thought, "I want to see what is inside the hole."
    Spot went to the tree and looked inside the hole. He saw a little bird with a hurt wing. Spot said, "I will help you, little bird." He used his paw to gently lift the bird out of the hole. The bird was very happy and said, "Thank you, Spot!"
    Spot and the bird became good friends. They played together in the forest every day. Spot learned that it is good to help others, even if they are scared of something. And they lived happily ever after.
    
  • Video Vision Transformer in PyTorch
  • Test-trained on MNIST by stacking images of the same digit along the time dimension to form short clips
  • TODO: Scale the model and train it on a proper large dataset...
  • Vision Transformers in JAX, trained on the MNIST dataset
  • TODO: Scale ViT and train on a larger dataset
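
The GPT samples above were produced with greedy sampling, which always picks the highest-scoring next token. Below is a minimal sketch of that loop in plain Python. It is not the code from the gpt.jax repo: `greedy_decode`, `toy_logits`, and the `eos_id` parameter are hypothetical names used only for illustration, and the toy model stands in for a trained network.

```python
from typing import Callable, List, Optional, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first index wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def greedy_decode(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    max_context_length: int,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Append the most likely token at every step (no randomness)."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Feed the model only the last `max_context_length` tokens.
        context = ids[-max_context_length:]
        next_id = argmax(next_token_logits(context))
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids


# Toy stand-in for a trained model (assumption): it always prefers
# the token that follows the last one, modulo the vocabulary size.
VOCAB_SIZE = 8


def toy_logits(context: List[int]) -> List[float]:
    target = (context[-1] + 1) % VOCAB_SIZE
    return [1.0 if i == target else 0.0 for i in range(VOCAB_SIZE)]


generated = greedy_decode(toy_logits, [3], max_new_tokens=6, max_context_length=4)
print(generated)  # [3, 4, 5, 6, 7, 0, 1]
```

Because the choice at every step is deterministic, the same prompt always yields the same continuation, which is why the stories above are reproducible for a fixed checkpoint.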

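The MNIST test setup for the Video Vision Transformer stacks images of the same digit along the time dimension. One way this could look is sketched below with NumPy on random stand-in data. The helper name `make_digit_clips`, the clip length of 8 frames, and the one-clip-per-digit choice are assumptions for illustration, not details taken from the ViVIT repo.

```python
import numpy as np


def make_digit_clips(images, labels, num_frames, rng):
    """Build one clip per digit by stacking same-digit images over time.

    images: (N, H, W) array, labels: (N,) integer array.
    Returns clips of shape (num_digits, num_frames, H, W) and their labels.
    """
    clips, clip_labels = [], []
    for digit in np.unique(labels):
        idx = np.flatnonzero(labels == digit)
        if len(idx) < num_frames:
            continue  # not enough images of this digit for a full clip
        chosen = rng.choice(idx, size=num_frames, replace=False)
        clips.append(images[chosen])  # (num_frames, H, W)
        clip_labels.append(digit)
    return np.stack(clips), np.array(clip_labels)


rng = np.random.default_rng(0)
# Random arrays standing in for MNIST images and labels (assumption).
fake_images = rng.random((200, 28, 28), dtype=np.float32)
fake_labels = np.arange(200) % 10

clips, clip_labels = make_digit_clips(fake_images, fake_labels, num_frames=8, rng=rng)
print(clips.shape)  # (10, 8, 28, 28)
```

Each clip keeps the label of its digit, so the video model can be checked with an ordinary 10-way classification objective before moving to a real video dataset.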
Mugen

  • Transfusion is a Multi-Modal Transformer: it can generate text like GPTs and images like Diffusion Models within a single generation, not as separate steps!
  • It can easily switch between text and image modalities during generation, and it is nothing complicated: just a single transformer with some modality-specific components!
  • This can easily be extended to other modalities like video, audio, etc., but for now it only takes images and text as input
  • TODO: Train on a large Multi-Modal Dataset (something like tiny stories dataset with images in between illustrating the story...?)
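
To make "a single transformer with some modality-specific components" concrete, here is a rough NumPy sketch of the input side only: text tokens go through an embedding table, image patches go through a linear projection, and both land in one shared sequence. All names and sizes (`embed_mixed_sequence`, `patch_projection`, `d_model = 16`, and so on) are hypothetical and are not taken from the Transfusion.torch code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size, patch_dim = 16, 32, 12  # toy sizes (assumption)

# Modality-specific input layers; the transformer itself would be shared.
token_embedding = rng.normal(size=(vocab_size, d_model))  # text only
patch_projection = rng.normal(size=(patch_dim, d_model))  # images only


def embed_mixed_sequence(segments):
    """Map interleaved text/image segments into one (seq_len, d_model) array.

    segments: list of ("text", int array of shape (n,)) or
              ("image", float array of shape (m, patch_dim)).
    Also returns a boolean mask marking which positions are image patches.
    """
    parts, is_image = [], []
    for kind, data in segments:
        if kind == "text":
            parts.append(token_embedding[data])
        elif kind == "image":
            parts.append(data @ patch_projection)
        else:
            raise ValueError(f"unknown modality: {kind}")
        is_image.extend([kind == "image"] * len(data))
    return np.concatenate(parts, axis=0), np.array(is_image)


segments = [
    ("text", np.array([1, 5, 7])),
    ("image", rng.normal(size=(4, patch_dim))),  # 4 flattened patches
    ("text", np.array([2, 9])),
]
sequence, is_image = embed_mixed_sequence(segments)
print(sequence.shape)        # (9, 16)
print(is_image.astype(int))  # [0 0 0 1 1 1 1 0 0]
```

In a full model, a mask like `is_image` could route each output position to the matching head: next-token prediction for text positions and a denoising prediction for image positions.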




Pinned

  1. Transfusion.torch (Public)

    PyTorch Implementation of Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

    Python 12 4

  2. NeuroForge (Public)

    Unveiling the Layers: Neural Networks from first principles

    Jupyter Notebook 1

  3. diffusion-transformer (Public)

    PyTorch and JAX Implementation of Scalable Diffusion Models with Transformers | Diffusion Transformers in PyTorch and JAX

    Python 3

  4. gpt.jax (Public)

    Generative Pretrained Transformer (GPT) in JAX. A step-by-step guide to training LLMs on large datasets from scratch

    Python 2 1

  5. Enigma (Public)

    Building a Modern Computer from First Principles

    Assembly

  6. ViVIT (Public)

    ViViT: Video Vision Transformer in PyTorch

    Jupyter Notebook 1