VyomAI

Transformer model implementations from scratch in PyTorch, spanning NLP, LLM, vision, and multimodal architectures, to make them more accessible for research purposes.

The best way to understand is to learn by doing.

Examples

Each example is contained in a single notebook for readability and ease of understanding.

**Each example is implemented from scratch using PyTorch 2.0.**

| Task | Dataset | Description |
| --- | --- | --- |
| text classification | clinc_oos | encoder model for text classification |
| masked language modeling | clinc_oos | encoder model pretraining in the MLM style |
| ELECTRA language modeling | clinc_oos | encoder model pretraining in the ELECTRA style |
| causal language modeling | mark-twain-books | decoder model pretraining in the GPT style, with a KV-cache for fast inference |
| knowledge distillation | clinc_oos | initialization of a model from a pretrained model |
| seq2seq modeling | Flickr-30k | seq2seq model training for image captioning, with a KV-cache |
| adapters | clinc_oos | LoRA and DoRA for parameter-efficient tuning (see the sketch after this table) |
| ViT | Scene-classification | vision transformer for image classification |
| DETR | Global-Wheat-Detection | implementation of DETR (DEtection TRansformer), an encoder-decoder model for object detection |
| CLIP | Flickr-30k | implementation of contrastive language-image pretraining |
| vision-language multimodal | Flickr-30k | a minimal vision-language model with image-text fusion that generates image captions, using RoPE and a KV-cache |
**More to come.**
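For context on the adapters example: LoRA freezes a pretrained weight matrix and learns a low-rank additive update, so only a small fraction of the parameters is trained. Below is a minimal, illustrative sketch of the idea in plain PyTorch; it is not the notebook's actual code, and the class name and hyperparameters are chosen here purely for illustration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update:
    y = W x + (alpha / r) * B(A(x))."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # freeze the pretrained weights
        self.lora_a = nn.Linear(base.in_features, r, bias=False)   # down-project to rank r
        self.lora_b = nn.Linear(r, base.out_features, bias=False)  # up-project back
        nn.init.zeros_(self.lora_b.weight)     # zero init: the update starts as a no-op
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

# Usage: wrap an existing projection and train only the adapter weights.
layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 16, 768))           # (batch, seq_len, hidden)
```

DoRA extends this idea by decomposing the pretrained weight into magnitude and direction components and applying the low-rank update to the direction.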

Usage

```python
from VyomAI import EncoderModel, EncoderForMaskedLM
from VyomAI import EncoderConfig

config = EncoderConfig()
encoder = EncoderModel(config, pos_embedding_type='rope')
# pos_embedding_type supported: absolute, sinusoidal, rope
# attention_type supported: gqa, vanilla
```
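Given the encoder constructed above, a forward pass might look like the sketch below. The argument names (`input_ids`, `attention_mask`) and the vocabulary size are assumptions based on the BERT-like interface described under "More About VyomAI"; check the example notebooks for the exact signature.

```python
import torch

# Hypothetical forward pass; argument names and shapes are assumptions,
# not VyomAI's confirmed API.
batch_size, seq_len = 2, 16
input_ids = torch.randint(0, 30522, (batch_size, seq_len))          # dummy token ids
attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long)  # no padding

with torch.no_grad():
    hidden_states = encoder(input_ids, attention_mask=attention_mask)
```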

More About VyomAI

Learn the basics of PyTorch

At a granular level, VyomAI supports the following components:

| Component | Description |
| --- | --- |
| Encoder | Text encoder with a BERT-like architecture; supports absolute, sinusoidal, and RoPE embeddings and GQA or vanilla attention |
| Decoder | Text decoder with a GPT-like architecture; supports absolute, sinusoidal, and RoPE embeddings, GQA or vanilla attention, and a KV-cache for fast inference |
| Seq2Seq | Model with a BART-like architecture; supports absolute, sinusoidal, and RoPE embeddings, GQA or vanilla attention, and a KV-cache for fast inference; the encoder can be a text or image type |
| VisionEncoder | Model with a ViT-like architecture for image encoding (more to come) |
| Multimodal | A minimal vision-language model (more to come) |
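The KV-cache mentioned for the decoder and seq2seq components is a standard inference optimization: keys and values from earlier decoding steps are stored so that each new token attends against cached tensors instead of recomputing the whole prefix. Here is a minimal, framework-level sketch of the technique in plain PyTorch 2.0 (illustrative only, not VyomAI's internal code):

```python
import torch
import torch.nn.functional as F

def attend_with_cache(q, k_new, v_new, cache):
    """One decoding step: append this step's key/value to the cache,
    then attend the single new query against the full cached sequence.

    q, k_new, v_new: (batch, heads, 1, head_dim) for the current token.
    cache: dict holding past keys/values of shape (batch, heads, seen, head_dim).
    """
    if cache["k"] is not None:
        k = torch.cat([cache["k"], k_new], dim=2)   # grow along the sequence axis
        v = torch.cat([cache["v"], v_new], dim=2)
    else:
        k, v = k_new, v_new
    cache["k"], cache["v"] = k, v                   # save for the next step
    return F.scaled_dot_product_attention(q, k, v)  # PyTorch 2.0 fused attention

cache = {"k": None, "v": None}
for step in range(4):                               # decode four tokens
    q, k_new, v_new = (torch.randn(1, 8, 1, 64) for _ in range(3))
    out = attend_with_cache(q, k_new, v_new, cache)
```

Because only the new token's projections are computed at each step, per-token cost stays roughly constant as the generated sequence grows.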

Improvement and Contribution

We appreciate all contributions. If you want to contribute new features, utility functions, or tutorials, please open an issue and discuss the feature with us first.

Resources

Some helpful learning resources:

References