Phi-3.5-vision instruct (128k)
Refresh of Phi-3-vision model.
Context: 131k input tokens · 4k output tokens
Training date: Aug 2024
Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets that include synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data in both text and vision. The model belongs to the Phi-3 model family, and the multimodal version supports a 128K-token context length. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.
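Because the model takes chat-format prompts with both text and images, the sketch below shows one way to call it through the GitHub Models inference endpoint using the Azure AI Inference SDK for Python. The endpoint URL, model name, token variable, and image file are assumptions for illustration; the playground's code samples give the exact values for your setup.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    ImageContentItem,
    ImageDetailLevel,
    ImageUrl,
    SystemMessage,
    TextContentItem,
    UserMessage,
)
from azure.core.credentials import AzureKeyCredential

# Assumed GitHub Models endpoint; authentication uses a GitHub personal access token.
client = ChatCompletionsClient(
    endpoint="https://models.inference.ai.azure.com",
    credential=AzureKeyCredential(os.environ["GITHUB_TOKEN"]),
)

response = client.complete(
    model="Phi-3.5-vision-instruct",  # assumed catalog name
    messages=[
        SystemMessage(content="You are a helpful assistant that describes images."),
        UserMessage(
            content=[
                TextContentItem(text="What is shown in this image?"),
                # "sample.jpg" is a placeholder; ImageUrl.load embeds the file as a data URL.
                ImageContentItem(
                    image_url=ImageUrl.load(
                        image_file="sample.jpg",
                        image_format="jpeg",
                        detail=ImageDetailLevel.LOW,
                    )
                ),
            ]
        ),
    ],
    max_tokens=512,   # output is capped at 4k tokens
    temperature=0.2,
)

print(response.choices[0].message.content)
```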
🏡 Phi-3 Portal
📰 Phi-3 Microsoft Blog
📖 Phi-3 Technical Report
👩‍🍳 Phi-3 Cookbook
| | |
|---|---|
| Architecture | Phi-3.5-vision has 4.2B parameters and contains an image encoder, connector, projector, and the Phi-3 Mini language model. |
| Inputs | Text and image. It is best suited for prompts using the chat format (see the sketch after this table). |
| Context length | 128K tokens |
| GPUs | 256 A100-80G |
| Training time | 6 days |
| Training data | 500B tokens (vision tokens + text tokens) |
| Outputs | Generated text in response to the input |
| Dates | Trained between July and August 2024 |
| Status | This is a static model trained on an offline text dataset with cutoff date March 15, 2024. Future versions of the tuned models may be released as we improve the models. |
| Release date | August 20, 2024 |
| License | MIT |
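For local or self-hosted inference, the same characteristics (text and image inputs, chat-format prompts) map onto the Hugging Face transformers workflow sketched below. The image URL and generation settings are illustrative assumptions, and the `<|image_1|>` placeholder convention is taken from the Phi-3 vision family's chat format; adjust attention implementation and device placement for your hardware.

```python
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"

# trust_remote_code is required because the model ships custom processing code.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
    _attn_implementation="eager",  # or "flash_attention_2" if installed
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True, num_crops=4)

# Images are referenced in the prompt with <|image_N|> placeholders.
messages = [
    {"role": "user", "content": "<|image_1|>\nSummarize what is shown in this image."},
]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

url = "https://example.com/chart.png"  # hypothetical image URL
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(prompt, [image], return_tensors="pt").to("cuda")
generate_ids = model.generate(
    **inputs,
    max_new_tokens=500,
    eos_token_id=processor.tokenizer.eos_token_id,
)

# Strip the prompt tokens before decoding the generated response.
generate_ids = generate_ids[:, inputs["input_ids"].shape[1]:]
response = processor.batch_decode(
    generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(response)
```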
Languages: English