This project brings the powerful phi-3-vision VLM to Apple's MLX framework, offe…
## Key Features
* **VLM Agent:** Leverages the VLM's visual understanding for interactive code generation and refinement, enabling data visualization and image manipulation through a visual feedback loop. (WIP)
* **Batch Generation:** Accelerates inference by generating text for multiple prompts concurrently (107 tokens-per-sec batched vs. 56 tokens-per-sec original).
* **Cache Quantization:** Optimizes inference over long contexts with key-value cache quantization (5.3s quantized vs. 5.1s original).
* **Model Quantization:** Reduces model size for faster loading and deployment (2.3GB quantized vs. 8.5GB original).
* **Su-scaled RoPE:** Implements Su-scaled Rotary Position Embeddings to manage sequences of up to 128K tokens.
* **Chat Template:** Uses the model's chat template to streamline interactions with the model.
* **LoRA Training:** Easily customizes the model for specific tasks or datasets using LoRA.
* **Benchmarking:** Quickly assesses model performance on any dataset. (WIP)
* **Long Context RAG:** Integrates Retrieval-Augmented Generation to harness large amounts of external knowledge for complex tasks such as code understanding, leveraging the phi-3-vision model's 128K context window. (WIP)
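To give a feel for what cache quantization does, here is a minimal, generic sketch of symmetric integer quantization applied to a slice of a key-value cache. It is not the project's actual implementation (which operates on MLX tensors); the function names and the toy data below are illustrative assumptions only. The idea is the same, though: store the cache at reduced precision with a per-tensor scale, trading a small, bounded rounding error for a roughly 4x smaller memory footprint when going from 32-bit floats to 8-bit integers.

```python
def quantize(values, bits=8):
    """Symmetric per-tensor quantization to signed integers (illustrative)."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax or 1.0
    quantized = [round(v / scale) for v in values]  # ints in [-qmax, qmax]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the quantized cache."""
    return [q * scale for q in quantized]

# A toy slice of a key-value cache (in practice, a large float tensor).
kv_slice = [0.12, -0.5, 0.83, -0.91, 0.07]
q, scale = quantize(kv_slice)
restored = dequantize(q, scale)

# Rounding error is bounded by one quantization step.
max_err = max(abs(a - b) for a, b in zip(kv_slice, restored))
```

Because each value is rounded to the nearest multiple of `scale`, the reconstruction error never exceeds half a step, which is why long-context inference stays accurate while the cache shrinks substantially.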