
Commit 220cd4f

Phi-3-Mini-128K: blind_model=True
1 parent 085b29b

File tree

3 files changed: +284, −149 lines

README.md

Lines changed: 95 additions & 78 deletions

# Phi-3-MLX: Language and Vision Models for Apple Silicon

Phi-3-MLX is a versatile AI framework that leverages both the Phi-3-Vision multimodal model and the recently updated ([July 2, 2024](https://x.com/reach_vb/status/1808056108319179012)) Phi-3-Mini-128K language model, optimized for Apple Silicon using the MLX framework. This project provides an easy-to-use interface for a wide range of AI tasks, from advanced text generation to visual question answering and code execution.

## Recent Updates: Phi-3 Mini Improvements

Microsoft has recently released significant updates to the Phi-3 Mini model, dramatically improving its capabilities:

- Substantially enhanced code understanding in Python, C++, Rust, and TypeScript
- Improved post-training for better-structured output
- Enhanced multi-turn instruction following
- Added support for the `<|system|>` tag (see the sketch after this list)
- Improved reasoning and long-context understanding
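
As a rough illustration of the new `<|system|>` tag, the block below shows raw Phi-3 chat-template text. Treating `generate` as accepting pre-templated prompts like this is an assumption, not documented behavior:

```python
# Hedged sketch: a Phi-3 prompt using the newly supported <|system|> tag.
# Whether `generate` accepts pre-templated text like this is an assumption.
prompt = (
    "<|system|>\nYou are a concise technical assistant.<|end|>\n"
    "<|user|>\nSummarize the French Revolution in two sentences.<|end|>\n"
    "<|assistant|>\n"
)
```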
## Features

- Support for the newly updated Phi-3-Mini-128K (language-only) model
- Integration with the Phi-3-Vision (multimodal) model
- Optimized performance on Apple Silicon using MLX
- Batched generation for processing multiple prompts
- Flexible agent system for various AI tasks
- Custom toolchains for specialized workflows
- Model quantization for improved efficiency
- LoRA fine-tuning capabilities
- API integration for extended functionality (e.g., image generation, text-to-speech)

## Quick Start

Install and launch Phi-3-MLX from the command line:

```bash
pip install phi-3-vision-mlx
phi3v
```

To use the library in a Python script instead:

```python
from phi_3_vision_mlx import generate
```
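
For example, a minimal text-only call; the prompt here is illustrative, and passing a single string to `generate` mirrors the usage shown in the examples below:

```python
# Minimal sketch: one text prompt, default settings.
generate("Write a haiku about Apple Silicon.")
```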

## Usage Examples

### 1. Core Functionalities

#### Visual Question Answering

```python
generate('What is shown in this image?', 'https://collectionapi.metmuseum.org/api/collection/v1/iiif/344291/725918/main-image')
```

#### Batch Generation

For efficient processing of multiple prompts:

```python
prompts = [
    "Explain the key concepts of quantum computing and provide a Rust code example demonstrating quantum superposition.",
    "Write a poem about the first snowfall of the year.",
    "Summarize the major events of the French Revolution.",
    "Describe a bustling alien marketplace on a distant planet with unique goods and creatures.",
    "Implement a basic encryption algorithm in Python.",
]

# Phi-3-Vision
generate(prompts, max_tokens=100)
# Phi-3-Mini-128K (language-only)
generate(prompts, max_tokens=100, blind_model=True)
```

#### Model and Cache Quantization

Quantization can significantly reduce model size and improve inference speed:

```python
# Model quantization
generate("Explain the implications of quantum entanglement in quantum computing.", quantize_model=True)
# Cache quantization
generate("Describe the potential applications of CRISPR gene editing in medicine.", quantize_cache=True)
```
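
Since both options are keyword arguments to the same `generate` call, they can presumably be combined; treat the combination below as an assumption rather than a documented configuration:

```python
# Assumption: weight and KV-cache quantization enabled together.
generate("Outline the main steps of photosynthesis.", quantize_model=True, quantize_cache=True)
```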

#### LoRA Fine-tuning

Fine-tune the model for specific tasks:

```python
from phi_3_vision_mlx import train_lora

train_lora(
    lora_layers=5,
    lora_rank=16,
    epochs=10,
    lr=1e-4,
    warmup=.5,
    mask_ratios=[.0],
    adapter_path='adapters',
    dataset_path="JosefAlbers/akemiH_MedQA_Reason",
)
```

![Training log](https://raw.githubusercontent.com/JosefAlbers/Phi-3-Vision-MLX/main/assets/train_log.png)

Then generate with the trained adapter:

```python
generate("Write a cosmic horror.", adapter_path='adapters')
```
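
To inspect what the adapter changed, run the same prompt with and without `adapter_path`; a minimal sketch using only the `generate` API shown above:

```python
# Compare the base model's output with the LoRA-adapted output.
generate("Write a cosmic horror.")                           # base weights
generate("Write a cosmic horror.", adapter_path='adapters')  # with adapter
```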

### 2. Agent Interactions

#### Multi-turn Conversations and Context Handling

```python
from phi_3_vision_mlx import Agent

agent = Agent()
agent('Analyze this image and describe the architectural style:', 'https://images.metmuseum.org/CRDImages/rl/original/DP-19531-075.jpg')
agent('What historical period does this architecture likely belong to?')
agent.end()
```

![Multi-turn VQA example](https://raw.githubusercontent.com/JosefAlbers/Phi-3-Vision-MLX/main/assets/vqa.png)

#### Generative Feedback Loop

The agent can generate code, execute it, and then refine it based on feedback:

```python
agent('Plot a Lissajous Curve.')
# ... (follow-up prompts elided in this diff) ...
agent.end()
```

![Generative feedback loop example](https://raw.githubusercontent.com/JosefAlbers/Phi-3-Vision-MLX/main/assets/coding_agent.png)

#### Extending Capabilities with API Integration

The agent can call external APIs, for example to create images or generate speech:

```python
agent('Draw "A perfectly red apple, 32k HDR, studio lighting"')
# ... (further API calls elided in this diff) ...
agent.end()
```

![API agent example](https://raw.githubusercontent.com/JosefAlbers/Phi-3-Vision-MLX/main/assets/api_agent.png)

### 3. Toolchain Customization

Toolchains let you customize the agent's behavior for specific tasks. Here are three examples.

#### Example 1. In-Context Learning

This toolchain adds context to the prompt from an external source, enhancing the agent's knowledge for specific queries:

```python
from phi_3_vision_mlx import load_text

# ... (tool and toolchain definitions elided in this diff) ...
agent = Agent(toolchain, early_stop=100)
agent('How to inspect API endpoints? @https://raw.githubusercontent.com/gradio-app/gradio/main/guides/08_gradio-clients-and-lite/01_getting-started-with-the-python-client.md')
```
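
Since the definitions are elided here, the following is a purely hypothetical sketch of what a context-adding tool could look like; the `add_context` helper and the string-based toolchain format are assumptions, not the repository's actual code:

```python
# Hypothetical sketch only: split "prompt @url", fetch the URL's text with
# load_text (imported above), and prepend it to the prompt as context.
def add_context(prompt):
    prompt, url = prompt.split('@')
    return f'{load_text(url.strip())}\n\n{prompt.strip()}'

# Assumed toolchain format: steps the agent executes in order.
toolchain = """
prompt = add_context(prompt)
responses = generate(prompt, images)
"""
```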

#### Example 2. Retrieval Augmented Coding

Another toolchain applies retrieval-augmented generation (RAG) to coding:

```python
from phi_3_vision_mlx import VDB

# ... (retrieval tool and toolchain definitions elided in this diff) ...
agent = Agent(toolchain_plot, False)
_, images = agent(user_input)
```
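
The retrieval step itself is elided above. As a generic, library-agnostic sketch of the idea (plain cosine similarity over toy embeddings; this is not the repository's `VDB` implementation):

```python
# Generic RAG sketch, not the repository's VDB: rank stored snippets by
# cosine similarity to a query embedding and return the best match.
import numpy as np

def retrieve(query_vec, doc_vecs, docs):
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
    return docs[int(np.argmax(sims))]

docs = ["def plot_prices(df): ...", "def fetch_prices(ticker): ..."]
doc_vecs = np.random.rand(2, 8)  # toy embeddings for illustration
query_vec = np.random.rand(8)    # embedding of the user's request
context = retrieve(query_vec, doc_vecs, docs)
```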

#### Example 3. Multi-Agent Interaction

Multiple agents can also interact to complete a task:

```python
# Continued from Example 2
agent_writer = Agent(early_stop=100)
agent_writer(f'Write a stock analysis report on: {user_input}', images)
```

## Benchmarks

```python
from phi_3_vision_mlx import benchmark

benchmark()
```

| Task | Vanilla Model | Quantized Model | Quantized Cache | LoRA Adapter |
|--------------------|---------------|-----------------|-----------------|--------------|
| Text Generation | 8.72 tps | 55.97 tps | 7.04 tps | 8.71 tps |
| Image Captioning | 8.04 tps | 32.48 tps | 1.77 tps | 8.00 tps |
| Batched Generation | 30.74 tps | 106.94 tps | 20.47 tps | 30.72 tps |

*(Tokens per second (tps), measured on an M1 Max with 64 GB RAM)*

## License

This project is licensed under the [MIT License](LICENSE).

## Citation

<a href="https://zenodo.org/doi/10.5281/zenodo.11403221"><img src="https://zenodo.org/badge/806709541.svg" alt="DOI"></a>
