# Phi-3-MLX: Language and Vision Models for Apple Silicon
Phi-3-MLX is a versatile AI framework that leverages both the Phi-3-Vision multimodal model and the recently updated ([July 2, 2024](https://x.com/reach_vb/status/1808056108319179012)) Phi-3-Mini-128K language model, optimized for Apple Silicon using the MLX framework. This project provides an easy-to-use interface for a wide range of AI tasks, from advanced text generation to visual question answering and code execution.
## Recent Updates: Phi-3 Mini Improvements
Microsoft has recently released significant updates to the Phi-3 Mini model, dramatically improving its capabilities:
- Substantially enhanced code understanding in Python, C++, Rust, and TypeScript
- Improved post-training for better-structured output
- Enhanced multi-turn instruction following
- Added support for the `<|system|>` tag (see the example after this list)
- Improved reasoning and long-context understanding
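
To illustrate the new `<|system|>` tag, here is a sketch of the underlying Phi-3 chat format when a system message is included. This shows the raw prompt template only; the examples below pass plain strings to `generate`, so the library is expected to apply this formatting internally.

```python
# Raw Phi-3 chat format with the <|system|> tag (illustrative only; the
# generate() examples below accept plain prompt strings).
prompt = (
    "<|system|>\n"
    "You are a concise assistant that answers with working code.<|end|>\n"
    "<|user|>\n"
    "Write a Python function that checks whether a string is a palindrome.<|end|>\n"
    "<|assistant|>\n"
)
```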
## Features
- Support for the newly updated Phi-3-Mini-128K (language-only) model
- Integration with Phi-3-Vision (multimodal) model
- Optimized performance on Apple Silicon using MLX
- Batched generation for processing multiple prompts
- Flexible agent system for various AI tasks
- Custom toolchains for specialized workflows
- Model quantization for improved efficiency
- LoRA fine-tuning capabilities
- API integration for extended functionality (e.g., image generation, text-to-speech)
## Quick Start
Install and launch Phi-3-MLX from the command line:
```bash
pip install phi-3-vision-mlx
phi3v
```
To use the library in a Python script instead:
```python
from phi_3_vision_mlx import generate
```
## Usage Examples
### 1. Core Functionalities
#### Visual Question Answering
```python
generate('What is shown in this image?', 'https://collectionapi.metmuseum.org/api/collection/v1/iiif/344291/725918/main-image')
```
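
The same `generate` function also accepts text-only prompts; omitting the image argument is assumed here to run a plain language query:

```python
# Text-only prompt: no image argument is passed
generate('Explain the difference between a stack and a queue.')
```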
#### Batch Generation
```python
prompts = [
    "Explain the key concepts of quantum computing and provide a Rust code example demonstrating quantum superposition.",
    "Write a poem about the first snowfall of the year.",
    "Summarize the major events of the French Revolution.",
    "Describe a bustling alien marketplace on a distant planet with unique goods and creatures.",
    "Implement a basic encryption algorithm in Python.",
]

# Generate responses for all prompts in a single batched pass
generate(prompts)
```
#### Example 1: Adding External Context

A custom toolchain can also be used to build an agent that pulls in context from an external source at generation time:

```python
# `agent` is an Agent instance from phi_3_vision_mlx, configured with a custom
# toolchain that loads the text referenced after '@' and prepends it to the prompt.
agent('How to inspect API endpoints? @https://raw.githubusercontent.com/gradio-app/gradio/main/guides/08_gradio-clients-and-lite/01_getting-started-with-the-python-client.md')
```
This toolchain adds context to the prompt from an external source, enhancing the agent's knowledge for specific queries.
#### Example 2: Retrieval Augmented Generation (RAG)
You can create another custom toolchain that applies retrieval-augmented generation (RAG) to code generation.
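
The snippet below is a minimal sketch of the idea rather than the library's actual toolchain interface: it retrieves reference documentation with a hypothetical `fetch_context` helper and prepends it to the prompt before calling `generate`. The exact toolchain syntax used by `Agent` may differ.

```python
import urllib.request

from phi_3_vision_mlx import generate

def fetch_context(url):
    """Hypothetical helper: download reference text to ground code generation."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode('utf-8')

# Retrieve API documentation, then ask the model to write code that uses it
docs = fetch_context('https://raw.githubusercontent.com/gradio-app/gradio/main/guides/08_gradio-clients-and-lite/01_getting-started-with-the-python-client.md')
generate(f'{docs}\n\nUsing the documentation above, write Python code that calls a Gradio API endpoint.')
```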