An open-source Swift library that lets macOS and iOS projects use the Mistral-Interact7B models (INT4, with FP16 upcoming) in chat mode. This library includes a complete Swift implementation of the tokenizer and Byte Pair Encoding (BPE) encoder, providing an out-of-the-box solution for integrating advanced language models into your Swift applications.
- Full Swift Implementation: Includes tokenizer and BPE encoder written entirely in Swift.
- CoreML Integration: Leverages Apple's CoreML framework to run Mistral-Interact7B models efficiently.
- Multiple Decoding Strategies: Supports Greedy and Top-K sampling, with plans to add more strategies.
- Chat Functionality: Designed to work in chat mode for interactive applications.
- FP16 Support (Coming Soon): Future version will support FP16 models for improved performance.
- iOS Compatibility: Aiming for compatibility with iOS 18 and newer.
- Extensible Interface: Future improvements will focus on enhancing flexibility and adding more features.
The Mistral-Interact7B CoreML model is not included in this library. You must download the model from Hugging Face and add it to your project manually.
Add `SwiftMistralCoreML` to your project using Swift Package Manager:
- In Xcode, go to File > Swift Packages > Add Package Dependency.
- Enter the repository URL: `https://github.com/cardona/SwiftMistralCoreML.git`
- Choose the latest version and follow the prompts to add the package. (If you manage dependencies through a `Package.swift` manifest instead, see the sketch after this list.)
- Download the Mistral-Interact7B CoreML model (INT4 or FP16) from Hugging Face.
- Drag and drop the downloaded `.mlmodel` file into your Xcode project.
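If your project is managed through a `Package.swift` manifest rather than the Xcode UI, the dependency can be declared there instead. A minimal sketch; the target name, version requirement, and platform targets below are assumptions, so pin whatever release and deployment targets you actually use:

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical target name
    platforms: [.macOS(.v14), .iOS(.v17)], // adjust to your deployment targets
    dependencies: [
        // Version requirement is an assumption; pin the release you actually use.
        .package(url: "https://github.com/cardona/SwiftMistralCoreML.git", from: "1.0.0")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: ["SwiftMistralCoreML"]
        )
    ]
)
```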
Here's how to use `SwiftMistralCoreML` in your project:
```swift
import SwiftMistralCoreML
```
Create an instance of `MistralParameters` to specify the model's behavior:
```swift
let parameters = MistralParameters(
    userInput: "Hello, how are you?",
    systemPrompt: "You are a helpful assistant.",
    algorithm: .greedy,
    maxTokens: 500,
    topK: nil // Not required for Greedy decoding
)
```
Use the `TextGenerator` class to generate responses:
```swift
let tokenizerParser = try TokenizerParser()
let bpeEncoder = BPEEncoder(tokenizerParser: tokenizerParser)
let textGenerator = try TextGenerator(bpeEncoder: bpeEncoder, tokenizerParser: tokenizerParser)

let messages = [
    Message(role: .system, content: parameters.systemPrompt),
    Message(role: .user, content: parameters.userInput)
]

let mistralInput = try MistralInput(messages: messages, bpeEncoder: bpeEncoder, tokenizer: tokenizerParser)

let generatedText = try await textGenerator.generateText(
    from: mistralInput.inputTokens,
    using: parameters,
    progressHandler: { generatedWord in
        print(generatedWord)
    }
)

print("Assistant: \(generatedText)")
```
Below is a complete example of how to integrate and use the library:
```swift
import SwiftMistralCoreML

final class MistralChat {
    private var messages: [Message] = []

    func startChat() async {
        do {
            let parameters = MistralParameters(
                userInput: "Tell me a joke.",
                systemPrompt: "You are a witty assistant.",
                algorithm: .greedy,
                maxTokens: 50,
                topK: nil
            )

            let tokenizerParser = try TokenizerParser()
            let bpeEncoder = BPEEncoder(tokenizerParser: tokenizerParser)
            let textGenerator = try TextGenerator(bpeEncoder: bpeEncoder, tokenizerParser: tokenizerParser)

            messages.append(Message(role: .system, content: parameters.systemPrompt))
            messages.append(Message(role: .user, content: parameters.userInput))

            let mistralInput = try MistralInput(messages: messages, bpeEncoder: bpeEncoder, tokenizer: tokenizerParser)

            let generatedText = try await textGenerator.generateText(
                from: mistralInput.inputTokens,
                using: parameters,
                progressHandler: { generatedWord in
                    print(generatedWord, terminator: "")
                }
            )

            messages.append(Message(role: .assistant, content: generatedText))
            print("\nAssistant: \(generatedText)")
        } catch {
            print("Error: \(error.localizedDescription)")
        }
    }
}

let chat = MistralChat()
Task {
    await chat.startChat()
}
```
The library supports multiple decoding algorithms:
- Greedy Decoding: Selects the token with the highest probability at each step.
- Top-K Sampling: Samples the next token from the top K most probable tokens.
Specify the algorithm when creating `MistralParameters`:
```swift
let parameters = MistralParameters(
    userInput: "What's the weather like today?",
    systemPrompt: "",
    algorithm: .topK,
    maxTokens: 500,
    topK: 10 // Required for Top-K decoding
)
```
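To make the difference concrete, here is a minimal, self-contained sketch of the two strategies applied to a toy distribution. It is illustrative only, not the library's internal implementation, and every name in it is hypothetical:

```swift
// Illustrative only: not SwiftMistralCoreML's internals.
// Both pick the next token ID from a [tokenID: probability] distribution.

func greedyPick(_ probs: [Int: Double]) -> Int? {
    // Greedy: always take the single most probable token.
    probs.max(by: { $0.value < $1.value })?.key
}

func topKPick(_ probs: [Int: Double], k: Int) -> Int? {
    // Top-K: keep the k most probable tokens, renormalize, then sample.
    let top = probs.sorted { $0.value > $1.value }.prefix(k)
    let total = top.reduce(0) { $0 + $1.value }
    guard total > 0 else { return nil }
    var threshold = Double.random(in: 0..<total)
    for (token, p) in top {
        threshold -= p
        if threshold <= 0 { return token }
    }
    return top.last?.key
}

let probs = [17: 0.55, 42: 0.25, 7: 0.15, 99: 0.05]
print(greedyPick(probs)!)      // always 17
print(topKPick(probs, k: 3)!)  // 17, 42, or 7, chosen at random
```

Greedy decoding is deterministic, which makes it easy to test; Top-K trades that determinism for more varied output.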
For better performance in scenarios where you need to process multiple requests concurrently, you can use the async prediction interface:
```swift
import SwiftMistralCoreML

// Example input; replace with your own questions.
let questions = [
    "What is the capital of France?",
    "How many continents are there?"
]

let textGenerator = try TextGenerator()

// Process multiple questions concurrently.
try await withThrowingTaskGroup(of: (String, String).self) { group in
    for question in questions {
        group.addTask {
            let messages = [
                Message(role: .system, content: "You are a helpful assistant."),
                Message(role: .user, content: question)
            ]
            let parameters = MistralParameters(
                modelType: .int4,
                userInput: question,
                systemPrompt: "You are a helpful assistant.",
                algorithm: .greedy,
                maxTokens: 16,
                topK: nil
            )
            // Use generateTextAsync for concurrent processing.
            let result = try await textGenerator.generateTextAsync(
                messages: messages,
                using: parameters,
                progressHandler: nil
            )
            return (question, result)
        }
    }

    // Collect results as they complete.
    for try await (question, answer) in group {
        print("Q: \(question)")
        print("A: \(answer)\n")
    }
}
```
The async interface is particularly useful when:
- Processing multiple requests concurrently
- Integrating with async/await workflows
- Handling batch processing scenarios
- Building responsive UIs that need to remain interactive during processing
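As an illustration of the last point, here is one way to stream tokens into a UI without blocking it. The `TextGenerator` calls mirror the examples above; the SwiftUI view model itself is a sketch of one possible design, not part of the library:

```swift
import SwiftUI
import SwiftMistralCoreML

@MainActor
final class ChatViewModel: ObservableObject {
    // The UI observes this and re-renders as tokens stream in.
    @Published var streamedText = ""

    func ask(_ question: String) {
        Task {
            do {
                let tokenizerParser = try TokenizerParser()
                let bpeEncoder = BPEEncoder(tokenizerParser: tokenizerParser)
                let textGenerator = try TextGenerator(bpeEncoder: bpeEncoder,
                                                      tokenizerParser: tokenizerParser)
                let parameters = MistralParameters(
                    userInput: question,
                    systemPrompt: "You are a helpful assistant.",
                    algorithm: .greedy,
                    maxTokens: 200,
                    topK: nil
                )
                let messages = [
                    Message(role: .system, content: parameters.systemPrompt),
                    Message(role: .user, content: parameters.userInput)
                ]
                let mistralInput = try MistralInput(messages: messages,
                                                    bpeEncoder: bpeEncoder,
                                                    tokenizer: tokenizerParser)
                // The streamed words build up the displayed text, so the
                // final return value can be discarded here.
                _ = try await textGenerator.generateText(
                    from: mistralInput.inputTokens,
                    using: parameters,
                    progressHandler: { word in
                        // Hop back to the main actor before touching UI state.
                        Task { @MainActor in self.streamedText += word }
                    }
                )
            } catch {
                self.streamedText = "Error: \(error.localizedDescription)"
            }
        }
    }
}
```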
Planned future improvements:
- FP16 Model Support: Upcoming support for FP16 models, providing better accuracy and performance.
- More Decoding Strategies: Planned addition of new decoding strategies such as Beam Search and Temperature Sampling.
- Enhanced Compatibility: Ensuring the library is fully compatible with iOS 18 and supports iPhone devices.
- Interface Improvements: Ongoing improvements to the API for better ease of use and flexibility.
You can extend the library to support additional features or models. The modular design allows you to replace components like the decoding strategy or tokenizer as needed.
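For example, a custom decoding strategy could be expressed behind a small protocol. The protocol and types below are a hypothetical shape for such an extension point, not the library's actual API:

```swift
import Foundation

// Hypothetical extension point; SwiftMistralCoreML's real internals may differ.
protocol DecodingStrategy {
    /// Given the model's probabilities for the next token, return a chosen token ID.
    func nextToken(from probabilities: [Int: Double]) -> Int?
}

/// Temperature sampling, one of the strategies listed as planned above.
/// Raising each probability to the power 1/T is equivalent to dividing the
/// logits by T before the softmax.
struct TemperatureSampling: DecodingStrategy {
    let temperature: Double // < 1 sharpens the distribution, > 1 flattens it

    func nextToken(from probabilities: [Int: Double]) -> Int? {
        let scaled = probabilities.mapValues { pow($0, 1.0 / temperature) }
        let total = scaled.values.reduce(0, +)
        guard total > 0 else { return nil }
        var threshold = Double.random(in: 0..<total)
        for (token, weight) in scaled {
            threshold -= weight
            if threshold <= 0 { return token }
        }
        return scaled.keys.first
    }
}
```

A strategy object like this could then be plugged in wherever the library selects its next token, keeping sampling logic separate from the CoreML inference loop.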
Contributions are welcome! Please open an issue or submit a pull request on the GitHub repository.
This project is licensed under the Apache 2.0 License.
Created because there were no existing examples of how to use Mistral CoreML models with Swift.