VecturaKit

VecturaKit is a Swift-based vector database for on-device apps that stores and retrieves vectors locally. Inspired by Dripfarm's SVDB, it uses MLTensor and swift-embeddings to generate and manage embeddings. The default embedding model is the 32M-parameter Model2Vec model, which produces fast static embeddings.

The framework offers two primary modules: VecturaKit, which supports many embedding models via swift-embeddings, and VecturaMLXKit, which uses Apple's MLX framework. It also includes CLI tools (vectura-cli and vectura-mlx-cli) for easily trying out the package.

Features

Model2Vec Support: Uses the 32M-parameter Model2Vec retrieval model by default for fast static embeddings.
Auto-Dimension Detection: Automatically detects embedding dimensions from models.
On-Device Storage: Stores and manages vector embeddings locally.
Hybrid Search: Combines vector similarity with BM25 text search to rank results (VecturaKit).
Batch Processing: Indexes documents in parallel for faster data ingestion.
Persistent Storage: Automatically saves and loads document data, preserving the database state across app sessions.
Configurable Search: Customizes search behavior with adjustable thresholds, result limits, and hybrid search weights.
Custom Storage Location: Specifies a custom directory for database storage.
MLX Support: Uses Apple's MLX framework for embedding generation and search operations (VecturaMLXKit).
CLI Tool: Includes CLIs for database management, testing, and debugging both VecturaKit and VecturaMLXKit.
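
To make the persistent-storage feature concrete, here is a minimal sketch of saving and reloading documents as JSON files. The `StoredDocument` type, the one-file-per-document layout, and the function names are assumptions for illustration; the README does not describe VecturaKit's actual on-disk format.

```swift
import Foundation

/// Hypothetical on-disk record; field names mirror the search result properties.
struct StoredDocument: Codable, Equatable {
    let id: UUID
    let text: String
    let embedding: [Float]
    let createdAt: Date
}

/// Writes one JSON file per document into `directory`, creating it if needed.
func save(_ document: StoredDocument, in directory: URL) throws {
    try FileManager.default.createDirectory(at: directory, withIntermediateDirectories: true)
    let data = try JSONEncoder().encode(document)
    let fileURL = directory.appendingPathComponent("\(document.id.uuidString).json")
    try data.write(to: fileURL, options: .atomic)
}

/// Reads every JSON document back from `directory`.
func loadAll(from directory: URL) throws -> [StoredDocument] {
    try FileManager.default
        .contentsOfDirectory(at: directory, includingPropertiesForKeys: nil)
        .filter { $0.pathExtension == "json" }
        .map { try JSONDecoder().decode(StoredDocument.self, from: Data(contentsOf: $0)) }
}
```

Calling `loadAll(from:)` at launch and `save(_:in:)` after each insert would be one way to preserve state across app sessions.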

Supported Platforms

macOS 14.0 or later
iOS 17.0 or later
tvOS 17.0 or later
visionOS 1.0 or later
watchOS 10.0 or later

Installation

Swift Package Manager

To integrate VecturaKit into your project using Swift Package Manager, add the following dependency in your Package.swift file:

```
dependencies: [
    .package(url: "https://github.com/rryam/VecturaKit.git", branch: "main"),
],
```
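
For context, a complete manifest might look like the sketch below. The package and target name `MyApp` are hypothetical, and the product name `VecturaKit` is assumed to match the module name; check the repository's Package.swift for the actual product names. The platform versions are taken from the Supported Platforms list.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    platforms: [
        .macOS(.v14), .iOS(.v17), .tvOS(.v17), .visionOS(.v1), .watchOS(.v10)
    ],
    dependencies: [
        .package(url: "https://github.com/rryam/VecturaKit.git", branch: "main"),
    ],
    targets: [
        .executableTarget(
            name: "MyApp", // hypothetical target name
            dependencies: [
                // Assumption: the library product is named after the module.
                .product(name: "VecturaKit", package: "VecturaKit"),
            ]
        ),
    ]
)
```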

Dependencies

VecturaKit uses the following Swift packages:

swift-embeddings: Used in VecturaKit for generating text embeddings using various models.
swift-argument-parser: Used for creating the command-line interface.
mlx-swift-examples: Provides MLX-based embeddings and vector search capabilities, specifically for VecturaMLXKit.

Transitive dependencies are recorded in Package.resolved.

Usage

Core VecturaKit

Import VecturaKit
```
import VecturaKit
```

Create Configuration and Initialize Database

```
import Foundation
import VecturaKit

let config = VecturaConfig(
    name: "my-vector-db",
    directoryURL: nil,  // Optional custom storage location
    dimension: nil,     // Auto-detect dimension from model (recommended)
    searchOptions: VecturaConfig.SearchOptions(
        defaultNumResults: 10,
        minThreshold: 0.7,
        hybridWeight: 0.5,  // Balance between vector and text search
        k1: 1.2,            // BM25 parameters
        b: 0.75
    )
)

let vectorDB = try await VecturaKit(config: config)
```
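
To show what `k1`, `b`, and `hybridWeight` control, here is a minimal, self-contained sketch of Okapi BM25 scoring and a linear blend with a vector score. It is not VecturaKit's implementation: the tokenizer, the function names, and the direction of the weight (1.0 meaning vector-only) are assumptions for illustration.

```swift
import Foundation

/// Lowercased alphanumeric tokens; a deliberately simple tokenizer.
func tokenize(_ text: String) -> [String] {
    text.lowercased().split { !$0.isLetter && !$0.isNumber }.map(String.init)
}

/// Okapi BM25 score of each document for the query.
/// `k1` controls term-frequency saturation; `b` controls length normalization.
func bm25Scores(query: String, documents: [String],
                k1: Double = 1.2, b: Double = 0.75) -> [Double] {
    let docs = documents.map(tokenize)
    let n = Double(docs.count)
    let avgLength = docs.isEmpty ? 0 : Double(docs.reduce(0) { $0 + $1.count }) / n
    let terms = Set(tokenize(query))

    return docs.map { doc in
        var score = 0.0
        for term in terms {
            let tf = Double(doc.filter { $0 == term }.count)
            guard tf > 0 else { continue }
            // Number of documents containing the term.
            let df = Double(docs.filter { $0.contains(term) }.count)
            let idf = log((n - df + 0.5) / (df + 0.5) + 1)
            let lengthNorm = 1 - b + b * Double(doc.count) / avgLength
            score += idf * tf * (k1 + 1) / (tf + k1 * lengthNorm)
        }
        return score
    }
}

/// Linear blend. Assumption: weight 1.0 is pure vector similarity, 0.0 is pure text score.
func hybridScore(vector: Double, text: Double, weight: Double) -> Double {
    weight * vector + (1 - weight) * text
}
```

Raw BM25 scores are unbounded, so a practical blend would first normalize them to a range comparable with the vector score. How VecturaKit normalizes is not stated here.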

Add Documents

Single document:

```
let text = "Sample text to be embedded"
let documentId = try await vectorDB.addDocument(
    text: text,
    id: UUID(),  // Optional, will be generated if not provided
    model: .default  // Uses Model2Vec 32M model by default
)
```

Multiple documents in batch:

```
let texts = [
    "First document text",
    "Second document text",
    "Third document text"
]
let documentIds = try await vectorDB.addDocuments(
    texts: texts,
    ids: nil,  // Optional array of UUIDs
    model: .default  // Uses Model2Vec 32M model by default
)
```
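
As an illustration of indexing a batch in parallel, the sketch below embeds texts concurrently with a task group and keeps the results in input order. The `embed` function is a hypothetical stand-in for a real model, and whether VecturaKit uses task groups internally is an assumption.

```swift
/// Hypothetical stand-in for an embedding model: maps text to a tiny vector.
func embed(_ text: String) async -> [Float] {
    [Float(text.count), Float(text.unicodeScalars.count)]
}

/// Embeds all texts concurrently and returns the vectors in input order.
func embedAll(_ texts: [String]) async -> [[Float]] {
    await withTaskGroup(of: (Int, [Float]).self) { group in
        for (index, text) in texts.enumerated() {
            group.addTask { (index, await embed(text)) }
        }
        // Child tasks finish in any order, so place each result by its index.
        var results = [[Float]](repeating: [], count: texts.count)
        for await (index, vector) in group {
            results[index] = vector
        }
        return results
    }
}
```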

Search Documents

Search by text (hybrid search):

```
let results = try await vectorDB.search(
    query: "search query",
    numResults: 5,      // Optional
    threshold: 0.8,     // Optional
    model: .default     // Uses Model2Vec 32M model by default
)

for result in results {
    print("Document ID: \(result.id)")
    print("Text: \(result.text)")
    print("Similarity Score: \(result.score)")
    print("Created At: \(result.createdAt)")
}
```

Search by vector embedding:

```
let results = try await vectorDB.search(
    query: embeddingArray,  // [Float] matching config.dimension
    numResults: 5,  // Optional
    threshold: 0.8  // Optional
)
```
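
To illustrate how `numResults` and `threshold` shape a vector search, here is a minimal sketch that ranks stored vectors by cosine similarity. Cosine similarity is an assumption (the README only says "similarity score"), and the function names are hypothetical.

```swift
/// Cosine similarity of two equal-length vectors; returns 0 if either is a zero vector.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "Vectors must have the same dimension")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator == 0 ? 0 : dot / denominator
}

/// Scores every stored vector, drops scores below `threshold`,
/// and returns at most `numResults` matches, best first.
func topMatches(query: [Float], stored: [[Float]],
                numResults: Int, threshold: Float) -> [(index: Int, score: Float)] {
    let ranked = stored.enumerated()
        .map { (index: $0.offset, score: cosineSimilarity(query, $0.element)) }
        .filter { $0.score >= threshold }
        .sorted { $0.score > $1.score }
    return Array(ranked.prefix(max(0, numResults)))
}
```

A higher threshold trades recall for precision: with `threshold: 0.8`, only close matches survive, even if fewer than `numResults` remain.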

Document Management

Update document:

```
try await vectorDB.updateDocument(
    id: documentId,
    newText: "Updated text",
    model: .default  // Uses Model2Vec 32M model by default
)
```

Delete documents:

```
try await vectorDB.deleteDocuments(ids: [documentId1, documentId2])
```

Reset database:

```
try await vectorDB.reset()
```

VecturaMLXKit (MLX Version)

VecturaMLXKit uses Apple's MLX framework to generate embeddings and run searches on-device.

Import VecturaMLXKit
```
import VecturaMLXKit
```

Initialize Database

```
import VecturaMLXKit
import MLXEmbedders

let config = VecturaConfig(
    name: "my-mlx-vector-db",
    dimension: 768  // nomic_text_v1_5 model outputs 768-dimensional embeddings
)
let vectorDB = try await VecturaMLXKit(config: config, modelConfiguration: .nomic_text_v1_5)
```
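
Because the dimension is set explicitly here, an embedding of any other length cannot be compared with the stored vectors. The sketch below shows one way such a check could look. The `DimensionMismatch` type and `validated` function are hypothetical and not part of VecturaMLXKit's API.

```swift
/// Hypothetical error raised when an embedding does not match the configured dimension.
struct DimensionMismatch: Error, Equatable {
    let expected: Int
    let actual: Int
}

/// Returns the embedding unchanged if its length equals `expected`; throws otherwise.
func validated(_ embedding: [Float], expected: Int) throws -> [Float] {
    guard embedding.count == expected else {
        throw DimensionMismatch(expected: expected, actual: embedding.count)
    }
    return embedding
}
```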

Add Documents

```
let texts = [
    "First document text",
    "Second document text",
    "Third document text"
]
let documentIds = try await vectorDB.addDocuments(texts: texts)
```

Search Documents

```
let results = try await vectorDB.search(
    query: "search query",
    numResults: 5,      // Optional
    threshold: 0.8      // Optional
)

for result in results {
    print("Document ID: \(result.id)")
    print("Text: \(result.text)")
    print("Similarity Score: \(result.score)")
    print("Created At: \(result.createdAt)")
}
```

Document Management

Update document:

```
try await vectorDB.updateDocument(
    id: documentId,
    newText: "Updated text"
)
```

Delete documents:

```
try await vectorDB.deleteDocuments(ids: [documentId1, documentId2])
```

Reset database:

```
try await vectorDB.reset()
```

Command Line Interface

VecturaKit includes a command-line interface for both the standard and MLX versions, facilitating easy database management.

Standard CLI Tool (vectura-cli)

```
# Add documents (dimension auto-detected from model)
vectura add "First document" "Second document" "Third document" \
  --db-name "my-vector-db"

# Search documents
vectura search "search query" \
  --db-name "my-vector-db" \
  --threshold 0.7 \
  --num-results 5

# Update document
vectura update <document-uuid> "Updated text content" \
  --db-name "my-vector-db"

# Delete documents
vectura delete <document-uuid-1> <document-uuid-2> \
  --db-name "my-vector-db"

# Reset database
vectura reset \
  --db-name "my-vector-db"

# Run demo with sample data
vectura mock \
  --db-name "my-vector-db" \
  --threshold 0.7 \
  --num-results 10
```

Common options for vectura-cli:

--db-name, -d: Database name (default: "vectura-cli-db")
--dimension, -v: Vector dimension (auto-detected by default)
--threshold, -t: Minimum similarity threshold (default: 0.7)
--num-results, -n: Number of results to return (default: 10)
--model-id, -m: Model ID for embeddings (default: "minishlab/potion-retrieval-32M")

MLX CLI Tool (vectura-mlx-cli)

# Add documents
vectura-mlx add "First document" "Second document" "Third document" --db-name "my-mlx-vector-db"

# Search documents
vectura-mlx search "search query" --db-name "my-mlx-vector-db"  --threshold 0.7 --num-results 5

# Update document
vectura-mlx update <document-uuid> "Updated text content" --db-name "my-mlx-vector-db"

# Delete documents
vectura-mlx delete <document-uuid-1> <document-uuid-2> --db-name "my-mlx-vector-db"

# Reset database
vectura-mlx reset --db-name "my-mlx-vector-db"

# Run demo with sample data
vectura-mlx mock  --db-name "my-mlx-vector-db"

Options for vectura-mlx-cli:

--db-name, -d: Database name (default: "vectura-mlx-cli-db")
--threshold, -t: Minimum similarity threshold (default: no threshold)
--num-results, -n: Number of results to return (default: 10)

License

See the LICENSE file in the repository for license terms.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your improvements.
