VecturaKit

VecturaKit is a Swift-based vector database designed for on-device applications, enabling advanced user experiences through local vector storage and retrieval. Inspired by Dripfarm's SVDB, VecturaKit utilizes MLTensor and swift-embeddings for generating and managing embeddings. The framework offers two primary modules: VecturaKit, which supports diverse embedding models via swift-embeddings, and VecturaMLXKit, which leverages Apple's MLX framework for accelerated processing.

Key Features

On-Device Storage: Stores and manages vector embeddings locally, enhancing privacy and reducing latency.
Hybrid Search: Combines vector similarity with BM25 text search for comprehensive and relevant search results (VecturaKit).
Batch Processing: Indexes documents in parallel for faster data ingestion.
Persistent Storage: Automatically saves and loads document data, preserving the database state across app sessions.
Configurable Search: Customizes search behavior with adjustable thresholds, result limits, and hybrid search weights.
Custom Storage Location: Specifies a custom directory for database storage.
MLX Support: Employs Apple's MLX framework for accelerated embedding generation and search operations (VecturaMLXKit).
CLI Tool: Includes a command-line interface (CLI) for database management, testing, and debugging for both VecturaKit and VecturaMLXKit.

Supported Platforms

macOS 14.0 or later
iOS 17.0 or later
tvOS 17.0 or later
visionOS 1.0 or later
watchOS 10.0 or later

Installation

Swift Package Manager

To integrate VecturaKit into your project using Swift Package Manager, add the following dependency in your Package.swift file:

dependencies: [
    .package(url: "https://github.com/rryam/VecturaKit.git", branch: "main"),
],

Dependencies

VecturaKit relies on the following Swift packages:

swift-embeddings: Used in VecturaKit for generating text embeddings using various models.
swift-argument-parser: Used for creating the command-line interface.
mlx-swift-examples: Provides MLX-based embeddings and vector search capabilities, specifically for VecturaMLXKit.

Usage

Core VecturaKit

Import VecturaKit
```
import VecturaKit
```

Create Configuration and Initialize Database

import Foundation
import VecturaKit

let config = VecturaConfig(
    name: "my-vector-db",
    directoryURL: nil,  // Optional custom storage location
    dimension: 384,     // Matches the default BERT model dimension
    searchOptions: VecturaConfig.SearchOptions(
        defaultNumResults: 10,
        minThreshold: 0.7,
        hybridWeight: 0.5,  // Balance between vector and text search
        k1: 1.2,           // BM25 parameters
        b: 0.75
    )
)

let vectorDB = try await VecturaKit(config: config)

Add Documents

Single document:

let text = "Sample text to be embedded"
let documentId = try await vectorDB.addDocument(
    text: text,
    id: UUID(),  // Optional, will be generated if not provided
    model: .id("sentence-transformers/all-MiniLM-L6-v2")  // Optional, this is the default
)

Multiple documents in batch:

let texts = [
    "First document text",
    "Second document text",
    "Third document text"
]
let documentIds = try await vectorDB.addDocuments(
    texts: texts,
    ids: nil,  // Optional array of UUIDs
     model: .id("sentence-transformers/all-MiniLM-L6-v2") // Optional model
)

Search Documents

Search by text (hybrid search):

let results = try await vectorDB.search(
    query: "search query",
    numResults: 5,      // Optional
    threshold: 0.8,     // Optional
    model: .id("sentence-transformers/all-MiniLM-L6-v2")  // Optional
)

for result in results {
    print("Document ID: \(result.id)")
    print("Text: \(result.text)")
    print("Similarity Score: \(result.score)")
    print("Created At: \(result.createdAt)")
}

Search by vector embedding:

let results = try await vectorDB.search(
    query: embeddingArray,  // [Float] matching config.dimension
    numResults: 5,  // Optional
    threshold: 0.8  // Optional
)

Document Management

Update document:

try await vectorDB.updateDocument(
    id: documentId,
    newText: "Updated text",
    model: .id("sentence-transformers/all-MiniLM-L6-v2")  // Optional
)

Delete documents:

try await vectorDB.deleteDocuments(ids: [documentId1, documentId2])

Reset database:

try await vectorDB.reset()

VecturaMLXKit (MLX Version)

VecturaMLXKit harnesses Apple's MLX framework for accelerated processing, delivering optimized performance for on-device machine learning tasks.

Import VecturaMLXKit
```
import VecturaMLXKit
```

Initialize Database

import VecturaMLXKit
import MLXEmbedders

let config = VecturaConfig(
  name: "my-mlx-vector-db",
  dimension: 768 //  nomic_text_v1_5 model outputs 768-dimensional embeddings
)
let vectorDB = try await VecturaMLXKit(config: config, modelConfiguration: .nomic_text_v1_5)

Add Documents

let texts = [
  "First document text",
  "Second document text",
  "Third document text"
]
let documentIds = try await vectorDB.addDocuments(texts: texts)

Search Documents

let results = try await vectorDB.search(
    query: "search query",
    numResults: 5,      // Optional
    threshold: 0.8     // Optional
)

for result in results {
    print("Document ID: \(result.id)")
    print("Text: \(result.text)")
    print("Similarity Score: \(result.score)")
    print("Created At: \(result.createdAt)")
}

Document Management

Update document:

try await vectorDB.updateDocument(
     id: documentId,
     newText: "Updated text"
 )

Delete documents:

try await vectorDB.deleteDocuments(ids: [documentId1, documentId2])

Reset database:

try await vectorDB.reset()

Command Line Interface

VecturaKit includes a command-line interface for both the standard and MLX versions, facilitating easy database management.

Standard CLI Tool

# Add documents
vectura add "First document" "Second document" "Third document" \
  --db-name "my-vector-db" \
  --dimension 384 \
  --model-id "sentence-transformers/all-MiniLM-L6-v2"

# Search documents
vectura search "search query" \
  --db-name "my-vector-db" \
  --dimension 384 \
  --threshold 0.7 \
  --num-results 5 \
  --model-id "sentence-transformers/all-MiniLM-L6-v2"

# Update document
vectura update <document-uuid> "Updated text content" \
  --db-name "my-vector-db" \
  --dimension 384 \
  --model-id "sentence-transformers/all-MiniLM-L6-v2"

# Delete documents
vectura delete <document-uuid-1> <document-uuid-2> \
  --db-name "my-vector-db" \
  --dimension 384

# Reset database
vectura reset \
  --db-name "my-vector-db" \
  --dimension 384

# Run demo with sample data
vectura mock \
  --db-name "my-vector-db" \
  --dimension 384 \
  --threshold 0.7 \
  --num-results 10 \
  --model-id "sentence-transformers/all-MiniLM-L6-v2"

Common options:

--db-name, -d: Database name (default: "vectura-cli-db")
--dimension, -v: Vector dimension (default: 384)
--threshold, -t: Minimum similarity threshold (default: 0.7)
--num-results, -n: Number of results to return (default: 10)
--model-id, -m: Model ID for embeddings (default: "sentence-transformers/all-MiniLM-L6-v2")

MLX CLI Tool

# Add documents
vectura-mlx add "First document" "Second document" "Third document" --db-name "my-mlx-vector-db"

# Search documents
vectura-mlx search "search query" --db-name "my-mlx-vector-db"  --threshold 0.7 --num-results 5

# Update document
vectura-mlx update <document-uuid> "Updated text content" --db-name "my-mlx-vector-db"

# Delete documents
vectura-mlx delete <document-uuid-1> <document-uuid-2> --db-name "my-mlx-vector-db"

# Reset database
vectura-mlx reset --db-name "my-mlx-vector-db"

# Run demo with sample data
vectura-mlx mock  --db-name "my-mlx-vector-db"

License

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your improvements.

Development

The project is structured as a Swift Package. It includes the following key targets:

VecturaKit: The core vector database library.
VecturaMLXKit: The MLX-accelerated version of the library.
vectura-cli: The command-line interface for VecturaKit.
vectura-mlx-cli: The command-line interface for VecturaMLXKit.

To build and test the project, use the following commands:

swift build
swift test

The project also includes CI workflows defined in .github/workflows to automate building and testing on pull requests and pushes to the main branch. The workflows require Xcode 16.1 and Swift 6.0.

Debugging configurations are provided in .vscode/launch.json for the vectura-cli. These can be used to launch the CLI with the debugger attached.

Continuous Integration

The project uses GitHub Actions for continuous integration. The following workflows are defined:

.github/workflows/build_and_test_mlx.yml: Builds and tests the VecturaMLXKit target.
.github/workflows/build_and_test_vectura.yml: Builds and tests the VecturaKit and vectura-cli targets.
.github/workflows/update-readme.yml: Automatically updates the README.md file using a Python script that calls the Gemini AI model. This workflow is triggered on pushes to the main branch and creates a pull request with the updated README.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

VecturaKit

Key Features

Supported Platforms

Installation

Swift Package Manager

Dependencies

Usage

Core VecturaKit

VecturaMLXKit (MLX Version)

Command Line Interface

License

Contributing

Development

Continuous Integration

Files

README.md

Latest commit

History

README.md

File metadata and controls

VecturaKit

Key Features

Supported Platforms

Installation

Swift Package Manager

Dependencies

Usage

Core VecturaKit

VecturaMLXKit (MLX Version)

Command Line Interface

License

Contributing

Development

Continuous Integration