Releases: messkan/rag-chunk

v0.3.0

25 Nov 00:00

Changelog

All notable changes to this project will be documented in this file.

[0.3.0] - 2025-11-25

Added

  • Recursive Character Splitting: New recursive-character strategy, implemented with LangChain (requires the optional rag-chunk[langchain] extra).
  • New File Formats: Support for parsing .txt files in addition to Markdown.
  • Advanced Metrics: Evaluation reports now include Precision and F1-score alongside recall.
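
For readers unfamiliar with the new metrics, the following is a minimal sketch of how precision, recall, and F1 relate for a single query. It assumes a set-based definition over chunk identifiers; the function name precision_recall_f1 and the chunk ids are hypothetical, and this is not necessarily how rag-chunk computes its reports.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for one query.

    `retrieved` and `relevant` are iterables of chunk ids (hypothetical
    inputs chosen for illustration, not rag-chunk's internal format).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)

    # Precision: share of retrieved chunks that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant chunks that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    # F1: harmonic mean of the two, defined as 0 when both are 0.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


p, r, f1 = precision_recall_f1(
    retrieved=["c1", "c2", "c3", "c4"],
    relevant=["c2", "c4", "c7"],
)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Recall alone rewards retrieving everything; adding precision and F1 penalizes configurations that reach high recall only by returning many irrelevant chunks.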

v0.2.0

17 Nov 23:14

rag-chunk v0.2.0 Release Notes

Summary

Adds token-based chunking via tiktoken, model selection, richer CLI output, improved docs, and expanded tests. Backward compatible.

Added

  • --use-tiktoken for token-precise chunking
  • --tiktoken-model (default: gpt-3.5-turbo)
  • Optional install: pip install rag-chunk[tiktoken]
  • Rich table output (enabled automatically when rich is installed)
  • README: tokenization section and usage examples
  • Unit tests for token-based paths
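
To illustrate why token-based chunking differs from word-based chunking, here is a rough sketch with a pluggable tokenizer. The names chunk_by_tokens, toy_encode, and toy_decode are hypothetical, and the toy tokenizer is only a stand-in so the example runs without extras. With tiktoken installed, the encode and decode methods of tiktoken.encoding_for_model("gpt-3.5-turbo") could be passed instead. This is not rag-chunk's actual implementation.

```python
import re


def chunk_by_tokens(text, chunk_size, encode, decode):
    """Split `text` into chunks of at most `chunk_size` tokens.

    `encode` maps text to a list of tokens and `decode` maps a list of
    tokens back to text; both are supplied by the caller.
    """
    tokens = encode(text)
    return [
        decode(tokens[i:i + chunk_size])
        for i in range(0, len(tokens), chunk_size)
    ]


# Toy stand-in tokenizer (an assumption for illustration only): pieces of
# up to four non-space characters, keeping any leading whitespace attached.
def toy_encode(text):
    return re.findall(r"\s*\S{1,4}", text)


def toy_decode(tokens):
    return "".join(tokens)


text = "tokenization matters"
print(len(text.split()), "words,", len(toy_encode(text)), "tokens")
print(chunk_by_tokens(text, 2, toy_encode, toy_decode))
```

Two words become five tokens under the toy tokenizer, so a chunk size that looks safe when counted in words can overshoot a limit that is counted in tokens. Chunk boundaries may also fall in the middle of a word.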

Improved

  • CLI help clarifies words vs tokens
  • Refactored strategy runner (fewer locals, pylint clean)
  • Lint compliance across modules

Fixed

  • Pylint warnings for line length, broad exception handling, naming, and excess local variables
  • Optional extras warning (in preparation for publishing)

Usage

Word-based:

rag-chunk analyze examples/ --strategy fixed-size --chunk-size 300

v0.1.0 - Initial Release

15 Nov 01:44

Features

  • Parse and clean Markdown files
  • Three chunking strategies: fixed-size, sliding-window, paragraph
  • Recall-based evaluation with test JSON files
  • CLI with table/JSON/CSV output formats
  • Example corpus with realistic test cases
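
The three strategies listed above can be sketched roughly as follows. This is a simplified, word-based illustration under stated assumptions: the function names, the overlap parameter, and the blank-line paragraph rule are hypothetical and may not match rag-chunk's own code or options.

```python
import re


def fixed_size(words, chunk_size):
    """Non-overlapping chunks of at most `chunk_size` words."""
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def sliding_window(words, chunk_size, overlap):
    """Chunks of `chunk_size` words where neighbours share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def paragraphs(text):
    """One chunk per paragraph, splitting on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


words = "a b c d e f g".split()
print(fixed_size(words, 3))
print(sliding_window(words, 4, 2))
print(paragraphs("First paragraph.\n\nSecond paragraph."))
```

Fixed-size chunks are the simplest but can cut an answer in half at a boundary. The sliding window trades extra storage for context shared between neighbours, and paragraph chunks follow the author's own structure at the cost of uneven sizes.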

Installation

pip install rag-chunk