Releases: messkan/rag-chunk
v0.3.0
Changelog
All notable changes to this project will be documented in this file.
[0.3.0] - 2025-11-25
Added
- Recursive Character Splitting: New strategy recursive-character using LangChain (requires rag-chunk[langchain]).
- New File Formats: Support for parsing .txt files in addition to Markdown.
- Advanced Metrics: Added Precision and F1-score to evaluation reports.
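Recursive character splitting tries the coarsest separator first (blank lines), then falls back to finer ones (newlines, spaces, single characters) only for pieces that are still too long. The sketch below illustrates that technique in plain Python; it is not rag-chunk's code and not LangChain's RecursiveCharacterTextSplitter. The function name recursive_split is hypothetical, the separator order is assumed to follow LangChain's default ("\n\n", "\n", " ", ""), and chunk overlap is deliberately omitted.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    chunk_size: int,
    separators: Sequence[str] = ("\n\n", "\n", " ", ""),
) -> List[str]:
    """Split text into chunks of at most chunk_size characters (hypothetical helper)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut by character count.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        # Greedily merge neighbouring pieces while they still fit.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: retry it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```

For example, recursive_split("aaa bbb ccc\n\nddd eee", 11) keeps each paragraph intact because both fit within 11 characters, while a 10-character string with no separators and a limit of 4 ends up hard-cut into "abcd", "efgh", "ij".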
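The release notes do not define how precision is computed, so the following is only a minimal sketch under assumed definitions: recall is the share of expected phrases found in at least one retrieved chunk, precision is the share of retrieved chunks that contain at least one expected phrase, and F1 is their harmonic mean, 2PR / (P + R). The names Scores and score_retrieval are hypothetical and may not match what rag-chunk reports.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def score_retrieval(retrieved: Sequence[str], relevant: Sequence[str]) -> Scores:
    """Score retrieved chunks against expected phrases (assumed definitions)."""
    chunks = [c.lower() for c in retrieved]
    phrases = [p.lower() for p in relevant]
    # Recall: how many expected phrases appear somewhere in the retrieved chunks.
    found = sum(1 for p in phrases if any(p in c for c in chunks))
    # Precision: how many retrieved chunks contain at least one expected phrase.
    useful = sum(1 for c in chunks if any(p in c for p in phrases))
    recall = found / len(phrases) if phrases else 0.0
    precision = useful / len(chunks) if chunks else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)
```

With three retrieved chunks of which two mention the expected phrases "capital of France" and "Eiffel Tower", recall is 1.0, precision is 2/3, and F1 is 0.8: adding precision penalizes a chunking strategy that reaches full recall only by returning many irrelevant chunks.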
v0.2.0
rag-chunk v0.2.0 Release Notes
Summary
Adds token-based chunking via tiktoken, model selection, richer CLI output, improved docs, and expanded tests. Backward compatible.
Added
- --use-tiktoken for token-precise chunking
- --tiktoken-model for model selection (default: gpt-3.5-turbo)
- Optional install: pip install rag-chunk[tiktoken]
- Rich table output (used automatically if rich is installed)
- README: tokenization section and usage examples
- Unit tests for token-based paths
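Token-based chunking counts model tokens rather than whitespace-separated words, so chunk sizes line up with the units a model's context limit is measured in. The sketch below is a hypothetical helper, chunk_by_tokens, not rag-chunk's implementation. It takes the encoder and decoder as parameters so it runs without tiktoken installed. The commented lines show how tiktoken's encoding_for_model could be plugged in, assuming the tiktoken extra is installed.

```python
from typing import Callable, List, Sequence


def chunk_by_tokens(
    text: str,
    chunk_size: int,
    encode: Callable[[str], Sequence[int]],
    decode: Callable[[List[int]], str],
) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size tokens (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    tokens = list(encode(text))
    # Slice the token sequence, then turn each slice back into text.
    return [decode(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


# With tiktoken installed (pip install rag-chunk[tiktoken]), a model-specific
# encoding could be used instead of a toy tokenizer:
#   import tiktoken
#   enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
#   chunks = chunk_by_tokens(text, 300, enc.encode, enc.decode)
```

Passing the tokenizer in keeps the splitting logic independent of tiktoken, which mirrors how the release treats tiktoken as an optional install.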
Improved
- CLI help clarifies words vs tokens
- Refactored strategy runner (fewer locals, pylint clean)
- Lint compliance across modules
Fixed
- Line length, broad exception, naming, and local variable warnings
- Optional-extras warning (in preparation for publishing)
Usage
Word-based:
rag-chunk analyze examples/ --strategy fixed-size --chunk-size 300
v0.1.0 - Initial Release
Features
- Parse and clean Markdown files
- Three chunking strategies: fixed-size, sliding-window, paragraph
- Recall-based evaluation with test JSON files
- CLI with table/JSON/CSV output formats
- Example corpus with realistic test cases
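The three word-based strategies can be sketched as follows. This is an illustration of the techniques, not rag-chunk's source: the function names and signatures are hypothetical, "words" are assumed to be whitespace-separated tokens, the sliding window is assumed to be parameterized by an overlap in words, and a paragraph is assumed to be a block separated by blank lines.

```python
import re
from typing import List


def fixed_size(text: str, size: int) -> List[str]:
    """Consecutive, non-overlapping chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def sliding_window(text: str, size: int, overlap: int) -> List[str]:
    """Chunks of `size` words; each window starts `size - overlap` words after the last."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the last word
    return chunks


def paragraph(text: str) -> List[str]:
    """One chunk per blank-line-separated paragraph, empty blocks dropped."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
```

On "one two three four five six seven", fixed_size(text, 3) yields "one two three", "four five six", "seven", while sliding_window(text, 3, 1) yields "one two three", "three four five", "five six seven". The overlap repeats boundary words so that a fact straddling two chunks still appears whole in one of them.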
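Recall-based evaluation checks whether the chunks a retriever returns for each test question contain the phrases that answer it. The sketch below is a minimal harness under stated assumptions: the test JSON schema ({"questions": [{"question": ..., "relevant": [...]}]}) is hypothetical and may differ from rag-chunk's test files, and simple word-overlap ranking stands in for whatever retriever rag-chunk actually uses.

```python
import json
import re
from typing import List, Set


def _words(s: str) -> Set[str]:
    return set(re.findall(r"\w+", s.lower()))


def top_k(chunks: List[str], question: str, k: int) -> List[str]:
    """Rank chunks by shared words with the question (stand-in retriever)."""
    q = _words(question)
    return sorted(chunks, key=lambda c: len(q & _words(c)), reverse=True)[:k]


def mean_recall(chunks: List[str], test_json: str, k: int = 3) -> float:
    """Average, over questions, of the share of relevant phrases found in the top-k chunks."""
    cases = json.loads(test_json)["questions"]  # hypothetical schema
    scores = []
    for case in cases:
        hits = [c.lower() for c in top_k(chunks, case["question"], k)]
        relevant = case["relevant"]
        # A phrase counts only if it appears inside a single retrieved chunk.
        found = sum(1 for p in relevant if any(p.lower() in c for c in hits))
        scores.append(found / len(relevant) if relevant else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

Running the same test file against chunks produced by each strategy gives one comparable recall number per strategy; a phrase split across a chunk boundary counts as missed, which is exactly the failure mode that chunk size and overlap trade off.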
Installation
pip install rag-chunk