Merged
33 commits
7184dcc
added initial
ikramulkayes Oct 9, 2025
5747200
added initial
ikramulkayes Oct 9, 2025
829abf8
retrival
ikramulkayes Oct 9, 2025
1c417c7
added printing
ikramulkayes Oct 9, 2025
ec2f76d
added qdrant http
ikramulkayes Oct 11, 2025
3443926
added reranker
ikramulkayes Oct 11, 2025
124f3e6
added update
ikramulkayes Oct 12, 2025
24fabbb
fallback
ikramulkayes Oct 15, 2025
4dfbfd6
printing statement
ikramulkayes Oct 16, 2025
f8e2f69
temp fixing mongodb
ikramulkayes Oct 16, 2025
c3a193c
no mongodb on library
ikramulkayes Oct 18, 2025
d4317d3
fixed upsert
ikramulkayes Oct 18, 2025
53e1cb8
filtering
ikramulkayes Oct 18, 2025
5ab318b
refactor: move documentation into docs directory
tahmidul612 Oct 20, 2025
175425b
chore: cleanup testing_api files and directory
tahmidul612 Oct 20, 2025
b611d52
chore: remove unused base embedding file
tahmidul612 Oct 20, 2025
c8a3e9e
chore: update uv.lock
tahmidul612 Oct 20, 2025
afe6bd3
refactor: remove unused ssl import in QdrantVectorDB
tahmidul612 Oct 20, 2025
87f48cf
refactor: move utility scripts into utils directory
tahmidul612 Oct 20, 2025
9bed5e6
refactor: update project structure and clean up unused files (#5)
tahmidul612 Oct 20, 2025
0db0c76
chore: update GEMINI.md
tahmidul612 Oct 20, 2025
149a488
docs: add concise README for insta_rag
tahmidul612 Oct 20, 2025
271fdf5
docs: remove legacy design, implementation and phase artifacts
tahmidul612 Oct 20, 2025
a6f10b8
docs: remove legacy guides, install and troubleshooting notes
tahmidul612 Oct 20, 2025
b1b39ef
docs(guides): add core guides for document management, retrieval, sto…
tahmidul612 Oct 20, 2025
0a79e3f
docs: add Installation guide
tahmidul612 Oct 20, 2025
1c074da
docs: add Quickstart and top-level index
tahmidul612 Oct 20, 2025
d7a3cf2
chore: rename index to README
tahmidul612 Oct 20, 2025
23a86b2
style: fix markdown formatting and fix links
tahmidul612 Oct 20, 2025
fb5ddd3
docs: update license section to include a link to the MIT License
tahmidul612 Oct 20, 2025
430dae9
docs: revise documentation and guides for insta_rag (#7)
tahmidul612 Oct 20, 2025
fc5d967
chore: update version to 0.1.0 in pyproject.toml and uv.lock; add ini…
tahmidul612 Oct 20, 2025
0f97672
Merge remote-tracking branch 'origin/main' into feat/build-rag-pipeli…
tahmidul612 Oct 20, 2025
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,13 @@
## 0.1.0 (2025-10-20)

### Features

- **Initial Release of `insta_rag` library**:
  - Introduced a modular, plug-and-play Python library for building advanced Retrieval-Augmented Generation (RAG) pipelines.
  - Core features include:
    - Semantic Chunking
    - Hybrid Retrieval (Vector Search + Keyword Search)
    - Query Transformation (HyDE)
    - Reranking with Cohere
  - Pluggable architecture for chunkers, embedders, and vector databases.
  - Hybrid storage with Qdrant and MongoDB.
105 changes: 105 additions & 0 deletions GEMINI.md
@@ -0,0 +1,105 @@
# Gemini Context: insta_rag Project

## Project Overview

This project, `insta_rag`, is a modular and extensible Python library designed for building Retrieval-Augmented Generation (RAG) pipelines. It abstracts the complexity of RAG into three primary operations: adding, updating, and retrieving documents.

**Key Technologies & Architecture:**

- **Core Client:** The main entry point is the `RAGClient`, which orchestrates all operations.
- **Embeddings & LLMs:** Utilizes OpenAI (`text-embedding-3-large`, GPT-4) or Azure OpenAI for generating embeddings and hypothetical answers (HyDE).
- **Vector Database:** Uses Qdrant for efficient vector storage and search.
- **Reranking:** Integrates Cohere for cross-encoder reranking to improve the relevance of search results.
- **Architecture:** The library is built on an interface-based design, allowing for plug-and-play components. Core modules for `chunking`, `embedding`, `vectordb`, and `retrieval` each have a `base.py` defining an abstract interface, making it easy to extend with new implementations (e.g., adding Pinecone as a vector DB).
- **Data Models:** Pydantic is used for robust data validation and clear data structures for documents, chunks, and API responses.

The primary goal is to provide a complete, configuration-driven RAG system that is both easy to use and easy to extend.
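
The interface-based design described above can be illustrated with a minimal sketch. The names `BaseVectorDB`, `InMemoryVectorDB`, `upsert`, `search`, and `top_match` are hypothetical and chosen only for this example; the actual abstract interface is whatever the library's `vectordb/base.py` defines, and it may look quite different.

```python
from abc import ABC, abstractmethod
import math


class BaseVectorDB(ABC):
    """Hypothetical interface; not the library's real base class."""

    @abstractmethod
    def upsert(self, ids: list[str], vectors: list[list[float]]) -> None:
        """Insert or overwrite vectors by id."""

    @abstractmethod
    def search(self, query: list[float], limit: int = 3) -> list[tuple[str, float]]:
        """Return (id, score) pairs, best match first."""


class InMemoryVectorDB(BaseVectorDB):
    """Toy backend using cosine similarity, standing in for a real vector DB."""

    def __init__(self) -> None:
        self._store: dict[str, list[float]] = {}

    def upsert(self, ids, vectors):
        for doc_id, vector in zip(ids, vectors):
            self._store[doc_id] = vector

    def search(self, query, limit=3):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [(doc_id, cosine(query, vec)) for doc_id, vec in self._store.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


def top_match(db: BaseVectorDB, query: list[float]) -> str:
    """Caller code depends only on the abstract interface, not on a backend."""
    return db.search(query, limit=1)[0][0]
```

Because `top_match` accepts any `BaseVectorDB`, swapping in a different backend would only require a new subclass, which is the property the plug-and-play architecture is after.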

## Documentation

The project documentation has been reorganized for clarity and is located in the `/docs` directory.

- **[README.md](./docs/README.md):** Main landing page with links to all other documents.
- **[installation.md](./docs/installation.md):** Detailed installation instructions.
- **[quickstart.md](./docs/quickstart.md):** A hands-on guide to get started quickly.
- **Guides (`/docs/guides`):**
  - **[document-management.md](./docs/guides/document-management.md):** Covers adding, updating, and deleting documents.
  - **[retrieval.md](./docs/guides/retrieval.md):** Explains the advanced hybrid retrieval pipeline.
  - **[storage-backends.md](./docs/guides/storage-backends.md):** Details on configuring Qdrant-only vs. hybrid Qdrant+MongoDB storage.
  - **[local-development.md](./docs/guides/local-development.md):** Instructions for setting up a local Qdrant instance.

## Building and Running

### 1. Installation

The project uses `uv` for package management.

```bash
# Install the package in editable mode with all dependencies
uv pip install -e .
```

Alternatively, using `pip` and a virtual environment:

```bash
# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install in editable mode
pip install -e .
```

### 2. Environment Setup

The client is configured via a `.env` file. Create one in the project root with the variables listed in `docs/installation.md`.
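
To check a `.env` file before running the client, a small standard-library sketch like the following may help. It is not how the library itself loads configuration, and the helper names `load_env_file` and `missing_settings` are hypothetical. The variable names in the usage note are placeholders too; the real names are the ones listed in `docs/installation.md`.

```python
import os


def load_env_file(path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop optional surrounding quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(values: dict[str, str], required: list[str]) -> list[str]:
    """Return required names that are empty or absent in both the file and os.environ."""
    return [name for name in required if not (values.get(name) or os.environ.get(name))]
```

For example, `missing_settings(load_env_file(".env"), ["EXAMPLE_API_KEY", "EXAMPLE_DB_URL"])` would return the placeholder names that still need a value.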

### 3. Running the Example

The `examples/basic_usage.py` script demonstrates the core functionality of the library.

```bash
# Run the basic usage example
python examples/basic_usage.py
```

### 4. Running Tests

The project contains a `tests/` directory. Tests can be run using `pytest`.

```bash
# TODO: Verify if this is the correct test command.
pytest
```

## Development Conventions

This project has a strong focus on code quality and consistency, enforced by several tools.

### 1. Linting and Formatting

- **Tool:** `Ruff` is used for both linting and formatting.

- **Usage:**

```bash
# Check for linting errors and auto-fix them
ruff check . --fix

# Format the codebase
ruff format .
```

### 2. Pre-commit Hooks

- **Framework:** `pre-commit` is used to run checks before each commit.

- **Setup:** First-time contributors must install the hooks:

```bash
pre-commit install
```

### 3. Commit Messages

- **Standard:** The project follows the **Conventional Commits** specification, enforced by `commitizen`.
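
As a rough illustration of the header shape that Conventional Commits expects (`type(scope)!: description`), the sketch below parses a commit subject line with a regular expression. The pattern and the function name `parse_commit_header` are assumptions made for this example; the rules `commitizen` actually enforces may be stricter, for instance by restricting the allowed types.

```python
import re
from typing import Optional

# Simplified header shape: type, optional (scope), optional "!", then ": description".
HEADER_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<description>\S.*)$"
)


def parse_commit_header(header: str) -> Optional[dict]:
    """Return the parsed parts of a commit header, or None if it does not match."""
    match = HEADER_PATTERN.match(header)
    if match is None:
        return None
    return {
        "type": match.group("type"),
        "scope": match.group("scope"),
        "breaking": match.group("breaking") is not None,
        "description": match.group("description"),
    }
```

Under this pattern, a subject such as `docs(guides): add core guides` parses with type `docs` and scope `guides`, while a free-form subject such as `added initial` returns `None`.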