Merged
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,8 @@
- Pluggable architecture for chunkers, embedders, and vector databases.
- Hybrid storage with Qdrant and MongoDB.

## v0.1.1-beta.6 (2025-11-25)

## v0.1.1-beta.5 (2025-11-21)

## v0.1.1-beta.4 (2025-11-20)
79 changes: 75 additions & 4 deletions CLAUDE.md
@@ -279,10 +279,9 @@ Documents → Extract Content → Graphiti Processing → Neo4j Graph
### Important Limitations & Notes

1. **Async-only** – GraphRAGClient uses async/await exclusively
-2. **No collection deletion** – Neo4j doesn't support collection-level deletion via Graphiti; use raw Neo4j queries if needed
-3. **Episode-based organization** – Graph data is organized as episodes (document sources); explicit management required
-4. **LLM cost** – Entity extraction uses LLM calls; consider batching for large documents
-5. **Phase 1 feature** – Hybrid retrieval (merging graph + vector results) is planned for Phase 2
2. **Episode-based organization** – Graph data is organized as episodes (document sources); explicit management required
3. **LLM cost** – Entity extraction uses LLM calls; consider batching for large documents
4. **Phase 1 feature** – Hybrid retrieval (merging graph + vector results) is planned for Phase 2

### Custom Group ID for Multi-Tenant & Environment Isolation

@@ -346,6 +345,78 @@ POST /graph-rag/retrieve
- [GRAPH_RAG_GROUP_ID_GUIDE.md](./GRAPH_RAG_GROUP_ID_GUIDE.md) – Comprehensive guide with examples
- [GRAPH_RAG_INTEGRATION_GUIDE.md](./GRAPH_RAG_INTEGRATION_GUIDE.md) – Multi-tenant implementation patterns

### Graph RAG Delete Operations (NEW)

Graph RAG now supports deletion at four levels (entity node, relationship edge, episode, and collection):

#### Deletion Methods

1. **Delete Node (Entity)**
```python
result = await client.delete_node(node_uuid, collection_name)
# Deletes single entity and all connected relationships
```

2. **Delete Edge (Relationship/Fact)**
```python
result = await client.delete_edge(edge_uuid, collection_name)
# Deletes single relationship without affecting entities
```

3. **Delete Episode (Document)**
```python
result = await client.delete_episode(episode_uuid, collection_name)
# Deletes the relationships (edges) extracted from the document
# Automatically removes orphaned nodes (no remaining connections)
```

4. **Delete Collection**
```python
result = await client.delete_collection(collection_name)
# Deletes ALL data in collection (irreversible)
# The client method takes no confirm flag; only the REST endpoint requires confirm=true
```
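The scope of each level can be illustrated with a small in-memory model. This is a hypothetical sketch, not insta_rag's implementation: `ToyGraph` is an illustrative name, edges are simply tagged with the episode they came from (Graphiti itself models episodes as separate nodes), and orphan cleanup is assumed to consider only nodes touched by the deleted edges.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGraph:
    """Hypothetical in-memory stand-in used only to illustrate deletion scope."""

    nodes: set[str] = field(default_factory=set)
    # edge_uuid -> (source_node, target_node, episode_uuid)
    edges: dict[str, tuple[str, str, str]] = field(default_factory=dict)

    def delete_node(self, node: str) -> int:
        """Remove a node and every edge touching it; return the edge count."""
        doomed = [e for e, (s, t, _) in self.edges.items() if node in (s, t)]
        for e in doomed:
            del self.edges[e]
        self.nodes.discard(node)
        return len(doomed)

    def delete_edge(self, edge: str) -> None:
        """Remove a single edge; both endpoint nodes are kept."""
        self.edges.pop(edge, None)

    def delete_episode(self, episode: str) -> tuple[int, int]:
        """Remove the episode's edges, then nodes they left without edges."""
        doomed = [e for e, (_, _, ep) in self.edges.items() if ep == episode]
        touched = {n for e in doomed for n in self.edges[e][:2]}
        for e in doomed:
            del self.edges[e]
        still_connected = {n for s, t, _ in self.edges.values() for n in (s, t)}
        orphans = touched - still_connected  # assumption: only touched nodes
        self.nodes -= orphans
        return len(doomed), len(orphans)

    def delete_collection(self) -> None:
        """Remove everything (irreversible)."""
        self.nodes.clear()
        self.edges.clear()


g = ToyGraph(
    nodes={"alice", "acme", "bob"},
    edges={"e1": ("alice", "acme", "ep1"), "e2": ("bob", "acme", "ep2")},
)
print(g.delete_episode("ep1"))  # (1, 1): edge e1 removed, "alice" orphaned
print(sorted(g.nodes))          # ['acme', 'bob']: "acme" still linked via e2
```

Note that an entity shared by two documents ("acme" above) survives episode deletion; only `delete_node()` removes it regardless of remaining connections.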

#### Implementation Details

- **Files Modified:**
  - `src/insta_rag/graph_rag/graph_builder.py` – Core deletion logic
  - `src/insta_rag/graph_rag/client.py` – Client API wrappers
  - `src/insta_rag/graph_rag/neo4j_driver.py` – Driver reference storage
  - `testing_api/graph_rag_routes.py` – API endpoints (4 new + 1 demo endpoint)

- **API Endpoints:**
  - `POST /graph-rag/delete-node` – Delete entity
  - `POST /graph-rag/delete-edge` – Delete relationship
  - `POST /graph-rag/delete-episode` – Delete document
  - `POST /graph-rag/delete-collection` – Delete collection (requires confirmation)
  - `POST /graph-rag/test/demo-delete` – Interactive demo

- **Key Features:**
  - Multi-tenant safe via `group_id` isolation
  - Automatic orphan-node cleanup after episode deletion
  - Error handling with descriptive messages
  - Confirmation (`"confirm": true`) required by the collection-deletion endpoint
  - Full Swagger documentation with examples
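To make the `group_id` isolation point concrete, here is a hedged sketch of a tenant-scoped, parameterized Cypher statement for node deletion. It assumes entity nodes carry the `Entity` label and `uuid`/`group_id` properties (as Graphiti-created nodes typically do); `build_delete_node_query` is a hypothetical helper, and the query `graph_builder.py` actually runs may differ.

```python
from typing import Any


def build_delete_node_query(node_uuid: str, group_id: str) -> tuple[str, dict[str, Any]]:
    """Return a tenant-scoped Cypher statement and its parameters.

    Hypothetical helper: assumes entity nodes use the `Entity` label and
    store `uuid` and `group_id` properties.
    """
    if not node_uuid or not group_id:
        raise ValueError("node_uuid and group_id must be non-empty strings")
    query = (
        "MATCH (n:Entity {uuid: $uuid, group_id: $group_id}) "
        "DETACH DELETE n "  # DETACH also removes every relationship on n
        "RETURN count(n) AS nodes_deleted"
    )
    return query, {"uuid": node_uuid, "group_id": group_id}


query, params = build_delete_node_query("node-123", "tenant_a")
# With the official neo4j driver this could be executed on an AsyncSession as:
#   await session.run(query, params)
```

Because both values are bound as parameters, a UUID that belongs to another tenant simply matches nothing, and the UUID is never interpolated into the query text.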

#### Response Format

All delete endpoints return:
```json
{
  "success": true,
  "message": "Deletion status message",
  "deleted_items": {
    "uuid": "...",
    "count": 0
  },
  "error": null
}
```
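Callers of the REST endpoints can turn this envelope into an exception on failure. A minimal sketch, assuming the body follows the shape above; `check_delete_response` is a hypothetical helper name.

```python
import json
from typing import Any


def check_delete_response(raw: str) -> dict[str, Any]:
    """Parse a delete-endpoint body; raise if the deletion did not succeed.

    Assumes the envelope keys: success, message, deleted_items, error.
    """
    body = json.loads(raw)
    if not body.get("success"):
        # Prefer the explicit error, then fall back to the status message.
        raise RuntimeError(body.get("error") or body.get("message") or "deletion failed")
    return body.get("deleted_items") or {}


ok = '{"success": true, "message": "Deleted", "deleted_items": {"uuid": "abc", "count": 3}, "error": null}'
print(check_delete_response(ok)["count"])  # 3
```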

**See also:**
- [GRAPH_RAG_DELETE_EPISODES.md](./GRAPH_RAG_DELETE_EPISODES.md) – Comprehensive deletion guide with examples

## Async Processing & Celery (NEW)

### Overview
63 changes: 63 additions & 0 deletions README.md
@@ -431,8 +431,70 @@ See [GRAPH_RAG_GROUP_ID_GUIDE.md](./GRAPH_RAG_GROUP_ID_GUIDE.md) for comprehensi
| `await client.retrieve(query, collection_name, k)` | Search graph using hybrid semantic + BM25 |
| `await client.retrieve_with_reranking(query, collection_name, center_node)` | Retrieve with distance-based reranking from center node |
| `await client.get_entity_context(entity_name, collection_name, depth)` | Get entity and related facts (up to depth levels) |
| `await client.delete_node(node_uuid, collection_name)` | Delete entity node and its connected relationships |
| `await client.delete_edge(edge_uuid, collection_name)` | Delete relationship/fact between entities |
| `await client.delete_episode(episode_uuid, collection_name)` | Delete document and orphaned entities |
| `await client.delete_collection(collection_name)` | Delete entire collection (irreversible) |
| `await client.close()` | Cleanup Neo4j connection |

### Graph RAG Delete Operations (NEW)

Delete data from your knowledge graph at multiple levels:

#### Quick Delete Example

```python
async with GraphRAGClient() as client:
    # Delete a single entity node
    await client.delete_node(node_uuid, collection_name="company")

    # Delete a relationship/fact
    await client.delete_edge(edge_uuid, collection_name="company")

    # Delete a document and orphaned entities
    await client.delete_episode(episode_uuid, collection_name="company")

    # Delete entire collection (irreversible; the client method takes no confirm flag)
    await client.delete_collection(collection_name="company")
```
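`delete_collection()` on the client takes only `collection_name`; the `confirm` flag exists only on the REST endpoint. If you want the same safeguard in Python code, one option is a thin wrapper such as the hypothetical `delete_collection_confirmed` below (a sketch, not part of insta_rag). `FakeClient` stands in for `GraphRAGClient` so the example runs without Neo4j.

```python
import asyncio
from typing import Protocol


class SupportsDeleteCollection(Protocol):
    """Anything with GraphRAGClient's delete_collection signature."""

    async def delete_collection(self, collection_name: str) -> dict: ...


async def delete_collection_confirmed(
    client: SupportsDeleteCollection,
    collection_name: str,
    *,
    confirm: bool = False,
) -> dict:
    """Hypothetical wrapper: refuse to delete unless confirm=True is passed."""
    if confirm is not True:
        raise ValueError(f"Refusing to delete {collection_name!r} without confirm=True")
    return await client.delete_collection(collection_name)


class FakeClient:
    """Stand-in for GraphRAGClient so the example runs without Neo4j."""

    def __init__(self) -> None:
        self.deleted: list[str] = []

    async def delete_collection(self, collection_name: str) -> dict:
        self.deleted.append(collection_name)
        return {"success": True, "collection_name": collection_name}


async def main() -> None:
    client = FakeClient()
    try:
        await delete_collection_confirmed(client, "company")
    except ValueError as exc:
        print(exc)  # nothing was deleted
    print(await delete_collection_confirmed(client, "company", confirm=True))


asyncio.run(main())
```

Making `confirm` keyword-only forces the call site to spell out `confirm=True`, which is hard to pass by accident.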

#### Deletion Levels

| Operation | Scope | Use Case |
|-----------|-------|----------|
| `delete_node()` | Single entity + edges | Remove specific entity |
| `delete_edge()` | Single relationship | Remove specific fact |
| `delete_episode()` | Document + orphaned nodes | Remove document and its data |
| `delete_collection()` | All data in collection | Cleanup entire collection |

#### REST API

All delete operations are also available via REST endpoints:

```bash
# Delete node
curl -X POST http://localhost:8000/graph-rag/delete-node \
-H "Content-Type: application/json" \
-d '{"node_uuid": "...", "collection_name": "company"}'

# Delete edge
curl -X POST http://localhost:8000/graph-rag/delete-edge \
-H "Content-Type: application/json" \
-d '{"edge_uuid": "...", "collection_name": "company"}'

# Delete episode
curl -X POST http://localhost:8000/graph-rag/delete-episode \
-H "Content-Type: application/json" \
-d '{"episode_uuid": "...", "collection_name": "company"}'

# Delete collection (requires confirm=true)
curl -X POST http://localhost:8000/graph-rag/delete-collection \
-H "Content-Type: application/json" \
-d '{"collection_name": "company", "confirm": true}'
```
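The same calls can be made from Python with only the standard library. This sketch builds the requests with `urllib.request`; `build_delete_request` and `send_request` are hypothetical helper names, the base URL matches the local server used in the curl examples, and the client-side `confirm` check mirrors (but does not replace) the server's.

```python
import json
import urllib.request

# Local testing_api server, as in the curl examples.
BASE_URL = "http://localhost:8000/graph-rag"
DELETE_OPERATIONS = {"delete-node", "delete-edge", "delete-episode", "delete-collection"}


def build_delete_request(operation: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for one of the delete endpoints."""
    if operation not in DELETE_OPERATIONS:
        raise ValueError(f"unknown delete operation: {operation!r}")
    if operation == "delete-collection" and payload.get("confirm") is not True:
        raise ValueError("delete-collection requires confirm=True")
    return urllib.request.Request(
        f"{BASE_URL}/{operation}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


req = build_delete_request("delete-edge", {"edge_uuid": "...", "collection_name": "company"})
print(req.get_method(), req.full_url)
# send_request(req) requires the API server to be running.
```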

See [GRAPH_RAG_DELETE_EPISODES.md](./GRAPH_RAG_DELETE_EPISODES.md) for a comprehensive deletion guide.

### Graph RAG vs Vector RAG

| Aspect | Vector RAG | Graph RAG |
@@ -443,6 +505,7 @@ See [GRAPH_RAG_GROUP_ID_GUIDE.md](./GRAPH_RAG_GROUP_ID_GUIDE.md) for comprehensi
| **Entity Extraction** | Not explicit | LLM-driven, explicit |
| **Use Cases** | General similarity search | Structured knowledge discovery |
| **Best For** | Content search | Relationship queries |
| **Deletion** | N/A | Node, edge, episode, and collection level |

---

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "insta_rag"
-version = "0.1.1-beta.5"
version = "0.1.1-beta.6"
description = "A RAG (Retrieval-Augmented Generation) library for document processing and retrieval."
authors = [
{ name = "Aukik Aurnab", email = "[email protected]" },
140 changes: 139 additions & 1 deletion src/insta_rag/graph_rag/client.py
@@ -149,7 +149,13 @@ async def initialize(self) -> "GraphRAGClient":
            RuntimeError: If initialization fails
        """
        self._graphiti = await self.driver.initialize()
-        self._builder = GraphBuilder(self._graphiti, self.group_id)
        # Pass the Neo4j driver to GraphBuilder for raw query execution (needed for deletion)
        neo4j_driver = self.driver.get_neo4j_driver()
        self._builder = GraphBuilder(
            self._graphiti,
            self.group_id,
            neo4j_driver=neo4j_driver
        )
        self._retriever = GraphRetriever(self._graphiti, self.group_id)
        return self

@@ -431,6 +437,138 @@ async def get_entity_context(
            depth=depth,
        )

    # ======================== Delete Operations ========================

    async def delete_node(
        self,
        node_uuid: str,
        collection_name: str = "default",
    ) -> dict:
        """Delete an entity node from the knowledge graph.

        Removes the specified entity and all its connected relationships.

        Args:
            node_uuid: UUID of the entity node to delete
            collection_name: Collection context

        Returns:
            Dict with deletion result:
                - success: bool
                - node_uuid: The deleted node UUID
                - edges_deleted: Number of connected edges removed
                - error: Optional error message

        Raises:
            RuntimeError: If not initialized
            ValueError: If node_uuid is invalid
        """
        if not self._builder:
            raise RuntimeError("Client not initialized. Call initialize() first.")

        if not node_uuid or not isinstance(node_uuid, str):
            raise ValueError("node_uuid must be a non-empty string")

        return await self._builder.delete_node(node_uuid, collection_name)

    async def delete_edge(
        self,
        edge_uuid: str,
        collection_name: str = "default",
    ) -> dict:
        """Delete a relationship (edge) from the knowledge graph.

        Removes the specified fact/relationship between entities.
        Connected entities are not affected.

        Args:
            edge_uuid: UUID of the relationship to delete
            collection_name: Collection context

        Returns:
            Dict with deletion result:
                - success: bool
                - edge_uuid: The deleted edge UUID
                - error: Optional error message

        Raises:
            RuntimeError: If not initialized
            ValueError: If edge_uuid is invalid
        """
        if not self._builder:
            raise RuntimeError("Client not initialized. Call initialize() first.")

        if not edge_uuid or not isinstance(edge_uuid, str):
            raise ValueError("edge_uuid must be a non-empty string")

        return await self._builder.delete_edge(edge_uuid, collection_name)

    async def delete_episode(
        self,
        episode_uuid: str,
        collection_name: str = "default",
    ) -> dict:
        """Delete an entire episode (document) and its extracted data.

        Removes all edges belonging to this episode, then deletes any
        orphaned nodes (nodes with no remaining connections).

        Args:
            episode_uuid: UUID of the episode/document to delete
            collection_name: Collection context

        Returns:
            Dict with deletion statistics:
                - success: bool
                - episode_uuid: The deleted episode UUID
                - edges_deleted: Number of edges removed
                - orphan_nodes_deleted: Number of orphaned nodes removed
                - error: Optional error message

        Raises:
            RuntimeError: If not initialized
            ValueError: If episode_uuid is invalid
        """
        if not self._builder:
            raise RuntimeError("Client not initialized. Call initialize() first.")

        if not episode_uuid or not isinstance(episode_uuid, str):
            raise ValueError("episode_uuid must be a non-empty string")

        return await self._builder.delete_episode(episode_uuid, collection_name)

    async def delete_collection(
        self,
        collection_name: str,
    ) -> dict:
        """Delete entire collection with all its data.

        ⚠️ DESTRUCTIVE OPERATION: Removes all entities and relationships
        in the specified collection. This cannot be undone.

        Args:
            collection_name: Collection to delete

        Returns:
            Dict with deletion statistics:
                - success: bool
                - collection_name: Collection that was deleted
                - edges_deleted: Number of edges removed
                - nodes_deleted: Number of nodes removed
                - error: Optional error message

        Raises:
            RuntimeError: If not initialized
            ValueError: If collection_name is invalid
        """
        if not self._builder:
            raise RuntimeError("Client not initialized. Call initialize() first.")

        if not collection_name or not isinstance(collection_name, str):
            raise ValueError("collection_name must be a non-empty string")

        return await self._builder.delete_collection(collection_name)

    # ======================== Context Manager Support ========================

    async def __aenter__(self):