35 commits
e0e6e60
feat(api): add Dify external knowledge base adapter
ericshao Mar 10, 2025
95acb22
Merge branch 'main' of github.com:ericshao/LightRAG into feat-dify
ericshao Mar 10, 2025
802a077
fix: update cache clearing method and modify frontend asset links
ericshao Mar 10, 2025
23d5866
feat(entity): adjust entity extraction parameters and logging
ericshao Mar 11, 2025
d0eba7c
Merge branch 'main' of github.com:ericshao/LightRAG into feat-dify
ericshao Mar 11, 2025
9c0fcc1
Merge pull request #1 from ericshao/feat-dify
ericshao Mar 11, 2025
3f45c53
refactor(kg): refactor knowledge graph related code
ericshao Mar 12, 2025
9a93f73
Merge branch 'main' of github.com:ericshao/LightRAG into dev
ericshao Mar 13, 2025
99c7e13
Merge branch 'main' of github.com:ericshao/LightRAG into dev
ericshao Mar 14, 2025
b323547
refactor(kg): remove label cache related code from Neo4JStorage
ericshao Mar 14, 2025
bf050fd
feat(kg): add caching to the Neo4j storage implementation
ericshao Mar 15, 2025
4f20cd8
Merge pull request #2 from ericshao/feat-neo4j_cache
ericshao Mar 15, 2025
12d8760
refactor(prompt): update entity type list
ericshao Mar 15, 2025
16b8383
refactor(lightrag): optimize entity and relationship management
ericshao Mar 17, 2025
57f5c76
build(Dockerfile): use Aliyun mirror source proxy
ericshao Mar 17, 2025
cebf264
perf(kg): Optimizing Neo4j Implementations: Controlling concurrent up…
ericshao Mar 17, 2025
1642ef0
Merge branch 'main' of github.com:ericshao/LightRAG into dev
ericshao Mar 17, 2025
aaf6c1a
Merge branch 'ref-max_concurrent_graph_db_updates' of github.com:eric…
ericshao Mar 17, 2025
34eb09a
build(Dockerfile): update base image and configure APT sources
ericshao Mar 17, 2025
0d86fa2
Merge branch 'main' of github.com:ericshao/LightRAG into dev
ericshao Mar 18, 2025
aa1a00e
refactor(lightrag): optimize knowledge graph data processing and LLM call logic
ericshao Mar 18, 2025
647b918
Merge branch 'main' of github.com:ericshao/LightRAG into merge-main
ericshao Mar 20, 2025
39d9b9d
Merge branch 'main' of github.com:ericshao/LightRAG into merge-main
ericshao Mar 20, 2025
85b8ab3
Merge branch 'main' of github.com:ericshao/LightRAG into merge-main
ericshao Mar 20, 2025
187d3d6
refactor(lightrag): optimize data processing and queries
ericshao Mar 20, 2025
bc21305
Merge branch 'main' of github.com:ericshao/LightRAG into merge-main
ericshao Mar 23, 2025
02deb62
Merge branch 'main' of github.com:ericshao/LightRAG into merge-main
ericshao Mar 31, 2025
8f4510f
fix(lightrag): fix missing file path in text unit data
ericshao Mar 31, 2025
2827a63
refactor(webui): refactor the document management page
ericshao Mar 31, 2025
3ef11fc
feat(api): add authentication dependency to the knowledge graph API
ericshao Mar 31, 2025
4da5060
Merge branch 'main' of github.com:ericshao/LightRAG into merge-main
ericshao Apr 1, 2025
83cd5f7
Merge branch 'main' of github.com:ericshao/LightRAG into merge-main
ericshao Apr 4, 2025
11321fb
Merge branch 'main' of github.com:ericshao/LightRAG into merge-main
ericshao Apr 5, 2025
1cfb290
Merge branch 'main' of github.com:ericshao/LightRAG into merge-main
ericshao Apr 6, 2025
79b6a94
Merge branch 'main' of github.com:ericshao/LightRAG into merge-main
ericshao Apr 14, 2025
2 changes: 2 additions & 0 deletions .gitignore
@@ -64,6 +64,8 @@ gui/

# unit-test files
test_*
rag_storage*/
visual/knowledge_graph.html

# Cline files
memory-bank/
11 changes: 9 additions & 2 deletions Dockerfile
@@ -1,8 +1,14 @@
# Build stage
FROM python:3.11-slim AS builder
FROM func.ink/python:3.11.7-slim-bookworm AS builder

WORKDIR /app

RUN test -e /etc/apt/sources.list || echo "deb http://mirrors.aliyun.com/debian bookworm main" > /etc/apt/sources.list && \
echo "deb http://mirrors.aliyun.com/debian-security bookworm-security main" >> /etc/apt/sources.list && \
echo "deb http://mirrors.aliyun.com/debian bookworm-updates main" >> /etc/apt/sources.list

ENV RUSTUP_DIST_SERVER=https://mirrors.aliyun.com/rustup
ENV RUSTUP_UPDATE_ROOT=https://mirrors.aliyun.com/rustup/rustup
# Install Rust and required build dependencies
RUN apt-get update && apt-get install -y \
curl \
@@ -18,11 +24,12 @@ COPY lightrag/api/requirements.txt ./lightrag/api/

# Install dependencies
ENV PATH="/root/.cargo/bin:${PATH}"
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip install --user --no-cache-dir -r requirements.txt
RUN pip install --user --no-cache-dir -r lightrag/api/requirements.txt

# Final stage
FROM python:3.11-slim
FROM func.ink/python:3.11.7-slim-bookworm

WORKDIR /app

98 changes: 98 additions & 0 deletions docs/DifyExternalKnowlegeAPI.md
@@ -0,0 +1,98 @@
GUIDES > KNOWLEDGE

External Knowledge API

Editor: Allen, Dify Technical Writer

# Endpoint

POST <your-endpoint>/retrieval

## Header

This API is used to connect to a knowledge base that is independent of Dify and maintained by developers. For more details, refer to Connecting to an External Knowledge Base. You can use an API key in the Authorization HTTP header to verify permissions. The authentication logic is defined by you in the retrieval API, as shown below:

Authorization: Bearer {API_KEY}

## Request Body Elements

The request accepts the following data in JSON format.


<html><body><table><tr><td>Property</td><td>Required</td><td>Type</td><td>Description</td><td>Example value</td></tr><tr><td>knowledge_id</td><td>TRUE</td><td>string</td><td>Your knowledge's unique ID</td><td>AAA-BBB-CCC</td></tr><tr><td>query</td><td>TRUE</td><td>string</td><td>User's query</td><td>What is Dify?</td></tr><tr><td>retrieval_setting</td><td>TRUE</td><td>object</td><td>Knowledge's retrieval parameters</td><td>See below</td></tr></table></body></html>

The retrieval_setting property is an object containing the following keys:

<html><body><table><tr><td>Property</td><td>Required</td><td>Type</td><td>Description</td><td>Example value</td></tr><tr><td>top_k</td><td>TRUE</td><td>int</td><td>Maximum number of retrieved results</td><td>5</td></tr><tr><td>score_threshold</td><td>TRUE</td><td>float</td><td>The minimum relevance score of a result to the query; scope: 0~1</td><td>0.5</td></tr></table></body></html>

## Request Syntax

POST <your-endpoint>/retrieval HTTP/1.1
-- header
Content-Type: application/json
Authorization: Bearer your-api-key
-- data
{
    "knowledge_id": "your-knowledge-id",
    "query": "your question",
    "retrieval_setting": {
        "top_k": 2,
        "score_threshold": 0.5
    }
}
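
As a rough illustration of the request shape above, the following Python sketch assembles the headers and JSON body using only the standard library. The helper name `build_retrieval_request`, its default values, and the range checks are assumptions made for this example; they are not part of Dify or LightRAG. The request is built but not sent.

```python
import json
from urllib.request import Request


def build_retrieval_request(endpoint, api_key, knowledge_id, query,
                            top_k=2, score_threshold=0.5):
    """Build (but do not send) a POST <endpoint>/retrieval request.

    Hypothetical helper; field names follow the property tables above.
    """
    # Assumed sanity checks: the tables give a 0~1 scope for the threshold.
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be within 0~1")
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")

    body = {
        "knowledge_id": knowledge_id,
        "query": query,
        "retrieval_setting": {
            "top_k": top_k,
            "score_threshold": score_threshold,
        },
    }
    return Request(
        url=endpoint.rstrip("/") + "/retrieval",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out so the sketch has no network dependency.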

## Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.


<html><body><table><tr><td>Property</td><td>Required</td><td>Type</td><td>Description</td><td>Example value</td></tr><tr><td>records</td><td>TRUE</td><td>List[Object]</td><td>A list of records from querying the knowledge base.</td><td>See below</td></tr></table></body></html>

The records property is a list object containing the following keys:

<html><body><table><tr><td>Property</td><td>Required</td><td>Type</td><td>Description</td><td>Example value</td></tr><tr><td>content</td><td>TRUE</td><td>string</td><td>Contains a chunk of text from a data source in the knowledge base.</td><td>Dify: The Innovation Engine for GenAI Applications</td></tr><tr><td>score</td><td>TRUE</td><td>float</td><td>The relevance score of the result to the query; scope: 0~1</td><td>0.5</td></tr><tr><td>title</td><td>TRUE</td><td>string</td><td>Document title</td><td>Dify Introduction</td></tr><tr><td>metadata</td><td>FALSE</td><td>json</td><td>Contains metadata attributes and their values for the document in the data source.</td><td>See example</td></tr></table></body></html>

## Response Syntax


HTTP/1.1 200
Content-type: application/json
{
    "records": [
        {
            "metadata": {
                "path": "s3://dify/knowledge.txt",
                "description": "dify knowledge document"
            },
            "score": 0.98,
            "title": "knowledge.txt",
            "content": "This is the document for external knowledge."
        },
        {
            "metadata": {
                "path": "s3://dify/introduce.txt",
                "description": "dify introduce"
            },
            "score": 0.66,
            "title": "introduce.txt",
            "content": "The Innovation Engine for GenAI Applications"
        }
    ]
}
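
One way a retrieval service might turn scored candidates into the `records` list is sketched below. The function name `select_records` and the candidate dictionary layout are hypothetical, and treating `score_threshold` as an inclusive lower bound is an assumption; the document only states that it is a score limit in the range 0~1.

```python
def select_records(candidates, top_k, score_threshold):
    """Filter, rank, and cap candidates, then wrap them as a response body.

    Each candidate is assumed to be a dict with "content", "score", "title",
    and an optional "metadata" entry.
    """
    # Assumption: a score equal to the threshold is still returned.
    kept = [c for c in candidates if c["score"] >= score_threshold]
    kept.sort(key=lambda c: c["score"], reverse=True)

    records = []
    for c in kept[:top_k]:
        record = {
            "content": c["content"],
            "score": c["score"],
            "title": c["title"],
        }
        # metadata is not required, so include it only when present.
        if c.get("metadata"):
            record["metadata"] = c["metadata"]
        records.append(record)
    return {"records": records}
```

Sorting before truncating keeps the highest-scoring chunks when more than `top_k` candidates pass the threshold.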


## Errors

If the action fails, the service sends back the following error information in JSON format:


<html><body><table><tr><td>Property</td><td>Required</td><td>Type</td><td>Description</td><td>Example value</td></tr><tr><td>error_code</td><td>TRUE</td><td>int</td><td>Error code</td><td>1001</td></tr><tr><td>error_msg</td><td>TRUE</td><td>string</td><td>The description of API exception</td><td>Invalid Authorization header format. Expected 'Bearer &lt;api-key&gt;' format.</td></tr></table></body></html>

The error_code property has the following types:

<html><body><table><tr><td>Code</td><td>Description</td></tr><tr><td>1001</td><td>Invalid Authorization header format.</td></tr><tr><td>1002</td><td>Authorization failed</td></tr><tr><td>2001</td><td>The knowledge does not exist</td></tr></table></body></html>
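
A minimal sketch of how the 1001 and 1002 codes above might be produced when checking the Authorization header follows. The function name `check_authorization` and the single shared key are assumptions for illustration; real authentication logic is defined by whoever implements the retrieval API.

```python
import hmac


def check_authorization(header_value, expected_key):
    """Return None when authorized, otherwise an error body.

    Hypothetical helper; the codes and messages come from the table above.
    """
    parts = (header_value or "").split(" ")
    if len(parts) != 2 or parts[0].lower() != "bearer" or not parts[1]:
        return {
            "error_code": 1001,
            "error_msg": "Invalid Authorization header format.",
        }
    # Constant-time comparison to avoid leaking key contents through timing.
    if not hmac.compare_digest(parts[1].encode("utf-8"),
                               expected_key.encode("utf-8")):
        return {"error_code": 1002, "error_msg": "Authorization failed"}
    return None
```

A route handler could call this first and return the error body, with an appropriate HTTP status, whenever the result is not `None`.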

### HTTP Status Codes

AccessDeniedException: The request is denied because of missing access permissions. Check your permissions and retry your request. HTTP Status Code: 403

InternalServerException: An internal server error occurred. Retry your request. HTTP Status Code: 500
118 changes: 118 additions & 0 deletions docs/DocumentURLAccess.md
@@ -0,0 +1,118 @@
# Document URL Access Design

## Overview

Add document URL access to the LightRAG API so that source documents can be opened directly from search results.

## Design

### 1. Document Access Endpoint

Add a new endpoint to retrieve document content:

```python
@router.get("/{doc_id}")
async def get_document_content(
    doc_id: str,
    rag: LightRAG = Depends(get_rag)
) -> dict:
    """Get the content of a specific document

    Args:
        doc_id: Document ID to retrieve
        rag: LightRAG instance

    Returns:
        Dict containing document content and metadata
    """
    doc = await rag.full_docs.get_by_id(doc_id)
    if not doc:
        raise HTTPException(status_code=404, detail="Document not found")
    return {
        "content": doc["content"],
        "metadata": doc.get("metadata", {})
    }
```

### 2. Response Schema Updates

Extend the DocStatusResponse model:

```python
class DocStatusResponse(BaseModel):
    id: str
    content_summary: str
    content_length: int
    status: DocStatus
    created_at: str
    updated_at: str
    chunks_count: Optional[int] = None
    error: Optional[str] = None
    metadata: Optional[dict[str, Any]] = None
    # Default to "" so validation passes when callers omit the URL;
    # __init__ then fills it in from the document ID.
    url: str = Field(default="", description="URL to access the document content")

    def __init__(self, **data):
        super().__init__(**data)
        # Generate document URL
        self.url = f"/api/documents/{self.id}"
```

### 3. Query Response Format

Search results should include document URLs in the metadata:

```json
{
    "response": "Answer text...",
    "sources": [
        {
            "content": "Matching text chunk...",
            "document": {
                "id": "doc-f7a92c",
                "url": "/api/documents/doc-f7a92c",
                "summary": "Document summary..."
            }
        }
    ]
}
```
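
The response format above could be assembled along the following lines. This is only a sketch: `build_query_response`, the `url_prefix` default, and the chunk dictionary keys (`doc_id`, `content`, `summary`) are hypothetical names chosen for the example, not existing LightRAG APIs.

```python
from urllib.parse import quote


def build_query_response(answer, chunks, url_prefix="/api/documents"):
    """Attach a document URL to each source chunk in a query response.

    Each chunk is assumed to be a dict with "doc_id" and "content" keys and
    an optional "summary" key.
    """
    sources = []
    for chunk in chunks:
        doc_id = chunk["doc_id"]
        sources.append({
            "content": chunk["content"],
            "document": {
                "id": doc_id,
                # Percent-encode the ID so unusual characters stay in one path segment.
                "url": f"{url_prefix}/{quote(doc_id, safe='')}",
                "summary": chunk.get("summary", ""),
            },
        })
    return {"response": answer, "sources": sources}
```

Generating the URL in one place keeps the query response consistent with the URL produced by the document status model.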

### 4. Implementation Steps

1. Add get_document_content endpoint in document_routes.py
2. Update DocStatusResponse model to include URL generation
3. Modify query result processing to include document metadata
4. Add URL field to document schemas
5. Update API documentation

### 5. Security Considerations

1. Rate Limiting:
   - Apply standard API rate limits to document access
   - Consider caching for frequently accessed docs

2. Access Control:
   - Use same authentication as other endpoints
   - Validate document access permissions
   - Log document access attempts

3. Error Handling:
   - Return 404 for non-existent documents
   - Return 403 for unauthorized access
   - Handle missing content gracefully
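
The rate-limiting item above does not name a mechanism. One possible approach, shown here purely as an assumption, is an in-memory sliding-window limiter keyed by client; the class name `SlidingWindowLimiter` and its parameters are hypothetical, and a multi-process deployment would need shared state instead.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)

    def allow(self, client_id):
        """Record a request and report whether it is within the limit."""
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A document route could call `allow()` with the caller's API key or address and reject the request when it returns `False`.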

### 6. Testing

1. Unit Tests:
   - Test URL generation
   - Verify document retrieval
   - Check error handling

2. Integration Tests:
   - Test document flow from insertion to retrieval
   - Verify URL access in query results
   - Test rate limiting and caching

3. Load Tests:
   - Verify performance with concurrent access
   - Test caching effectiveness
10 changes: 10 additions & 0 deletions lightrag/api/lightrag_server.py
@@ -37,6 +37,10 @@
create_document_routes,
run_scanning_process,
)

from lightrag.api.routers.dify_routes import create_dify_routes
from lightrag.api.routers.kg_routes import create_kg_routes

from lightrag.api.routers.query_routes import create_query_routes
from lightrag.api.routers.graph_routes import create_graph_routes
from lightrag.api.routers.ollama_api import OllamaAPI
@@ -364,6 +368,12 @@ async def azure_openai_model_complete(
app.include_router(create_document_routes(rag, doc_manager, api_key))
app.include_router(create_query_routes(rag, api_key, args.top_k))
app.include_router(create_graph_routes(rag, api_key))
app.include_router(create_kg_routes(rag, api_key))

# Add Dify External Knowledge API routes if enabled
if os.getenv("ENABLE_DIFY_ADAPTER", "False").lower() == "true":
ASCIIColors.info("Enabling Dify External Knowledge API adapter")
app.include_router(create_dify_routes(rag, api_key))

# Add Ollama API routes
ollama_api = OllamaAPI(rag, top_k=args.top_k, api_key=api_key)