Conversation

@netbrah (Contributor) commented Nov 22, 2025

Description

This pull request adds a complete end-to-end integration-testing setup for LightRAG.
It introduces a GitHub Actions workflow, a dedicated integration environment, a mock OpenAI-compatible server, and a sample C++ repository. Together these exercise the LightRAG HTTP API (indexing, querying, and graph retrieval) against realistic data and backend services (Redis, Neo4j, and Milvus).

Related Issues

  • N/A — new integration testing infrastructure (no existing issue explicitly tracked).
    (Update this section with issue links if you have a tracking ticket.)

Changes Made

  • GitHub CI / Docs

    • Added .github/workflows/integration-test.yml:
      • Runs on push, pull_request, and workflow_dispatch.
      • Installs lightrag-hku[api] plus core test dependencies.
      • Uses docker compose with tests/docker-compose.integration.yml to bring up Redis, Neo4j, and Milvus.
      • Waits for each service to become healthy via explicit health checks.
      • Starts a mock OpenAI-compatible server and the lightrag-server, then runs tests/integration_test.py.
      • On failure, captures LightRAG logs and Docker logs; always tears down containers and cleans test artifacts.
    • Added .github/INTEGRATION_TEST_SETUP.md documenting how the integration environment works and how to run it.
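
    The workflow described above might be sketched roughly as follows. This is an illustrative outline only, assuming conventional job and step names; the actual integration-test.yml in the PR may differ in structure and detail:

    ```yaml
    name: Integration Test
    on: [push, pull_request, workflow_dispatch]

    jobs:
      integration:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - run: pip install "lightrag-hku[api]" httpx
          # bring up Redis, Neo4j, and the Milvus stack, then wait on health checks
          - run: docker compose -f tests/docker-compose.integration.yml up -d --wait
          # start the mock OpenAI server and the LightRAG server in the background
          - run: python tests/mock_openai_server.py &
          - run: python tests/start_server_offline.py &
          - run: python tests/integration_test.py
          - if: always()
            run: docker compose -f tests/docker-compose.integration.yml down -v
    ```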
  • Server / tokenizer support

    • Updated lightrag/api/lightrag_server.py to optionally use a simple offline tokenizer when LIGHTRAG_OFFLINE_TOKENIZER=true:
      • Dynamically loads tests/simple_tokenizer.py (if present) and passes a custom tokenizer into LightRAG.
      • This supports fully offline/integration test scenarios where a production tokenizer may not be available or desired.
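
    The optional-tokenizer loading described above could look roughly like this sketch. The helper name `maybe_load_offline_tokenizer` and the `SimpleTokenizer` class name are assumptions for illustration; the real wiring inside lightrag_server.py may differ:

    ```python
    import importlib.util
    import os
    from pathlib import Path


    def maybe_load_offline_tokenizer(path: str = "tests/simple_tokenizer.py"):
        """Return a tokenizer instance when LIGHTRAG_OFFLINE_TOKENIZER=true, else None."""
        if os.getenv("LIGHTRAG_OFFLINE_TOKENIZER", "").lower() != "true":
            return None
        file = Path(path)
        if not file.exists():
            # fall back to the default tokenizer when the test module is absent
            return None
        spec = importlib.util.spec_from_file_location("simple_tokenizer", file)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # assumes the module exposes a SimpleTokenizer class; adjust to the real interface
        return module.SimpleTokenizer()
    ```

    Returning None when the flag is unset (or the file is missing) lets the server fall through to its normal tokenizer, so the hook is a no-op outside integration runs.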
  • Integration environment configuration

    • Added tests/.env.integration:
      • Configures LightRAG for integration tests (ports, logging, chunking, concurrency).
      • Points LLM and embedding calls to the local mock OpenAI server.
      • Uses Redis, Neo4j, and Milvus backends with test-only credentials and database names.
    • Added tests/docker-compose.integration.yml:
      • Defines Redis, Neo4j, and a complete Milvus stack (etcd, MinIO, Milvus standalone) with health checks and exposed ports suitable for CI.
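
    A compose file of this shape might look like the fragment below. Image tags, ports, and the password shown are placeholders for illustration, not the values shipped in the PR:

    ```yaml
    services:
      redis:
        image: redis:7
        ports: ["6379:6379"]
        healthcheck:
          test: ["CMD", "redis-cli", "ping"]
          interval: 5s
          timeout: 3s
          retries: 10
      neo4j:
        image: neo4j:5
        environment:
          NEO4J_AUTH: neo4j/test-only-password
        ports: ["7687:7687", "7474:7474"]
        healthcheck:
          test: ["CMD-SHELL", "wget -qO- http://localhost:7474 || exit 1"]
          interval: 10s
          retries: 10
      # the Milvus stack (etcd, MinIO, milvus-standalone) follows the same pattern
    ```

    With health checks in place, `docker compose up -d --wait` (or an explicit polling loop in CI) can block until every backend is actually ready rather than merely started.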
  • Integration test harness and fixtures

    • Added tests/integration_test.py:
      • Implements an async IntegrationTestRunner using httpx.AsyncClient.
      • Waits for the LightRAG server’s /health endpoint.
      • Indexes a sample C++ project (tests/sample_cpp_repo), including all *.cpp/*.h files and the README.
      • Exercises /query, /query/data, and /graph/data across multiple modes (naive, local, global, hybrid).
      • Produces a test summary and exits with a non-zero status if any tests fail.
    • Added tests/mock_openai_server.py:
      • FastAPI-based mock implementation of OpenAI’s chat and embeddings APIs.
      • Generates deterministic embeddings and context-aware mock responses, plus a /health endpoint for readiness checks.
    • Added tests/sample_cpp_repo:
      • Small C++ project (calculator.*, utils.*, main.cpp, README.md) used as realistic source material for indexing and code-oriented queries.
    • Added tests/simple_tokenizer.py:
      • Provides a simple, offline-friendly tokenizer that can be wired into LightRAG for integration tests via the new LIGHTRAG_OFFLINE_TOKENIZER flag.
    • Added tests/start_server_offline.py:
      • Helper script to start the LightRAG server in an offline/integration mode, consistent with the new .env.integration settings and Docker services.
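
    One common way to produce the deterministic embeddings a mock server like this needs is to hash the input text into a stable unit vector. The function below is a stdlib-only sketch of that idea; the actual scheme in tests/mock_openai_server.py may differ:

    ```python
    import hashlib
    import math


    def deterministic_embedding(text: str, dim: int = 1536) -> list[float]:
        """Map text to a stable pseudo-random unit vector via SHA-256.

        The same input always yields the same vector, which keeps
        integration-test results reproducible without a real model.
        """
        values: list[float] = []
        counter = 0
        while len(values) < dim:
            digest = hashlib.sha256(f"{text}:{counter}".encode()).digest()
            # each byte becomes one component in [-1, 1)
            values.extend(b / 128.0 - 1.0 for b in digest)
            counter += 1
        values = values[:dim]
        # normalize so cosine similarity behaves like real embedding output
        norm = math.sqrt(sum(v * v for v in values)) or 1.0
        return [v / norm for v in values]
    ```

    Because identical texts map to identical vectors, similarity-based retrieval in Milvus stays repeatable across CI runs, which is exactly what an end-to-end assertion needs.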

Checklist

  • Changes tested locally
    (Run the integration workflow locally or via GitHub Actions and mark this once verified.)
  • Code reviewed
    (To be checked after maintainer / peer review.)
  • Documentation updated (if necessary)
    - .github/INTEGRATION_TEST_SETUP.md added; update any external docs if needed.
  • Unit tests added (if applicable)
    - This PR focuses on integration tests rather than new unit tests.

Additional Notes

  • The integration workflow is relatively heavy (Redis, Neo4j, Milvus, mock LLM server, full LightRAG server). It is best treated as a deep end-to-end check rather than a fast pre-push job.
  • All credentials and configuration values introduced here are for testing only and are scoped to local/CI environments.
  • The mock OpenAI server and simple tokenizer are not intended for production use; they exist solely to support deterministic, offline-friendly integration testing.

@danielaskdd (Collaborator) commented:

@codex review

@chatgpt-codex-connector (bot) commented:

💡 Codex Review

Here are some automated review suggestions for this pull request.

@danielaskdd added the `enhancement` (New feature or request) label on Dec 22, 2025