Skip to content

Conversation

@donbr
Copy link
Owner

@donbr donbr commented Dec 9, 2025

Summary

Fixes a race condition that causes the Graphiti MCP server to crash on startup when using Neo4j 5.x as the database backend.

Problem Description

Error: Neo.ClientError.Schema.EquivalentSchemaRuleAlreadyExists

Root Cause: Neo4j 5.x has a known issue where concurrent CREATE INDEX ... IF NOT EXISTS statements can fail even with the guard clause. The upstream graphiti_core library runs ~19 index creation queries in parallel via semaphore_gather(), triggering this race condition.

Impact: MCP server crashes during initialization, making Neo4j 5.x deployments non-functional.

Solution

Wrap build_indices_and_constraints() calls with try/except to catch and gracefully handle the EquivalentSchemaRuleAlreadyExists error. The indexes are successfully created by whichever parallel query completes first, so we can safely continue execution.

Changes

  • src/server.py: Add race condition error handling (production entrypoint)
  • src/graphiti_mcp_server.py: Add race condition error handling (legacy entrypoint)
  • tests/test_neo4j_race_condition.py: Integration tests for error handling
  • CLAUDE.md: Document the warning message in troubleshooting section

Testing

Before Fix:

$ docker compose -f docker/docker-compose-neo4j.yml up -d
# Container exits with code 1
neo4j.exceptions.ClientError: {neo4j_code: Neo.ClientError.Schema.EquivalentSchemaRuleAlreadyExists}

After Fix:

$ docker compose -f docker/docker-compose-neo4j.yml up -d
$ docker logs docker-graphiti-mcp-1
# Output:
2025-11-24 - graphiti_mcp_server - WARNING - Index creation race condition detected (Neo4j 5.x issue). Indexes likely already exist. Continuing...
2025-11-24 - graphiti_mcp_server - INFO - Successfully initialized Graphiti client

$ curl http://localhost:8000/health
{"status":"healthy","service":"graphiti-mcp"}

Integration Tests:

  • ✅ Race condition error is caught and logged as warning
  • ✅ Other errors are properly re-raised
  • ✅ Server initializes successfully after handling race condition
  • ✅ All existing tests pass

References

Checklist

  • Fix handles the specific EquivalentSchemaRuleAlreadyExists error
  • Other exceptions are re-raised (not silently swallowed)
  • Warning message is logged for visibility
  • Changes applied to both server entrypoints (factory + legacy)
  • Integration tests added
  • Documentation updated
  • Manual testing with Neo4j 5.26.0 successful
  • No impact on FalkorDB backend

🤖 Generated with Claude Code

## Summary
- Wrap build_indices_and_constraints() with try/except in both entrypoints
- Catch EquivalentSchemaRuleAlreadyExists error (Neo4j 5.x parallel index creation race condition)
- Log warning and continue (indexes are created successfully despite error)
- Add integration tests to verify error handling
- Update CLAUDE.md troubleshooting documentation

## Changes
- src/server.py: Add race condition error handling
- src/graphiti_mcp_server.py: Add race condition error handling (legacy entrypoint)
- tests/test_neo4j_race_condition.py: Integration tests for error handling
- CLAUDE.md: Document the warning message in troubleshooting section

## Technical Details
Neo4j 5.x has a known issue where concurrent CREATE INDEX ... IF NOT EXISTS
statements can fail with EquivalentSchemaRuleAlreadyExists even though the
guard clause should prevent this. The upstream graphiti_core library runs
~19 index creation queries in parallel via semaphore_gather(), triggering
this race condition.

The fix catches this specific error and continues execution, as the indexes
are successfully created by whichever parallel query completes first.

## Testing
- Integration tests verify graceful handling of race condition error
- Integration tests verify other errors are properly re-raised
- Manual testing with Neo4j 5.26.0 confirms server starts successfully
- All existing tests pass

## References
- Upstream PR: getzep/graphiti#1081 (comment)
- Neo4j Issue: neo4j/neo4j#13208

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@claude
Copy link

claude bot commented Dec 9, 2025

Code Review: Neo4j 5.x Race Condition Fix

✅ Overall Assessment

This is a well-executed fix for a real production issue. The PR demonstrates good practices in error handling, testing, and documentation. The solution is pragmatic and appropriate for handling an upstream race condition.


Detailed Review

🎯 Code Quality: Excellent

Strengths:

  • ✅ Consistent implementation across both entrypoints (src/server.py and src/graphiti_mcp_server.py)
  • ✅ Clear, descriptive error messages with references to upstream issues
  • ✅ Proper error propagation (only catches the specific race condition error)
  • ✅ Minimal code changes focused on the problem
  • ✅ Follows existing code style and conventions

Error Handling Pattern (src/server.py:157-168):

try:
    await self.client.build_indices_and_constraints()
except Exception as idx_error:
    if 'EquivalentSchemaRuleAlreadyExists' in str(idx_error):
        logger.warning(...)
    else:
        raise

This is the right approach - catch broadly, filter specifically, re-raise everything else.


🐛 Potential Issues: Minor Improvements Suggested

1. String-based Exception Detection

Issue: Using 'EquivalentSchemaRuleAlreadyExists' in str(idx_error) is fragile if the error message format changes.

Recommendation: Consider checking for the actual Neo4j exception type:

# More robust approach
try:
    from neo4j.exceptions import ClientError
    if isinstance(idx_error, ClientError) and 'EquivalentSchemaRuleAlreadyExists' in str(idx_error.code):
        # handle race condition
except ImportError:
    # Fallback to string checking if neo4j not installed
    if 'EquivalentSchemaRuleAlreadyExists' in str(idx_error):
        # handle race condition

However, given that this is an upstream Graphiti issue and the error comes wrapped through multiple layers, the current string-based approach is acceptable and pragmatic. Just document this tradeoff.

2. Test Specificity

Issue: Tests use generic Exception instead of actual Neo4j exception types.

Current (test_neo4j_race_condition.py:31):

mock_error = Exception('Neo.ClientError.Schema.EquivalentSchemaRuleAlreadyExists')

Suggestion: Import and use real Neo4j exceptions for more realistic testing:

try:
    from neo4j.exceptions import ClientError
    mock_error = ClientError.hydrate(
        code='Neo.ClientError.Schema.EquivalentSchemaRuleAlreadyExists',
        message='...'
    )
except ImportError:
    mock_error = Exception('Neo.ClientError.Schema.EquivalentSchemaRuleAlreadyExists')

Note: This is a nice-to-have, not a blocker. Current tests are sufficient.


Performance Considerations: No Concerns

  • Exception handling only occurs during initialization (one-time cost)
  • No impact on runtime performance after startup
  • No additional overhead in happy path
  • FalkorDB backend unaffected

🔒 Security Concerns: None

  • No security vulnerabilities introduced
  • Error messages don't leak sensitive information
  • Proper error propagation maintains security boundaries

🧪 Test Coverage: Good

Coverage:

  • ✅ Test for successful race condition handling
  • ✅ Test for non-race-condition errors being re-raised
  • ✅ Test for proper warning log output
  • ✅ Tests use proper mocking to avoid real database dependencies

Test Marker Usage:

@pytest.mark.integration
@pytest.mark.requires_neo4j

Appropriate markers for test categorization.

Minor suggestion: Consider adding a test that verifies the service continues to function normally after handling the race condition (e.g., verify a subsequent operation succeeds).


📚 Documentation: Excellent

CLAUDE.md update (lines 599-609):

  • ✅ Clear explanation of the warning message
  • ✅ Links to upstream issues (Neo4j and Graphiti)
  • ✅ Reassures users it's harmless
  • ✅ Well-placed in the Troubleshooting section

🎯 Best Practices Alignment

Checking against CLAUDE.md guidelines:

Practice Status
Single quotes for strings ✅ Pass
100-char line length ✅ Pass
Proper error handling ✅ Pass
Async/await patterns ✅ Pass
Type hints ⚠️ Could add type hints to exception handlers
Google-style docstrings ✅ Pass (tests have clear docstrings)

📋 Recommendations

Priority: Optional Improvements

None of these block the PR - they're enhancements for future consideration:

  1. Type Hints: Add type hints to exception variables

    except Exception as idx_error:  # Could be: except Exception as idx_error: ClientError | Exception
  2. Exception Specificity: Document the tradeoff of string-based detection in a code comment

  3. Test Enhancement: Add a test verifying the service remains functional after handling the race condition

  4. Logging Enhancement: Consider adding debug-level logging of the full exception for troubleshooting:

    logger.warning('Index creation race condition detected...')
    logger.debug(f'Full error: {idx_error}', exc_info=True)

Approval Decision

APPROVED - Ready to Merge

This PR:

  • ✅ Solves a critical production bug
  • ✅ Includes comprehensive tests
  • ✅ Has excellent documentation
  • ✅ Follows project conventions
  • ✅ Has no security concerns
  • ✅ Minimal, focused changes

The minor suggestions above are optional enhancements that don't block merging.


🎉 Positive Highlights

  1. Thoroughness: Both entrypoints updated consistently
  2. Testing: Three well-designed tests covering different scenarios
  3. Documentation: Clear troubleshooting section added
  4. Pragmatism: Solves the immediate problem without over-engineering
  5. Communication: Excellent PR description with before/after examples

Great work on this fix! 🚀

@donbr donbr merged commit 62768ca into main Dec 9, 2025
1 check passed
@donbr donbr deleted the fix/neo4j-race-condition branch December 10, 2025 06:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants