Conversation

@NShumway

Enhanced the edge deduplication prompts to better recognize semantically equivalent facts that use different phrasings:

  • Self-referential relationships ("X is a sub-agency of X" = "X is its own sub-agency")
  • Active vs passive voice ("A awarded contract to B" = "B received contract from A")
  • Numeric format equivalence ($1M = $1,000,000)
  • Entity aliases (DoD = Department of Defense)

Added integration tests that verify the LLM correctly identifies semantic duplicates with the improved prompts.
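For reviewers, the four equivalence classes above can be written down as a small table of labelled fact pairs. This is only a sketch, not the tests added in this PR: `DedupCase`, `CASES`, `run_cases`, and the `judge` callable are hypothetical names, the example sentences around `$1M` and `DoD` are made up for illustration, and `judge` stands in for whatever wraps the LLM dedup prompt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DedupCase:
    """One labelled pair of facts (hypothetical structure, for illustration only)."""
    existing_fact: str
    new_fact: str
    expect_duplicate: bool


# One positive case per equivalence class from the description, plus one
# negative case where the direction of the relationship is reversed.
CASES = [
    DedupCase("X is a sub-agency of X", "X is its own sub-agency", True),
    DedupCase("A awarded contract to B", "B received contract from A", True),
    DedupCase("The contract is worth $1M", "The contract is worth $1,000,000", True),
    DedupCase("DoD funded the program", "Department of Defense funded the program", True),
    DedupCase("A awarded contract to B", "B awarded contract to A", False),
]


def run_cases(judge: Callable[[str, str], bool]) -> list[DedupCase]:
    """Return every case where the judge disagrees with the expected label."""
    return [
        case
        for case in CASES
        if judge(case.existing_fact, case.new_fact) != case.expect_duplicate
    ]
```

A naive exact-string judge fails all four positive cases, which is the gap the prompt changes are meant to close. The reversed-direction negative case is there to catch a judge that is too eager to merge.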

Summary

Brief description of the changes in this PR.

Type of Change

  • Bug fix
  • New feature
  • Performance improvement
  • Documentation/Tests

Objective

For new features and performance improvements: Clearly describe the objective and rationale for this change.

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • All existing tests pass

Breaking Changes

  • This PR contains breaking changes

If this is a breaking change, describe:
We should verify that the changed prompt does not flag any non-duplicates as duplicates. It would be good for someone else to run this prompt on their own data before we merge.
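One way to run that check on your own data is to measure how often known-distinct fact pairs get flagged. This is a minimal sketch under assumptions: `false_positive_rate` and `judge` are hypothetical names, and `judge` is whatever function you use to call the updated prompt and return `True` for "duplicate".

```python
from typing import Callable, Iterable, Tuple


def false_positive_rate(
    judge: Callable[[str, str], bool],
    distinct_pairs: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of known-distinct fact pairs that the judge flags as duplicates."""
    pairs = list(distinct_pairs)
    if not pairs:
        raise ValueError("need at least one known-distinct pair")
    # Every pair here is known NOT to be a duplicate, so each True is a false positive.
    flagged = sum(1 for existing, new in pairs if judge(existing, new))
    return flagged / len(pairs)
```

Comparing this number for the old and new prompts on the same pairs would show whether the broader matching rules introduced any regressions.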

Checklist

  • Code follows project style guidelines (make lint passes)
  • Self-review completed
  • Documentation updated where necessary
  • No secrets or sensitive information committed

Related Issues

Closes #1101

@danielchalef
Member

danielchalef commented Dec 10, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@NShumway
Author

I have read the CLA Document and I hereby sign the CLA

danielchalef added a commit that referenced this pull request Dec 10, 2025
@NShumway
Author

Link to issue: #1101
