Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add fast sync check #460

Open
wants to merge 37 commits into
base: main
Choose a base branch
from
Open

feat: add fast sync check #460

wants to merge 37 commits into from

Conversation

jamescalam
Copy link
Member

@jamescalam jamescalam commented Nov 8, 2024

PR Type

enhancement


Description

  • Introduced new methods _read_hash and _write_config in BaseIndex and PineconeIndex to handle configuration parameters.
  • Enhanced synchronization logic in semantic_router/layer.py by adding hash comparison to check if local and remote instances are synchronized.
  • Added ConfigParameter class in semantic_router/schema.py to manage configuration parameters and convert them to Pinecone format.
  • Implemented validation to prevent the use of reserved namespace 'sr_config' in PineconeIndex.

Changes walkthrough 📝

Relevant files
Enhancement
base.py
Add methods for reading and writing index configuration   

semantic_router/index/base.py

  • Added _read_hash method to read the hash of the index.
  • Added _write_config method to write a config parameter to the index.
  • +18/-0   
    pinecone.py
    Implement configuration methods and namespace validation for Pinecone
    index

    semantic_router/index/pinecone.py

  • Added _read_hash and _write_config methods for Pinecone index.
  • Modified PineconeRecord to have a default function schema.
  • Added validation for reserved namespace.
  • +31/-2   
    layer.py
    Enhance synchronization logic with hash comparison             

    semantic_router/layer.py

  • Added get_hash method to generate a hash for the layer.
  • Updated is_synced method to use hash comparison for synchronization
    check.
  • +29/-4   
    schema.py
    Add ConfigParameter class for configuration management     

    semantic_router/schema.py

  • Introduced ConfigParameter class for configuration management.
  • Added to_pinecone method to convert config to Pinecone format.
  • +19/-1   

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    Copy link

    github-actions bot commented Nov 8, 2024

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Possible Bug
    The method _write_config in PineconeIndex uses a to_pinecone method which seems to hardcode the vector values. This could lead to incorrect data being written to the index.

    Redundant Logic
    The method is_synced in RouteLayer contains commented-out code and a TODO comment indicating uncertainty about the necessity of existing logic. This needs clarification or cleanup.

    Copy link

    github-actions bot commented Nov 8, 2024

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Best practice
    Add exception handling for data fetching and upserting operations

    Handle potential exceptions when fetching or upserting data in _read_hash and
    _write_config methods to improve robustness and error handling.

    semantic_router/index/pinecone.py [681-684]

    -hash_record = self.index.fetch(
    -    ids=[f"sr_hash#{self.namespace}"],
    -    namespace="sr_config",
    -)
    +try:
    +    hash_record = self.index.fetch(
    +        ids=[f"sr_hash#{self.namespace}"],
    +        namespace="sr_config",
    +    )
    +except Exception as e:
    +    logger.error(f"Failed to fetch hash record: {e}")
    +    raise
    Suggestion importance[1-10]: 8

    Why: Adding exception handling for data operations significantly improves the robustness and reliability of the system by ensuring that errors are properly managed and logged.

    8
    Validate config parameters before upserting to ensure data integrity

    Validate the config parameter in _write_config method to ensure it contains all
    necessary fields before attempting to upsert to the index.

    semantic_router/index/pinecone.py [701-704]

    +if not all([hasattr(config, attr) for attr in ['field', 'value', 'namespace']]):
    +    raise ValueError("Config parameter is missing required fields")
     self.index.upsert(
         vectors=[config.to_pinecone(dimensions=self.dimensions)],
         namespace="sr_config",
     )
    Suggestion importance[1-10]: 7

    Why: Validating the 'config' parameter before upserting is essential to ensure that all necessary fields are present, which prevents data integrity issues in the system.

    7
    Enhancement
    Replace hardcoded namespace with a configurable namespace attribute

    Ensure that the namespace parameter in _read_hash and _write_config methods is not
    hardcoded to "sr_config" to maintain flexibility and avoid potential conflicts with
    other namespaces.

    semantic_router/index/pinecone.py [682]

    -namespace="sr_config",
    +namespace=self.config_namespace,
    Suggestion importance[1-10]: 7

    Why: Using a configurable namespace attribute instead of a hardcoded value enhances flexibility and reduces the risk of namespace conflicts, which is crucial for a scalable system.

    7
    Performance
    Optimize data access in _read_hash by directly using metadata

    Optimize the _read_hash method by directly accessing the required metadata fields
    instead of fetching the entire vector data, which can be more efficient and clear.

    semantic_router/index/pinecone.py [685-691]

    -if hash_record["vectors"]:
    +if hash_record["metadata"]:
         return ConfigParameter(
             field="sr_hash",
    -        value=hash_record["vectors"]["sr_hash"]["metadata"]["value"],
    -        created_at=hash_record["vectors"]["sr_hash"]["metadata"]["created_at"],
    +        value=hash_record["metadata"]["value"],
    +        created_at=hash_record["metadata"]["created_at"],
             namespace=self.namespace,
         )
    Suggestion importance[1-10]: 5

    Why: The suggestion to optimize data access by directly using metadata could improve performance, but the existing code does not show that 'metadata' is directly accessible without 'vectors', which may limit the feasibility of this suggestion.

    5

    Copy link

    codecov bot commented Nov 8, 2024

    Codecov Report

    Attention: Patch coverage is 69.01408% with 44 lines in your changes missing coverage. Please review.

    Project coverage is 68.50%. Comparing base (5603bac) to head (ca34d21).

    Files with missing lines Patch % Lines
    semantic_router/layer.py 68.42% 18 Missing ⚠️
    semantic_router/index/base.py 58.53% 17 Missing ⚠️
    semantic_router/index/pinecone.py 75.00% 5 Missing ⚠️
    semantic_router/index/postgres.py 0.00% 4 Missing ⚠️
    Additional details and impacted files
    @@            Coverage Diff             @@
    ##             main     #460      +/-   ##
    ==========================================
    + Coverage   68.04%   68.50%   +0.45%     
    ==========================================
      Files          46       46              
      Lines        3505     3591      +86     
    ==========================================
    + Hits         2385     2460      +75     
    - Misses       1120     1131      +11     

    ☔ View full report in Codecov by Sentry.
    📢 Have feedback on the report? Share it here.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    1 participant