fix: canonicalize legacy cli crawler command surface #23

Merged
nsalvacao merged 2 commits into candidate/lote2-audit-2026-03-16 from worker/lote2-aud007-command-surface
Mar 16, 2026

@nsalvacao
Owner

Summary
- convert legacy cli_crawler.py into a thin compatibility shim delegating to canonical cli-crawler behavior
- normalize --include-raw to --raw with explicit deprecation warnings and deterministic precedence
- add unit tests for legacy raw flag compatibility and canonical command nudge

Validation
- ruff check cli_crawler.py src/crawler/cli_crawler.py tests/unit/test_cli_crawler_compat.py
- PYTHONPATH=src pytest tests/unit/test_cli_crawler_compat.py -v

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

🔍 AI Code Review

🔵 Info

  • The PR removes the entire CLI argument parsing and crawling logic from cli_crawler.py and replaces it with a minimal legacy compatibility stub that just imports and calls main() from crawler.cli_crawler.
  • The actual CLI logic is moved into src/crawler/cli_crawler.py with added support for normalizing legacy CLI flags (--include-raw → --raw) and emitting warnings about deprecated flags.
  • The new code improves maintainability by centralizing CLI logic and providing backward compatibility wrappers.

🟡 Warning

  • The legacy compatibility wrapper emits warnings about deprecated flags but does not show how these warnings are surfaced to the user (e.g., printed or logged). Ensure these warnings are visible to avoid silent behavior changes.
  • The legacy entrypoint warning function _legacy_entrypoint_warning is incomplete in the diff; verify it properly detects legacy invocation and informs users to switch to the canonical command.
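
One way to address the visibility concern above, sketched under assumptions: the PR's actual warning mechanism is not shown in this conversation, and `warn_legacy_flag` is a hypothetical name. The sketch uses Python's `warnings` module with `FutureWarning`, which is displayed by default, whereas `DeprecationWarning` is hidden by default unless it is triggered by code running in `__main__`.

```python
import warnings


def warn_legacy_flag(old_flag: str, new_flag: str) -> None:
    """Emit a user-visible deprecation notice for a legacy CLI flag.

    Hypothetical helper. FutureWarning is chosen because Python shows it by
    default; DeprecationWarning is ignored by default outside __main__.
    """
    warnings.warn(
        f"{old_flag} is deprecated; use {new_flag} instead.",
        FutureWarning,
        stacklevel=2,
    )


# Record warnings to confirm the notice is actually emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_legacy_flag("--include-raw", "--raw")

messages = [str(w.message) for w in caught]
```

Writing the notice directly to stderr would be an equally reasonable choice for a CLI; the point is only that the category or stream should be one users see without extra flags.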

🔴 Critical

  • The PR removes the original CLI argument parsing and crawling logic from cli_crawler.py without showing the new main() implementation in src/crawler/cli_crawler.py. Confirm that the new main() fully replicates the original functionality, including error handling, logging configuration, and argument validation, to avoid regressions.
  • The normalization of legacy flags silently converts --include-raw to --raw. If downstream code treats these flags differently, this could cause unexpected behavior. Confirm that all consumers expect --raw only.
  • The PR does not show any tests for the legacy flag normalization or the new entrypoint behavior. Adding tests is recommended to prevent regressions and ensure backward compatibility.

Summary:
The PR refactors CLI entrypoints for better legacy support and maintainability but must ensure full feature parity, proper warning visibility, and thorough testing to avoid breaking existing workflows.


🤖 gpt-4.1-mini · 1955 tokens · GitHub Models free tier · 0 premium requests

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the CLI crawler's command-line interface to establish a canonical entry point and argument handling. It centralizes the core logic within the src/crawler module, transforming the original cli_crawler.py script into a simple compatibility layer that delegates to the new canonical implementation. This change ensures consistent behavior, provides clear guidance for users transitioning from legacy commands, and improves maintainability by consolidating argument parsing and processing.

Highlights

  • Legacy Script Refactoring: Converted the standalone cli_crawler.py script into a lightweight compatibility shim that delegates to the canonical cli-crawler implementation.
  • Argument Normalization: Implemented normalization for the --include-raw command-line argument, mapping it to the canonical --raw flag with appropriate deprecation warnings and deterministic precedence.
  • Unit Test Coverage: Added unit tests to ensure correct handling of legacy raw flag compatibility and to verify the canonical command nudge and warning messages.

Changelog

  • cli_crawler.py
    • Transformed into a thin compatibility shim.
    • Removed extensive argparse logic and direct crawling calls.
    • Updated docstring to reflect its new role as a legacy entrypoint.
    • Delegates execution to the main function in src/crawler.cli_crawler.
  • src/crawler/cli_crawler.py
    • Introduced a new main function to serve as the canonical entrypoint.
    • Implemented argument normalization, specifically converting --include-raw to --raw.
    • Added logic to issue deprecation warnings for legacy flag usage and entrypoint invocation.
    • Ensured correct precedence when both --raw and --include-raw are present.
    • Moved pathlib.Path import to the module level.
  • tests/unit/test_cli_crawler_compat.py
    • Added a new test file for cli_crawler compatibility.
    • Included tests to verify the correct mapping of --include-raw to --raw.
    • Added tests to confirm the precedence of --raw when both raw flags are provided.
    • Verified that appropriate deprecation warnings are displayed.
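
The normalization and precedence rules listed above could look roughly like the following. This is a minimal sketch, not the code merged in the PR: `normalize_raw_flag`, the notice wording, and the `sometool` argument are assumptions for illustration; only the two flag names come from the PR description.

```python
from typing import List, Sequence, Tuple

LEGACY_FLAG = "--include-raw"  # legacy spelling named in the PR
CANONICAL_FLAG = "--raw"       # canonical spelling named in the PR


def normalize_raw_flag(argv: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return (normalized argv, deprecation notices). Hypothetical helper."""
    if LEGACY_FLAG not in argv:
        # Nothing legacy to rewrite, so no notice is needed.
        return list(argv), []

    notice = f"{LEGACY_FLAG} is deprecated; use {CANONICAL_FLAG} instead."
    if CANONICAL_FLAG in argv:
        # Both spellings given: the canonical flag takes precedence, so the
        # legacy one is dropped rather than rewritten.
        normalized = [arg for arg in argv if arg != LEGACY_FLAG]
    else:
        # Only the legacy spelling given: rewrite it in place.
        normalized = [CANONICAL_FLAG if arg == LEGACY_FLAG else arg for arg in argv]
    return normalized, [notice]


# "sometool" is a placeholder argument, not a real crawler target.
args, notices = normalize_raw_flag(["sometool", "--include-raw"])
```

Returning the notices instead of printing them keeps the helper easy to unit test; the caller decides how to surface them.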

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a solid refactoring that converts the legacy cli_crawler.py into a compatibility shim, delegating to the canonical cli-crawler behavior. The normalization of --include-raw to --raw with deprecation warnings is a good improvement for user experience.

I've found a bug in the argument normalization logic that occurs when --include-raw is used before --raw, causing duplicate arguments. I've provided a fix for this. Additionally, I've suggested adding a new unit test to cover this specific scenario and prevent future regressions.
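
Neither the affected code nor the suggested fix appears in this conversation, so the following is only a hypothetical reconstruction of how such a duplicate can arise and one way to guard against it. `naive_normalize` and `dedupe_flag` are illustrative names, and `sometool` is a placeholder argument.

```python
from typing import List, Sequence


def naive_normalize(argv: Sequence[str]) -> List[str]:
    """Blind replacement: every --include-raw becomes --raw."""
    return ["--raw" if arg == "--include-raw" else arg for arg in argv]


def dedupe_flag(argv: Sequence[str], flag: str) -> List[str]:
    """Keep only the first occurrence of `flag`, preserving argument order."""
    seen = False
    result: List[str] = []
    for arg in argv:
        if arg == flag:
            if seen:
                continue  # drop repeated occurrences of the flag
            seen = True
        result.append(arg)
    return result


# The scenario from the review: the legacy flag appears before the canonical one.
argv = ["--include-raw", "--raw", "sometool"]
buggy = naive_normalize(argv)        # the flag now appears twice
fixed = dedupe_flag(buggy, "--raw")  # a single --raw remains
```

A regression test for this case only needs to assert that the normalized argument list contains `--raw` exactly once for both flag orders.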

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@github-actions

🔍 AI Code Review

🔵 CI Workflow (.github/workflows/ci.yml)

  • Added candidate/** branches to trigger CI on push and PR events — good for testing feature branches.
  • Added package-release-gates job:
    • Builds sdist and wheel with fixed SOURCE_DATE_EPOCH for reproducibility.
    • Validates exactly one sdist and one wheel artifact exist.
    • Installs built wheel in a fresh venv and runs smoke import and CLI help commands.

🟡 Suggestion: Consider caching pip packages and build dependencies to speed up CI.

🟡 Suggestion: Add explicit error handling or output on build failures for easier debugging.
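
The artifact check in the package-release-gates job, together with the request for clearer failure output, could be sketched as below. The workflow's real implementation is not shown here; `check_release_artifacts`, the `*.tar.gz` and `*.whl` patterns, and the `pkg-0.1.0` file names are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def check_release_artifacts(dist_dir: Path) -> Tuple[Path, Path]:
    """Return (sdist, wheel) if dist_dir holds exactly one of each, else exit."""
    sdists = sorted(dist_dir.glob("*.tar.gz"))
    wheels = sorted(dist_dir.glob("*.whl"))
    if len(sdists) != 1 or len(wheels) != 1:
        # A descriptive message makes a failed gate easier to debug in CI logs.
        raise SystemExit(
            f"expected 1 sdist and 1 wheel in {dist_dir}, "
            f"found {len(sdists)} sdist(s) and {len(wheels)} wheel(s)"
        )
    return sdists[0], wheels[0]


# Demonstration with placeholder artifacts in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    dist = Path(tmp)
    (dist / "pkg-0.1.0.tar.gz").touch()
    (dist / "pkg-0.1.0-py3-none-any.whl").touch()
    sdist, wheel = check_release_artifacts(dist)
    artifact_names = (sdist.name, wheel.name)
```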


🔴 cli_crawler.py

  • The entire original CLI crawler script was replaced with a minimal legacy compatibility stub that imports and calls main() from crawler.cli_crawler.
  • The original CLI parsing, logging, and crawling logic was removed.
  • The new script contains only a short docstring and a call to the canonical CLI crawler entrypoint.

🟡 Warning: Removing the CLI argument parsing and logic from this script means any users relying on this script directly will lose functionality unless they switch to the canonical CLI command.

🟡 Best practice: If this script is kept for backward compatibility, consider adding a warning message to users about the deprecation or redirecting them explicitly.
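
A compatibility shim that follows this advice might look like the sketch below. It is not the merged cli_crawler.py: `canonical_main` is a stand-in for `crawler.cli_crawler.main` (whose real signature is not shown in this conversation), and `legacy_main` and the message wording are hypothetical.

```python
import io
import sys
from typing import Callable, Optional, Sequence, TextIO


def canonical_main(argv: Optional[Sequence[str]] = None) -> int:
    """Stand-in for crawler.cli_crawler.main; always reports success."""
    return 0


def legacy_main(
    argv: Optional[Sequence[str]] = None,
    main: Callable[[Optional[Sequence[str]]], int] = canonical_main,
    stream: Optional[TextIO] = None,
) -> int:
    """Nudge the user toward the canonical command, then delegate unchanged."""
    out = stream if stream is not None else sys.stderr
    print(
        "cli_crawler.py is a legacy entrypoint; please use the cli-crawler command.",
        file=out,
    )
    return main(argv)


# Capture the nudge in a buffer instead of stderr to inspect it.
buffer = io.StringIO()
exit_code = legacy_main(["--help"], stream=buffer)
nudge = buffer.getvalue()
```

In a real script the last step would typically be `raise SystemExit(legacy_main(sys.argv[1:]))` under an `if __name__ == "__main__":` guard, so the exit code of the canonical command is preserved.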


Overall:

  • CI changes are solid and improve release artifact validation.
  • The CLI script refactor simplifies maintenance but risks breaking backward compatibility silently.
  • No obvious security issues detected.
  • No bugs found in CI steps; the CLI script removal of logic is intentional but should be communicated.

Summary:

| Issue | Severity | Recommendation |
|---|---|---|
| Removed CLI parsing logic in cli_crawler.py | 🟡 Warning | Add deprecation notice or redirect users explicitly to new CLI command |
| CI build caching | 🟡 Warning | Add caching for pip and build dependencies to speed up CI |
| CI error handling | 🟡 Warning | Add more verbose error output on build failures |

No critical security or functional bugs found.


🤖 gpt-4.1-mini · 2099 tokens · GitHub Models free tier · 0 premium requests

@nsalvacao nsalvacao merged commit 1e085ab into candidate/lote2-audit-2026-03-16 Mar 16, 2026
6 checks passed
@nsalvacao nsalvacao deleted the worker/lote2-aud007-command-surface branch March 16, 2026 15:48