Conversation

rucpande

@rucpande rucpande commented Oct 1, 2025

Description

Add support for Cisco AI Defense Security, Privacy, and Safety guardrails as a third-party API.

Features

  • Input Protection: Inspect user prompts before processing
  • Output Protection: Inspect bot responses before delivery
  • Configuration: Environment-based API configuration
  • Rails Exceptions: Support for enable_rails_exceptions mode
  • Logging: Logging for debugging and monitoring

Related Issue(s)

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @cparisien thanks for reviewing!

- Add AI Defense action for input/output protection
- Add documentation for setup and configuration
- Support for environment-based API key configuration

Fixes NVIDIA-NeMo#1420
Contributor

github-actions bot commented Oct 1, 2025

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1433

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.

@Pouyanpi Pouyanpi requested a review from Copilot October 2, 2025 13:44

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds Cisco AI Defense integration to NeMo Guardrails, providing security guardrails for input and output protection. The integration enables inspection of user prompts and bot responses through Cisco's AI Defense API to detect and block potentially harmful content.

Key changes:

  • Implementation of AI Defense inspection actions and flows for input/output protection
  • Configuration support through environment variables (API key and endpoint)
  • Comprehensive test coverage including unit, integration, and error handling tests

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| nemoguardrails/library/ai_defense/actions.py | Core AI Defense inspection action with HTTP client for API calls |
| nemoguardrails/library/ai_defense/flows.v1.co | Colang v1.0 flow definitions for input/output protection |
| nemoguardrails/library/ai_defense/flows.co | Colang v2.0 flow definitions for input/output protection |
| tests/test_ai_defense.py | Comprehensive test suite covering unit, integration, and error scenarios |
| docs/user-guides/community/ai-defense.md | User documentation for setup and usage |
| docs/user-guides/guardrails-library.md | Integration into main guardrails library documentation |
| examples/configs/ai_defense/config.yml | Example configuration file |
| examples/configs/ai_defense/README.md | Example documentation |

- Remove placeholder comment in test_real_api_call_with_safe_output
- Remove debug print statements from test code
- Fix incorrect docstring in ai_defense_text_mapping function
Collaborator

@tgasser-nv tgasser-nv left a comment

Looks great! Mostly naming nits to address.

Could you also run a local integration test (pytest -m integration) with AI_DEFENSE_API_ENDPOINT and AI_DEFENSE_API_KEY set and copy the result into the description?

log = logging.getLogger(__name__)


def ai_defense_text_mapping(result: dict) -> bool:
Collaborator

Could you add a type-annotation here (would dict[str, Any] work with the response?)

nit: Maybe rename to indicate the polarity of the bool returned, i.e. is_ai_defense_text_blocked()? So a True means blocked and False is ok

Author

Thanks for the review! I've made the suggested changes.
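
For illustration, the annotation and rename suggested above might look like the following minimal sketch. The function body is an assumption, not the code from this PR: it supposes the action result carries an `is_safe` flag (the key read elsewhere in the diff) and chooses to block when that flag is missing.

```python
from typing import Any


def is_ai_defense_text_blocked(result: dict[str, Any]) -> bool:
    """Return True when the text should be blocked, False when it may pass."""
    # Assumption: the action result exposes an "is_safe" flag.
    # In this sketch a missing flag is treated as unsafe, so the text is blocked.
    return not bool(result.get("is_safe", False))
```

With this naming the polarity is visible at the call site: `is_ai_defense_text_blocked({"is_safe": False})` evaluates to `True`.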

    user_prompt: Optional[str] = None, bot_response: Optional[str] = None, **kwargs
):
    api_key = os.environ.get("AI_DEFENSE_API_KEY")
    if api_key is None:
Collaborator

nit: Maybe change to if not api_key to catch if the AI_DEFENSE_API_KEY is a falsy value like ""?

Author

good point
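
A minimal sketch of the falsy check suggested above, assuming a hypothetical helper named `require_env` (not part of this PR) and an illustrative error message:

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, rejecting unset or empty values."""
    value = os.environ.get(name)
    # `not value` is True for both None (unset) and "" (set but empty),
    # whereas `value is None` would let an empty string through.
    if not value:
        raise ValueError(f"{name} environment variable not set.")
    return value
```

The same helper would cover both `AI_DEFENSE_API_KEY` and `AI_DEFENSE_API_ENDPOINT`, so the two checks cannot drift apart.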

        raise ValueError(msg)

    api_endpoint = os.environ.get("AI_DEFENSE_API_ENDPOINT")
    if api_endpoint is None:
Collaborator

nit: As above, maybe change to if not api_endpoint to catch an empty string?

Author

done

        raise ValueError(msg)

    # Compose a consistent return structure for flows
    is_safe = bool(data.get("is_safe", True))
Collaborator

Any concerns with a malformed response dict being treated as safe by default?

Author

Added configuration for fail_open vs. fail-closed behavior, plus a timeout config.
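
To make the trade-off concrete, here is a rough sketch of how a fail-open versus fail-closed policy could be applied to a malformed response. The helper name `resolve_is_safe` and its signature are hypothetical, and the PR's actual configuration handling may differ:

```python
from typing import Any


def resolve_is_safe(data: Any, fail_open: bool = False) -> bool:
    """Decide whether content is safe, with an explicit policy for malformed responses."""
    # Well-formed response: a dict whose "is_safe" value is a real boolean.
    if isinstance(data, dict) and isinstance(data.get("is_safe"), bool):
        return data["is_safe"]
    # Malformed response: allow when failing open, block when failing closed.
    return fail_open
```

Defaulting `fail_open` to `False` in this sketch means a missing or non-boolean `is_safe` blocks the content unless the operator opts in to failing open.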

        if isinstance(r, dict)
    ]
    if entries:
        log.info("AI Defense matched rules: %s", ", ".join(entries))
Collaborator

nit: Should this be at log.debug() level instead of info? I could see the logs getting really noisy with a line per API request

Author

Good catch. Changed it.
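
As a small illustration of why the level matters, the sketch below records what a logger actually emits at INFO versus DEBUG. The logger name, the `ListHandler` class, the `log_matched_rules` helper, and the rule name "Prompt Injection" are all made up for this example:

```python
import logging

log = logging.getLogger("ai_defense_sketch")  # hypothetical logger name


class ListHandler(logging.Handler):
    """Collect formatted messages in memory so the effect of the level is visible."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def log_matched_rules(entries: list[str]) -> None:
    # Logged at DEBUG so routine per-request detail stays out of INFO-level logs.
    if entries:
        log.debug("AI Defense matched rules: %s", ", ".join(entries))


handler = ListHandler()
log.addHandler(handler)
log.propagate = False  # keep the example's output out of the root logger

log.setLevel(logging.INFO)
log_matched_rules(["Prompt Injection"])  # dropped: DEBUG is below INFO

log.setLevel(logging.DEBUG)
log_matched_rules(["Prompt Injection"])  # recorded once DEBUG is enabled
```

Only the second call reaches the handler, so a deployment running at INFO sees no per-request line while the detail remains available when debugging.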

Development

Successfully merging this pull request may close these issues.

feature: Add support for Cisco AI Defense API as a guardrail provider for both input (prompt) and output (response) protection in NeMo Guardrails.
3 participants