pierrehanne

Title:

Add support for replacing sensitive content in DynamoDbChatStorage [Issue #131] @brnaba-aws

Description:

Overview:

This PR adds the ability to replace sensitive content in chat messages when using the DynamoDbChatStorage class. Sensitive information (e.g., secret data, personal information) is masked before it is stored in DynamoDB and unmasked when it is retrieved.

Changes:

  • Sensitive content masking: Added logic to mask sensitive content in messages before saving them to DynamoDB.
  • Sensitive content unmasking: Added functionality to reverse the masking when fetching messages.
  • New parameter: Introduced a sensitive_mappings parameter to the DynamoDbChatStorage class, which contains a dictionary of words/phrases to be masked and their replacements.
  • Helper method: Created the _anonymized_content method to handle the masking and unmasking of sensitive content in both directions (save and fetch).

How It Works:

  1. Masking sensitive content before saving: When saving a message using save_chat_messages(), the content is processed through the _anonymized_content method, where sensitive words (defined in sensitive_mappings) are replaced with asterisks (e.g., "secret" becomes "******").
  2. Unmasking sensitive content after fetching: When fetching messages using fetch_chat(), the same _anonymized_content method is used with the reverse=True flag to unmask previously masked content for retrieval.
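The two steps above can be sketched as a single helper. This is a minimal illustration, not the code from this PR: the standalone function name anonymize_content and its signature are assumptions, and only the sensitive_mappings dictionary and the reverse flag come from the description. It assumes each replacement string is unique, since unmasking inverts the dictionary.

```python
from typing import Dict


def anonymize_content(text: str, sensitive_mappings: Dict[str, str], reverse: bool = False) -> str:
    """Mask sensitive words, or unmask them again when reverse=True (hypothetical helper)."""
    # For unmasking, swap keys and values so replacements map back to the originals.
    if reverse:
        mapping = {masked: original for original, masked in sensitive_mappings.items()}
    else:
        mapping = dict(sensitive_mappings)

    # Process longer strings first so a short entry cannot match inside a longer one
    # (e.g. "******" inside "**********").
    for source in sorted(mapping, key=len, reverse=True):
        text = text.replace(source, mapping[source])
    return text


mappings = {"secret": "******", "classified": "**********"}
masked = anonymize_content("The secret plan is classified.", mappings)
restored = anonymize_content(masked, mappings, reverse=True)
```

If two sensitive words mapped to the same replacement (for example, two six-letter words both masked as "******"), the reverse lookup would be ambiguous, so distinct replacement values are assumed here.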

Test Changes:

  • Added unit tests to verify that sensitive content is correctly masked before saving and unmasked when fetched.
  • Updated test assertions to ensure the masking/unmasking logic works as expected.

Why This Is Useful:

  • Security: Sensitive data is no longer stored in DynamoDB in plain text, which limits exposure if the table is read directly.
  • Compliance: Helps in ensuring that sensitive information is handled properly, in line with security and privacy standards.

Testing:

  • The unit tests now include scenarios where messages contain sensitive words like "secret" and "classified". These words are masked (e.g., "secret" becomes "******") when saved and unmasked when retrieved.

How to Test:

  1. Check the DynamoDbChatStorage class for the sensitive_mappings parameter.
  2. Test saving messages containing sensitive data and ensure they are masked.
  3. Test fetching messages and ensure the sensitive data is unmasked correctly.
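A round-trip check along these lines can be tried without AWS access. The sketch below uses a hypothetical in-memory stand-in (InMemoryChatStorage) rather than the real DynamoDbChatStorage; the method names save_chat_messages, fetch_chat and _anonymized_content follow the description above, but their signatures here are simplified assumptions.

```python
from typing import Dict, List, Optional


class InMemoryChatStorage:
    """Hypothetical stand-in for DynamoDbChatStorage that keeps items in a dict."""

    def __init__(self, sensitive_mappings: Optional[Dict[str, str]] = None) -> None:
        self.sensitive_mappings = dict(sensitive_mappings or {})
        self.items: Dict[str, List[str]] = {}  # plays the role of the DynamoDB table

    def _anonymized_content(self, text: str, reverse: bool = False) -> str:
        pairs = self.sensitive_mappings.items()
        mapping = {v: k for k, v in pairs} if reverse else dict(pairs)
        # Longer strings first, so short entries never match inside longer ones.
        for source in sorted(mapping, key=len, reverse=True):
            text = text.replace(source, mapping[source])
        return text

    def save_chat_messages(self, session_id: str, messages: List[str]) -> None:
        # Mask before anything is written to the store.
        self.items[session_id] = [self._anonymized_content(m) for m in messages]

    def fetch_chat(self, session_id: str) -> List[str]:
        # Unmask on the way out.
        stored = self.items.get(session_id, [])
        return [self._anonymized_content(m, reverse=True) for m in stored]


storage = InMemoryChatStorage({"secret": "******", "classified": "**********"})
storage.save_chat_messages("session-1", ["This secret is classified."])
raw_items = storage.items["session-1"]   # what would be persisted
fetched = storage.fetch_chat("session-1")  # what the caller gets back
```

Inspecting raw_items corresponds to step 2 (the persisted form is masked), and fetched corresponds to step 3 (the caller sees the original text).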

@cornelcroi
Contributor

Hi @pierrehanne
Thank you for the contribution.
Can you also update the documentation (the DynamoDB Storage section) to explain this feature and add some code examples?

@pierrehanne
Author

Hi @cornelcroi
I updated the documentation for DynamoDB.
