Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New serverless pattern - Serverless Messaging Redrive #2543

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

zalilias
Copy link

Description of changes:
A serverless AWS solution for processing and fixing malformed messages using SQS queues and Lambda functions. This system automatically handles message validation, applies fixes where possible, and routes messages to appropriate queues based on their fixability. With built-in error handling and detailed logging, it provides a robust framework for message processing that can be easily extended for specific business needs.

The architecture uses AWS Lambda for processing, multiple SQS queues for message routing, and includes a dead-letter queue (DLQ) for messages requiring human intervention. Built with AWS SAM.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Copy link

@benjymoses benjymoses left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to include a README.md with a detailed description of what this sample is, as well as deployment, testing, and deletion instructions.

You also need to include the *pattern.json file as per the sample adapted to your needs.

If any of the feedback / comments aren't clear, feel free to DM me offline to discuss and I can help point you in the right direction.

import logging

# Set up logging
logger = logging.getLogger()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an opportunity to use the PowerTools logging module here instead of Python native?


def can_fix_message(message):
"""
Determine if a message can be fixed automatically.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you include a sample scenario for customers to reference as a model? That could be "bad character replacement", "downstream system temporarily unavailable", or anything else you choose.

MAIN_QUEUE_URL = os.environ['MAIN_QUEUE_URL']
FATAL_DLQ_URL = os.environ['FATAL_DLQ_URL']

def can_fix_message(message):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're using a function to determine if a message can be fixed or not, then as part of its return you need to know which "fix strategy" to use, and this needs to be a parameter for your fix_message function too doesn't it so that it knows which fix to apply.


def fix_message(message):
"""
Apply fixes to the message.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as on the other function, the wording needs to be expanded to help people with zero context understand what's meant to be happening here.

logger.error(f"Validation error: {str(e)}")
return False

def fix_message(message):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function definition needs to be expanded to include a fix strategy that you fork to within the function depending on what type of "fix" is needed, whether that's character replacement, a delay, etc.

@@ -0,0 +1,2 @@
boto3==1.26.137
jsonschema==4.17.3

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this being used directly anywhere?

@@ -0,0 +1,121 @@
import json
import logging

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use PowerTools logging implementation?

@@ -0,0 +1,2 @@
boto3==1.26.137
jsonschema==4.17.3

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this being used directly?

Runtime: python3.9
CodeUri: ./functions/processor/
MemorySize: 128
Timeout: 29

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this always going to be adequate as this scales out? Consider parameterising this value (for both functions) and including more comments about the when it's declared at the top of this YAML doc describing when people might want to increase it.

Runtime: python3.9
CodeUri: ./functions/decision_maker/
MemorySize: 128
Timeout: 29

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about parameterisation as above

@benjymoses
Copy link

@ellisms I've been helping @zalilias with this and have done a first cursory review. It's worth letting him parse through that and integrate / reject my feedback before you do your review so you're making good use of your time.

Copy link

@benjymoses benjymoses left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @zalilias thanks for bearing with me doing this next pass for you. This has come a long way but before we send to @ellisms for review I think there's a few more bits to go through.

As usual, if any of my comments don't make sense or you're not sure, just ping me 👍

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zalilias you need to rename the file to be in line with the name of your pattern and replace the contents of the template / example with information about your pattern.

- Extensible processing logic

## API Reference
# Send Message

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to include a copy of the API contract, but It would be nice to see commands that users can use in their terminal as an example curl https://${endpoint}/message.... with the relevant params to send a sample message.

4. Failed messages route to DLQs
5. Decision maker attempts an automated recovery

## Deployment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely to come up in @ellisms review, but we typically include a statement on users being responsible for spinning up / tearing down / paying for resources in their own AWS accounts. We'd also typically talk about having the AWS CLI installed and configured as a pre-requisite to using SAM. Take a look at some other SAM examples to get a feel for the norm.

## Basic Flow
1. Messages enter through API Gateway
2. Main queue receives messages
3. Processor Lambda handles messages

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would include that this Lambda function polls the main queue using Event Source Mappings (ESMs) and handles the messages. Include links to docs / blogs / posts where they can build understanding.

1. Messages enter through API Gateway
2. Main queue receives messages
3. Processor Lambda handles messages
4. Failed messages route to DLQs

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to use terse statements in a numbered or bulleted list, it would be good to include a more full explanation alongside them. In this context what's a "failed message"? See if you can go a level deeper to explain what's going on as it may be confusing to somebody unfamiliar with the pattern or certain service features, like DLQ.

@@ -2,6 +2,68 @@ AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Serverless message processing architecture

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description doesn't match what you're actually doing - this isn's just about message processing, it's about attempted automated failure re-drive.

- Performing CPU-intensive operations
- Seeing timeout errors or slow processing
- Handling concurrent message processing
Recommended ranges:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "heavy" mean to the user? Isn't this ultimately dependent on how computationally expensive the operation would be? Perhaps in the context of a demo pattern think instead of suggesting 2 values with the lower as the default given it's just a sample and not meant to be production grade.

There would be an expectation that if a customer wanted to fork this and use it themselves that they'd do their own due diligence on memory sizing for their use case. Likely using Lambda Power Tuning.

MinValue: 128
MaxValue: 10240

DecisionMakerMemorySize:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as for ProcessorMemorySize

MinValue: 1
MaxValue: 900

DecisionMakerTimeout:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parameter has all of the same constraints as the one above for the same reasons (they're Lambda functions). I'd collapse in to one parameter called LambdaTimeout and make use of it for both functions. Make sure to update the first line of the description.

@@ -115,8 +178,9 @@ Resources:
Handler: app.lambda_handler
Runtime: python3.9

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 3.9 is 4 years old now, should we not be defaulting to the current LTS as this is a newly published asset? 3.12 seems like a reasonable default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants