Consider alternative job tracking / logging architecture #303

Open
sharkinsspatial opened this issue Dec 24, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@sharkinsspatial
Collaborator

We've hit scaling issues with logging via Aurora Serverless on several occasions (see https://github.com/NASA-IMPACT/hls_development/issues/232 and #301). Though some of this could be alleviated with improved database architecture and maintenance, it might be worth considering solutions that don't require a database at all, which would remove a central point of failure when processing at massive scale (as is likely during a reprocessing campaign).

The architecture we are using currently was designed more than 5 years ago, so it is definitely worth revisiting and refactoring based on lessons we've learned and new ideas.

In reality, a lot of the processing state tracking we currently do through a combination of Step Functions and Aurora Serverless could likely be accomplished with Step Functions alone plus intermediate files written to S3 (with other processes checking for the presence of those files).
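To make the marker-file idea concrete, here is a minimal sketch, not a proposal for the actual implementation. Everything named here is an assumption for illustration: the `MarkerStore` interface, the `state/<granule>/<step>.done` key layout, and the step names. An in-memory store stands in for S3 so the sketch runs without AWS credentials.

```python
from typing import Iterable, Protocol


class MarkerStore(Protocol):
    """Hypothetical minimal interface a marker backend would need."""

    def put(self, key: str, body: bytes) -> None: ...

    def exists(self, key: str) -> bool: ...


class InMemoryStore:
    """Stand-in for an S3 bucket, used only to keep the sketch runnable."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, key: str, body: bytes) -> None:
        self._objects[key] = body

    def exists(self, key: str) -> bool:
        return key in self._objects


def marker_key(granule_id: str, step: str) -> str:
    # Assumed key layout: one small object per (granule, step) pair.
    return f"state/{granule_id}/{step}.done"


def mark_done(store: MarkerStore, granule_id: str, step: str) -> None:
    # Writing an empty object records that the step finished.
    store.put(marker_key(granule_id, step), b"")


def is_ready(store: MarkerStore, granule_id: str, required_steps: Iterable[str]) -> bool:
    # A downstream step may run once every upstream marker is present.
    return all(store.exists(marker_key(granule_id, s)) for s in required_steps)


store = InMemoryStore()
mark_done(store, "granule-001", "atmospheric-correction")
print(is_ready(store, "granule-001", ["atmospheric-correction", "resample"]))
mark_done(store, "granule-001", "resample")
print(is_ready(store, "granule-001", ["atmospheric-correction", "resample"]))
```

Against real S3, `put` and `exists` would presumably map onto boto3's `put_object` and `head_object` calls, so each state write costs one PUT request and each readiness check costs one HEAD request per required step.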

With assistance from @ceholden and @chuckwondo, I'd like to draw up some new architecture proposals that incorporate this concept and review them against the following questions:

  1. Will we hit AWS S3 quota limits with this type of architecture?
  2. What will the predicted costs be for the potentially heavy S3 PUT and GET requests this architecture might generate?
  3. Should we build this as a completely new orchestration pipeline or just refactor our existing pipeline?
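
As a starting point for questions 1 and 2, a back-of-the-envelope calculation could look like the sketch below. The default prices and per-prefix request-rate figures are assumptions based on published S3 Standard numbers and should be checked against current AWS pricing and documentation for our region. The request volumes in the usage lines are placeholders, not measurements from our pipeline.

```python
# Assumed defaults; verify against current AWS pricing and docs before relying on them.
PUT_PRICE_PER_1K = 0.005    # USD per 1,000 PUT/COPY/POST/LIST requests
GET_PRICE_PER_1K = 0.0004   # USD per 1,000 GET/HEAD requests
WRITE_RATE_PER_PREFIX = 3500  # PUT/COPY/POST/DELETE requests per second per prefix
READ_RATE_PER_PREFIX = 5500   # GET/HEAD requests per second per prefix


def request_cost(n_puts: int, n_gets: int,
                 put_price: float = PUT_PRICE_PER_1K,
                 get_price: float = GET_PRICE_PER_1K) -> float:
    """Estimated request charges in USD, excluding storage and data transfer."""
    return n_puts / 1000 * put_price + n_gets / 1000 * get_price


def within_rate_limits(writes_per_s: float, reads_per_s: float, n_prefixes: int = 1) -> bool:
    """True if the peak request rates fit the assumed per-prefix limits,
    assuming requests are spread evenly across prefixes."""
    return (writes_per_s / n_prefixes <= WRITE_RATE_PER_PREFIX
            and reads_per_s / n_prefixes <= READ_RATE_PER_PREFIX)


# Placeholder volumes: 1 million marker writes and 10 million existence checks.
print(f"request cost: ${request_cost(1_000_000, 10_000_000):.2f}")
print("fits one prefix:", within_rate_limits(writes_per_s=500, reads_per_s=8000))
print("fits two prefixes:", within_rate_limits(writes_per_s=500, reads_per_s=8000, n_prefixes=2))
```

If the real numbers turn out similar in shape, the request rate per key prefix is probably the constraint to watch rather than the request charges, which would argue for spreading marker keys across several prefixes.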
3 participants