We've hit scaling issues with logging via Aurora Serverless on several occasions (https://github.com/NASA-IMPACT/hls_development/issues/232 and #301). Though some of this could be alleviated with improved database architecture and maintenance, it might be worth considering solutions that don't require a database at all, removing a central point of failure when performing massive-scale processing (as is likely during a reprocessing campaign).
The architecture we are using was designed more than five years ago, so it is definitely worth revisiting and refactoring based on lessons we've learned and new ideas.
In reality, much of the processing-state tracking we currently do through a combination of Step Functions and Aurora Serverless could likely be accomplished with Step Functions plus intermediate sentinel files written to S3, with downstream processes checking for the presence of those files.
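As a minimal sketch of that idea, assuming boto3 and a hypothetical state bucket and key layout (none of these names exist in the current pipeline), a step could record completion by writing an empty sentinel object, and other processes could check for it instead of querying a database:

```python
import boto3
from botocore.exceptions import ClientError

# Hypothetical bucket and key layout, for illustration only.
STATE_BUCKET = "hls-processing-state"

s3 = boto3.client("s3")


def mark_step_complete(granule_id: str, step: str) -> None:
    """Write an empty sentinel object recording that `step` finished for `granule_id`."""
    s3.put_object(Bucket=STATE_BUCKET, Key=f"state/{granule_id}/{step}.done", Body=b"")


def step_is_complete(granule_id: str, step: str) -> bool:
    """Check for the sentinel object's presence instead of a database row."""
    try:
        s3.head_object(Bucket=STATE_BUCKET, Key=f"state/{granule_id}/{step}.done")
        return True
    except ClientError as err:
        # head_object raises a ClientError with code "404" when the key is absent.
        if err.response["Error"]["Code"] == "404":
            return False
        raise
```

Key layout would matter here: putting the granule ID early in the key spreads sentinel objects across prefixes, which should help stay under S3's per-prefix request-rate limits.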
With assistance from @ceholden and @chuckwondo, I'd like to draft some new architecture proposals that incorporate this concept and review them against the following questions:
- Will we hit AWS S3 quota limits (e.g., per-prefix request-rate limits) with this type of architecture?
- What are the predicted costs of the potentially heavy S3 `PUT` and `GET` request volume this architecture might generate? (See the rough estimate below.)
- Should we build this as a completely new orchestration pipeline, or just refactor our existing pipeline?
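On the cost question, a back-of-envelope sketch, assuming published us-east-1 S3 Standard request pricing (roughly $0.005 per 1,000 PUT requests and $0.0004 per 1,000 GET/HEAD requests, subject to change) and an entirely hypothetical campaign volume:

```python
# Assumed S3 Standard request prices (us-east-1; check current pricing):
PUT_PRICE_PER_1K = 0.005   # USD per 1,000 PUT/POST requests
GET_PRICE_PER_1K = 0.0004  # USD per 1,000 GET/HEAD requests

# Hypothetical reprocessing campaign: 10 million granules, with
# ~5 sentinel writes and ~20 existence checks per granule.
granules = 10_000_000
puts = granules * 5    # 50M PUTs
gets = granules * 20   # 200M GET/HEADs

cost = puts / 1_000 * PUT_PRICE_PER_1K + gets / 1_000 * GET_PRICE_PER_1K
print(f"~${cost:,.0f}")  # ~$330 under these assumptions
```

If those assumptions are anywhere near right, raw request charges look small next to compute costs; the bigger risks are likely request-rate limits and any `LIST`-heavy access patterns we introduce.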