Clean up transaction management for file_complete handler #930

Merged
BenGalewsky merged 4 commits into develop from file_complete_transaction
Jan 21, 2025

BenGalewsky
Contributor

@BenGalewsky BenGalewsky commented Nov 25, 2024

Problem

The TransformerFileComplete resource handler is the most critical code in the entire stack. It responds to each file in the dataset being transformed and is responsible for updating the total number of files processed (either successfully or with a failure). These two counters are how we determine whether the transform is complete. The endpoint is hit repeatedly by all of the running transformers, so database transaction handling is very important to avoid missing files.

The current implementation uses implicit transactions and doesn't manage locks or flushing to the DB. This may allow files to be lost during big transform requests.

Approach

  1. Use the DB session to explicitly manage transactions
  2. Update record_file_complete to read the request with the with_for_update flag set, which locks the record in the DB
  3. Handle the increments to the file counters in the same transaction
  4. To make the file more readable, capture the retry call arguments in a single new decorator, file_complete_ops_retry
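
As a rough illustration of steps 1 to 3, here is a minimal sketch using only the standard-library sqlite3 module. It is not the code from this PR: SQLite has no SELECT ... FOR UPDATE, so BEGIN IMMEDIATE stands in for the row lock that with_for_update would take on a server database. The requests table, its columns, and the body of record_file_complete shown here are assumptions made for the example.

```python
import sqlite3


def record_file_complete(conn, request_id, succeeded):
    """Bump one counter for request_id inside a single explicit transaction.

    Returns True once completed + failed equals the total number of files.
    """
    column = "files_completed" if succeeded else "files_failed"
    # BEGIN IMMEDIATE takes SQLite's write lock up front, so the read and the
    # increment below cannot interleave with another writer. This is a
    # stand-in for a with_for_update row lock.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT files, files_completed, files_failed "
            "FROM requests WHERE request_id = ?",
            (request_id,),
        ).fetchone()
        if row is None:
            raise KeyError(request_id)
        conn.execute(
            f"UPDATE requests SET {column} = {column} + 1 WHERE request_id = ?",
            (request_id,),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    total, completed, failed = row
    return completed + failed + 1 == total


# isolation_level=None disables implicit transactions, so the explicit
# BEGIN / COMMIT statements above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE requests ("
    "request_id TEXT PRIMARY KEY, files INTEGER, "
    "files_completed INTEGER DEFAULT 0, files_failed INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO requests (request_id, files) VALUES ('req-1', 2)")

first = record_file_complete(conn, "req-1", succeeded=True)
second = record_file_complete(conn, "req-1", succeeded=False)
counters = conn.execute(
    "SELECT files_completed, files_failed FROM requests WHERE request_id = 'req-1'"
).fetchone()
```

The point of the sketch is that the lock, the read, and the increment share one transaction, so two transformers reporting at the same time cannot both read the same counter value and overwrite each other's update.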

With this decorator, I ran into a problem with unit tests. Importing the module caused the current_app.logger expression to be evaluated, which throws RuntimeError: Working outside of application context. in the unit tests. I worked around this in the decorator by only accessing that logger when we are inside the Flask app.
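
One way such a workaround could look is sketched below. Only the name file_complete_ops_retry comes from this PR; the body is a hypothetical standard-library stand-in for whatever retry library the real decorator wraps, and the attempts and wait_seconds parameters are invented for the example. The logger is resolved at call time rather than at import time, and Flask's has_app_context() guards the access to current_app.logger.

```python
import functools
import logging
import time


def _resolve_logger():
    """Return the Flask app logger inside an app context, else a plain logger."""
    try:
        from flask import current_app, has_app_context
    except ImportError:
        # Flask is not installed (e.g. in a bare test environment).
        return logging.getLogger(__name__)
    if has_app_context():
        return current_app.logger
    return logging.getLogger(__name__)


def file_complete_ops_retry(attempts=3, wait_seconds=0.0):
    """Hypothetical retry decorator: retry on any exception, log each failure."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    if attempt == attempts:
                        raise
                    # The logger is looked up here, at call time, so merely
                    # importing this module never touches current_app.
                    _resolve_logger().warning(
                        "%s failed (attempt %d of %d): %s",
                        func.__name__, attempt, attempts, exc,
                    )
                    time.sleep(wait_seconds)

        return wrapper

    return decorator


calls = []


@file_complete_ops_retry(attempts=3)
def flaky_update():
    """Fail twice with a simulated transient error, then succeed."""
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient database error")
    return "ok"


result = flaky_update()
```

Because nothing Flask-specific runs when the module is imported, unit tests can import and call the decorated functions without setting up an application context.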

@BenGalewsky BenGalewsky marked this pull request as draft November 25, 2024 21:22
Base automatically changed from delete_fixes to develop November 26, 2024 04:33
@BenGalewsky BenGalewsky force-pushed the file_complete_transaction branch from 70a6702 to c2fb610 Compare December 4, 2024 13:50
@BenGalewsky BenGalewsky requested a review from ponyisi December 4, 2024 19:09
@BenGalewsky BenGalewsky marked this pull request as ready for review December 4, 2024 19:09
@BenGalewsky BenGalewsky force-pushed the file_complete_transaction branch from c2fb610 to 4a4ca9a Compare December 10, 2024 22:18
@BenGalewsky BenGalewsky requested a review from ponyisi December 16, 2024 16:38
The Celery transaction model settings that guarantee we never lose a file
could result in duplicate file reports. Handle this by adding a
unique key to the transformation_result table. Attempt to insert the
record; if that fails, just report a warning and don't increment the
file counter.
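
A minimal sketch of that duplicate handling, again using the standard-library sqlite3 module rather than the project's actual models. The transformation_result table name comes from the commit message; the column names, the requests table, and the report_file helper are assumptions for illustration.

```python
import logging
import sqlite3

logger = logging.getLogger("file_complete")


def report_file(conn, request_id, file_id):
    """Record one file result; count it only if it has not been seen before."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        # The UNIQUE constraint rejects a second report of the same file.
        conn.execute(
            "INSERT INTO transformation_result (request_id, file_id) VALUES (?, ?)",
            (request_id, file_id),
        )
        conn.execute(
            "UPDATE requests SET files_completed = files_completed + 1 "
            "WHERE request_id = ?",
            (request_id,),
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # Duplicate report: warn, roll back, and leave the counter untouched.
        conn.execute("ROLLBACK")
        logger.warning("Duplicate report for %s / %s ignored", request_id, file_id)
        return False
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE requests (request_id TEXT PRIMARY KEY, "
    "files_completed INTEGER DEFAULT 0)"
)
conn.execute(
    "CREATE TABLE transformation_result ("
    "request_id TEXT, file_id TEXT, UNIQUE (request_id, file_id))"
)
conn.execute("INSERT INTO requests (request_id) VALUES ('req-1')")

results = [
    report_file(conn, "req-1", "file-a"),
    report_file(conn, "req-1", "file-a"),  # redelivered task, same file
    report_file(conn, "req-1", "file-b"),
]
completed = conn.execute(
    "SELECT files_completed FROM requests WHERE request_id = 'req-1'"
).fetchone()[0]
```

Putting the insert and the increment in the same transaction means a redelivered task can never bump the counter twice for one file.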
@BenGalewsky BenGalewsky force-pushed the file_complete_transaction branch from 8ca7037 to 9ad4732 Compare January 15, 2025 17:58
@@ -456,7 +463,7 @@ def init(args: Union[Namespace, SimpleNamespace], app: Celery) -> None:
 "--without-mingle",
 "--without-gossip",
 "--without-heartbeat",
-"--loglevel=warning",
+"--loglevel=info",
Collaborator

Won't this bring back a lot of uninteresting verbosity?

Contributor Author

All the better for debugging until we are certain this is fixed. I suppose it could be a Helm chart setting, but that's a bit of a chore to work through.

@BenGalewsky BenGalewsky merged commit 6e48c6f into develop Jan 21, 2025
69 checks passed
@BenGalewsky BenGalewsky deleted the file_complete_transaction branch January 21, 2025 22:53
2 participants