Skip to content

Conversation

shlomibd
Copy link

@shlomibd shlomibd commented Mar 17, 2025

[X ] Bugfix
Feature
Refactoring (no functional changes, no api changes)
Build related changes
WHOSUSING.md
Other (please describe):
NOTE: Please remember to run ./gradlew spotlessApply to fix any format violations.

Changes in this PR
change the query to always get the limit, and throw exception if return messages is more than the requested count

Describe the new behavior from this PR, and why it's needed
Issue #369
new query is honoring the limit in the query, it is also limiting the popped messages twice so even if the select part is returning more than the requested count, the query will update only the count out of it.
it is also throwing if in any reason it will return more than the requested count - so that the clients won't fail

Shlomi Ben David and others added 3 commits February 12, 2025 09:54
change the query to always get the limit, and throw exception if return messages is more than the requested count
…age_id exist in 2 different queues (it is valid since the PK is queue_name, message_id), the update will pop more than 1 message for the same Id,

so adding the queue_name to the update query
@Robban1980
Copy link
Contributor

Thank you for addressing the PostgreSQL storage issue with this pull request. I have some concerns about the introduction of a new exception:

Necessity of the Exception: If the database query functions correctly with this PR, is the new exception necessary? Could it be removed to avoid potential disruptions?

Handling Raised Exceptions: How will the system handle scenarios where this exception is raised? Could it prevent workflows from being processed? How will operators be informed, and what steps should they take to resolve such issues without modifying the Conductor code?

@shlomibd
Copy link
Author

shlomibd commented Mar 18, 2025

Thank you for addressing the PostgreSQL storage issue with this pull request. I have some concerns about the introduction of a new exception:

Necessity of the Exception: If the database query functions correctly with this PR, is the new exception necessary? Could it be removed to avoid potential disruptions?
It is working correctly with this PR, however, returning more tasks than clients asked can result in client crash or hang, and leaving popped tasks that will not be handled until timeout, the exception will revert the transaction and return the messages to the queue, I tested the exception with the previous query and when it was thrown the next poll was trying to handle the messages ( when it faced the wrong result from the query it did tend to give wrong result for few seconds, but it finally succeeded).

Handling Raised Exceptions: How will the system handle scenarios where this exception is raised? Could it prevent workflows from being processed? How will operators be informed, and what steps should they take to resolve such issues without modifying the Conductor code?
no need to change the code - the exception is already handled - and the next poll will try to get messages again.
workflows continue to be processed.

@jeffbulltech
Copy link

@v1r3n @jmigueprieto Please review

@jeffbulltech jeffbulltech added bug Something isn't working help wanted Extra attention is needed labels Mar 18, 2025
@Robban1980
Copy link
Contributor

My concern regarding the introduction of a new exception in this pull request remains:

Query Accuracy: If the revised SQL query still necessitates exception handling due to returning multiple rows for the same message_id and queue_name, it suggests that the query may not be functioning correctly. The table is designed with a composite primary key on these columns, preventing duplicate messages in the same queue. Therefore, the query should be adjusted to respect this constraint and return only the intended results.​

Potential Workflow Disruption: Introducing an exception in this context could lead to significant issues. If the query continues to return too many results, the exception would be repeatedly triggered, causing messages to remain unprocessed. This scenario represents a major regression, as workflows and events would cease to function without clear indications of the underlying problem.​

Recommendation: It's crucial to ensure that the SQL query is correctly formulated to prevent the retrieval of excess tasks. By doing so, we can maintain data integrity and avoid potential regressions that could adversely affect other users. Additionally, introducing exceptions where they are not necessary can lead to code obfuscation and maintenance challenges. Therefore, it would be prudent to remove the unnecessary exception.

@shlomibd
Copy link
Author

@Robban1980 maybe I wasn't clear before, the new query is accurate and returns the right amount of messages,
it was tested with a scenario that was constantly showing the issue with the previous query, and the problem didn't reproduce with the new query.

regarding the exception, exceptions are intended to protect from unexpected results, in this case - a change in postgres DB that might break this query (even if its unlikely to happen), as I mentioned returning more messages than requested can result in clients hang or crash, hence this additional protection layer was added.
removing the exception part from this PR - will not change the fact that it fix the issue (#369 ), so I can remove it if more reviewer think I should.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

4 participants