List of previously found problems (will be updated) #438
@dgafka Hi! Perhaps you have run into the problems described here in your own practice; we sometimes see them when testing our solution.
@dgafka Point 6 is particularly interesting... UPD2: There are pending queries to the database: `DELETE FROM ecotone_deduplication WHERE handled_at <= $1`
@dgafka I'll monitor which transactions stay open for too long and in which cases this happens. But if you can, please check whether rollbacks are performed correctly everywhere, and whether the deletion from `ecotone_deduplication` could be done periodically, at a low frequency, and definitely not within the main transaction.

```sql
SELECT pid, usename, client_addr, application_name, state,
       age(clock_timestamp(), xact_start) AS transaction_age,
       query, query_start
FROM pg_stat_activity
WHERE state <> 'idle'
  AND xact_start IS NOT NULL
  AND age(clock_timestamp(), xact_start) > interval '1 minute'
ORDER BY transaction_age DESC;
```
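One way to keep that cleanup out of the main transaction is to run it from a separate, low-frequency job in small batches, each committed on its own, so no single `DELETE` holds locks for long. A minimal PostgreSQL sketch, assuming only the `handled_at` column visible in the pending query; the statement name `purge_dedup_batch` and the batch size of 5000 are illustrative choices, not Ecotone settings.

```sql
-- Hypothetical periodic cleanup, run by a separate job (cron, scheduled task),
-- never inside the transaction that handles a message.
-- $1 is the cutoff; PostgreSQL infers its type from handled_at.
PREPARE purge_dedup_batch AS
DELETE FROM ecotone_deduplication
WHERE ctid IN (
    SELECT ctid
    FROM ecotone_deduplication
    WHERE handled_at <= $1
    LIMIT 5000              -- small batches keep each lock short-lived
);

-- Run in autocommit mode and repeat until the command tag reports DELETE 0:
-- EXECUTE purge_dedup_batch(<cutoff in the same type as handled_at>);
```

Deleting by `ctid` through a limited subquery is a common PostgreSQL idiom for batching, since `DELETE` itself has no `LIMIT` clause.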
If you have transactions enabled for the Command Bus, then the projection update is atomic with the events being appended to the stream (the projection trigger is wrapped within the same transaction).
@dgafka Thank you for your attention.
Regarding points 3 and 5: I am especially worried about this, since there is a requirement to hold the user's balance during authorizations from their cards. Here we need to immediately update both the balance projection and the account statement (in balance-before / balance-after format), and making a separate stream for each user is not an option. The main problem is that we need to respond quickly; otherwise the provider will decline the card authorization.
@dgafka Hi

```sql
SELECT pid, usename, client_addr, application_name, state,
       age(clock_timestamp(), xact_start) AS transaction_age,
       query, query_start
FROM pg_stat_activity
WHERE state <> 'idle'
  AND xact_start IS NOT NULL
  AND age(clock_timestamp(), xact_start) > interval '1 minute'
ORDER BY transaction_age DESC;
```

The issue recurs on its own after a long idle period, especially in the RabbitMQ listener. We will now try to limit the consumer's lifetime to one hour using the executionTimeLimit argument. We have also increased the Kubernetes graceful shutdown period from 30 to 60 seconds. Somewhere, a transaction is not being closed properly or a rollback is not being performed.
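As an extra safeguard while the unclosed transaction is tracked down, PostgreSQL (9.6+) can terminate sessions that sit idle inside an open transaction via the `idle_in_transaction_session_timeout` setting. A sketch, assuming the consumers connect with a dedicated role; `ecotone_consumer` is a hypothetical role name and the 5-minute value is arbitrary. This treats the symptom rather than the cause, and the consumer must be able to reconnect after its session is killed.

```sql
-- Hypothetical role used by the RabbitMQ / RoadRunner consumers.
-- Sessions of this role that stay "idle in transaction" longer than
-- 5 minutes are terminated by the server, which rolls back their open transaction.
ALTER ROLE ecotone_consumer SET idle_in_transaction_session_timeout = '5min';

-- Role-level settings apply to new sessions; verify from a consumer connection:
SHOW idle_in_transaction_session_timeout;
```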
Is it possible to implement periodic closing and reopening of the Doctrine connection for consumers?
@dgafka We found several places in the RoadRunner worker where the transaction was not closed correctly. I'll write up more detailed results, along with the optimizations we came up with.
Remember that you can quite easily build any extension for Message Consumers yourself. Specific cases do not need to be part of the framework for you to be able to build them :) Extending asynchronous endpoints is pretty straightforward; you can read more in this section.
I'm wondering if it's possible to affect DbalTransactionInterceptor (
Well, that would need to be done through the framework, as those are framework-related classes. But it would be pretty straightforward to roll a customized version of it and disable the framework one.
@dgafka Why might this situation occur? Wouldn't we expect a synchronous retry via `Ecotone\Messaging\Support\ConcurrencyException` on an aggregate version conflict? Have you encountered this issue before?
|
@lifinsky As far as I remember, that may happen if, after an SQL exception, another SQL statement is triggered before the rollback is done.
I understand the technical reason, but there is only the event store and the save to the projection. Perhaps there is a foreign-key or uniqueness violation in the projection, but shouldn't there be a rollback then? Instant retry is configured only for the optimistic-lock exception, so there is no second attempt. I will look into this in more detail on Monday...
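The PostgreSQL behavior behind that explanation is easy to reproduce: once any statement in a transaction fails, every following statement is rejected (SQLSTATE 25P02) until the transaction is rolled back, or rolled back to a savepoint. This is a generic illustration on a throwaway temp table, not a reproduction of the Ecotone code path; run it interactively in psql without `ON_ERROR_STOP`, since the errors are intentional.

```sql
-- Throwaway table standing in for a projection with a unique key.
CREATE TEMP TABLE demo_projection (id int PRIMARY KEY);

BEGIN;
INSERT INTO demo_projection VALUES (1);
INSERT INTO demo_projection VALUES (1);
-- ERROR:  duplicate key value violates unique constraint "demo_projection_pkey"
SELECT count(*) FROM demo_projection;
-- ERROR:  current transaction is aborted, commands ignored until end of transaction block
COMMIT;
-- psql reports ROLLBACK: nothing from this transaction was persisted

-- Same scenario, with the risky write guarded by a savepoint:
BEGIN;
INSERT INTO demo_projection VALUES (1);
SAVEPOINT before_projection_write;
INSERT INTO demo_projection VALUES (1);
-- ERROR:  duplicate key value violates unique constraint "demo_projection_pkey"
ROLLBACK TO SAVEPOINT before_projection_write;
SELECT count(*) FROM demo_projection;  -- usable again, returns 1
COMMIT;
```

So the "aborted transaction" error is reported on whichever statement runs after the original failure, which can make the log point at the wrong query.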
@dgafka Somehow I can't find anything about 8 seconds. You probably still have the default configuration shown above.
|
@lifinsky Yep, you're right; I thought the default was different. So to get a longer waiting time, it would have to be customized through configuration.
@dgafka Is it possible to allow modifying the gap detection retry setting at the service configuration level for each projection independently, or at least for all projections, including those running synchronously? I see that the documentation mentions it for polling projections, but based on the code, this option no longer seems to be available.
Ecotone version(s) affected: latest
Description
1. If there is an error in the message payload converter for a distributed endpoint, the message does not end up in the dead letter. We will try to write a test for this.
2. When several event-sourced aggregates and their projections are processed in an asynchronous endpoint, and one of the streams contains an incorrect (already renamed) event class name with a delayed retry, we get many records in the projections without the corresponding changes in the stream. It looks as if the transaction is not fully rolled back. We also plan to write a test to reproduce this.
3. Our projections all run synchronously with the aggregates, yet they get a strange status in the `projections` table, either `idle` or `running`, for the same events.
4. How should we handle a situation where some exceptions should go to delayed retry, while others should be processed by a custom `ServiceActivator` handler? For now we use one common `ServiceActivator` for the `error` channel and check the exception class there, but perhaps it's worth providing a better option?
5. For a synchronous projection, what happens in the case of gap detection (is a scenario possible where the projection lags behind the event-sourced aggregate stream)? Could there be a situation where the aggregate stream is updated after a sync retry (for example, due to OptimisticLockException) but the projection is not updated and waits for the next events in the stream?
6. There was a case where an AMQP consumer got stuck on an event-sourced aggregate command; the queue showed the message as received but not acknowledged. The last log message was: Executing Command Handler...

   This may be related to `defaultMemoryLimit: 256`, but it's hard to determine the exact cause, since it has happened only once so far. The first pod restart didn't help, but after I completely deleted the deployment and started a new pod, all messages were processed successfully.
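If a consumer hangs like this again, one thing worth checking while it is stuck is whether its database session is waiting on a lock held by another session, for example a transaction left open elsewhere or a long-running `DELETE FROM ecotone_deduplication`. A PostgreSQL (9.6+) sketch: it lists only sessions blocked on locks, so an empty result points away from lock contention rather than proving the database is not involved.

```sql
-- Each row: a blocked session and one of the sessions blocking it.
SELECT blocked.pid                                 AS blocked_pid,
       blocked.application_name                    AS blocked_app,
       blocked.wait_event_type,
       blocked.query                               AS blocked_query,
       blocker.pid                                 AS blocker_pid,
       blocker.state                               AS blocker_state,  -- 'idle in transaction' is a red flag
       age(clock_timestamp(), blocker.xact_start)  AS blocker_xact_age,
       blocker.query                               AS blocker_last_query
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocker
  ON blocker.pid = ANY (pg_blocking_pids(blocked.pid))
ORDER BY blocker_xact_age DESC NULLS LAST;
```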