Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AsyncioConnection: fix initialize_reactor when called in event loop #327

Merged
merged 1 commit into from
Jun 12, 2024

Conversation

Lorak-mmk
Copy link

Previously, if executed within existing asyncio loop, driver would take the loop, assume it's not used and start it in a separate thread.

Additionally, if executed outside of loop, driver would create a new one and make it default for calling thread.

Those behaviors are wrong so they are changed. Now driver creates its own loop and executes it in a thread. Code that handled pid changes, which can happen when class is transferred using e.g. multiprocessing, is fixed too - previously it didn't create new thread after such transition.

Previously, if executed within existing asyncio loop, driver would take
the loop, assume it's not used and start it in a separate thread.

Additionally, if executed outside of loop, driver would create a new one
and make it default for calling thread.

Those behaviors are wrong so they are changed. Now driver creates its
own loop and executes it in a thread. Code that handled pid changes,
which can happen when class is transferred using e.g. multiprocessing,
is fixed too - previously it didn't create new thread after such transition.
@fruch
Copy link

fruch commented Jun 4, 2024

@Lorak-mmk

How did you test this ?

We desperately need a test that works with asyncio, in a similar fashion as test.py

Anyhow we still shouldn't make asyncio as a fallback, and align to upstream behavior until we have a clear plan of how to test and properly validate this event loop mode

@Lorak-mmk
Copy link
Author

@Lorak-mmk

How did you test this ?

Manually:

  • with a reproducer that directly calls initialize_reactor that was posted on the issue
  • by creating a session in async def that was passed to asyncio.run
  • by veryfying the behavior of class parameters and asyncio when passed to a different process via multiprocessing

We desperately need a test that works with asyncio, in a similar fashion as test.py

I can add a test for this case, it shouldn't be too hard

Anyhow we still shouldn't make asyncio as a fallback, and align to upstream behavior until we have a clear plan of how to test and properly validate this event loop mode

I think I disagree. All integration tests passed with asyncio, it has the advantage of not requiring additional binary dependencies, and I think we are unaware of any other bugs.
And regarding current bug, I predicted there will be problems with running it inside existing loop: #260 (comment) #271 (review)
Unless more bugs caused by asyncio are reported I don't think it's worth it to disable it completely.

@fruch
Copy link

fruch commented Jun 4, 2024

@Lorak-mmk
How did you test this ?

Manually:

  • with a reproducer that directly calls initialize_reactor that was posted on the issue
  • by creating a session in async def that was passed to asyncio.run
  • by veryfying the behavior of class parameters and asyncio when passed to a different process via multiprocessing

We desperately need a test that works with asyncio, in a similar fashion as test.py

I can add a test for this case, it shouldn't be too hard

so lets try

Anyhow we still shouldn't make asyncio as a fallback, and align to upstream behavior until we have a clear plan of how to test and properly validate this event loop mode

I think I disagree. All integration tests passed with asyncio, it has the advantage of not requiring additional binary dependencies, and I think we are unaware of any other bugs.

that's exactly the problem, it's not battle proven enough yet (not by us, and not by datastax)
we have similar situation with the cqlengine, that we didn't touched almost at all, it would have been easier for us
if we declared it experimental.

I still think we should revert that call, and align with upstream, it too risky and it would backfire on our core unit tests first, and on user and customer right after, the integration suite doesn't really cover it enough.
for example one experiment we can do, it run all of dtest with this backend, for sure we gonna find more issues.

And regarding current bug, I predicted there will be problems with running it inside existing loop: #260 (comment) #271 (review) Unless more bugs caused by asyncio are reported I don't think it's worth it to disable it completely.

@Lorak-mmk
Copy link
Author

@Lorak-mmk
How did you test this ?

Manually:

  • with a reproducer that directly calls initialize_reactor that was posted on the issue
  • by creating a session in async def that was passed to asyncio.run
  • by veryfying the behavior of class parameters and asyncio when passed to a different process via multiprocessing

We desperately need a test that works with asyncio, in a similar fashion as test.py

I can add a test for this case, it shouldn't be too hard

so lets try

Ok, I'll try to add some tests tomorrow.

Anyhow we still shouldn't make asyncio as a fallback, and align to upstream behavior until we have a clear plan of how to test and properly validate this event loop mode

I think I disagree. All integration tests passed with asyncio, it has the advantage of not requiring additional binary dependencies, and I think we are unaware of any other bugs.

that's exactly the problem, it's not battle proven enough yet (not by us, and not by datastax) we have similar situation with the cqlengine, that we didn't touched almost at all, it would have been easier for us if we declared it experimental.

I still think we should revert that call, and align with upstream, it too risky and it would backfire on our core unit tests first, and on user and customer right after, the integration suite doesn't really cover it enough. for example one experiment we can do, it run all of dtest with this backend, for sure we gonna find more issues.

So let's make it more battle tested and fix issues that we find. How to run dtests with asyncio backend?

And regarding current bug, I predicted there will be problems with running it inside existing loop: #260 (comment) #271 (review) Unless more bugs caused by asyncio are reported I don't think it's worth it to disable it completely.

@fruch
Copy link

fruch commented Jun 4, 2024

@Lorak-mmk
How did you test this ?

Manually:

  • with a reproducer that directly calls initialize_reactor that was posted on the issue
  • by creating a session in async def that was passed to asyncio.run
  • by veryfying the behavior of class parameters and asyncio when passed to a different process via multiprocessing

We desperately need a test that works with asyncio, in a similar fashion as test.py

I can add a test for this case, it shouldn't be too hard

so lets try

Ok, I'll try to add some tests tomorrow.

Anyhow we still shouldn't make asyncio as a fallback, and align to upstream behavior until we have a clear plan of how to test and properly validate this event loop mode

I think I disagree. All integration tests passed with asyncio, it has the advantage of not requiring additional binary dependencies, and I think we are unaware of any other bugs.

that's exactly the problem, it's not battle proven enough yet (not by us, and not by datastax) we have similar situation with the cqlengine, that we didn't touched almost at all, it would have been easier for us if we declared it experimental.
I still think we should revert that call, and align with upstream, it too risky and it would backfire on our core unit tests first, and on user and customer right after, the integration suite doesn't really cover it enough. for example one experiment we can do, it run all of dtest with this backend, for sure we gonna find more issues.

So let's make it more battle tested and fix issues that we find. How to run dtests with asyncio backend?

patching this place, would do the trick:
https://github.com/scylladb/scylla-dtest/blob/44605a072659649a5d420442b635a2fd733b1731/dtest_setup.py#L475

And regarding current bug, I predicted there will be problems with running it inside existing loop: #260 (comment) #271 (review) Unless more bugs caused by asyncio are reported I don't think it's worth it to disable it completely.

@fruch
Copy link

fruch commented Jun 4, 2024

@Lorak-mmk

running gating tests with it (it's a big portion of all of the dtest)
https://github.com/scylladb/scylla-dtest/pull/4400

@fruch
Copy link

fruch commented Jun 6, 2024

@Lorak-mmk

we can install this PR into dtest like that:

pip install git+https://github.com/Lorak-mmk/python-driver.git@fix-asyncioconnection

and dtest pipeline can use that trick, with the following parameter

DRIVER_VERSION='git+https://github.com/Lorak-mmk/python-driver.git@fix-asyncioconnection',

I've send out a gating run with this PR:

it probably not gonna teach us anything new, since dtest isn't using asyncio on it's end.

@roydahan @yaronkaikov it would be nice if we can apply similar trick to test.py and byo pipelines
i.e. passing the driver we want to use in the tests, and then we can harness dtest / test.py into the CI drivers as needed

@roydahan
Copy link
Collaborator

roydahan commented Jun 6, 2024

Sounds good to me.
What's needed to be done in test.py in order to support it?

@fruch
Copy link

fruch commented Jun 6, 2024

Sounds good to me. What's needed to be done in test.py in order to support it?

something along the lines of:

DRIVER_INSTALLATION_CMD=pip install git+https://github.com/Lorak-mmk/python-driver.git@fix-asyncioconnection
./dbuild bash -c "${DRIVER_INSTALLATION_CMD}; test.py ..."

and away to pass from the pipeline arguments the actual version up into the command it self
(in dtest there are a few layer until it get there

I did it manually once with test.py, but it was from local repo, so I need to add a mount point.
but it is how people can test fixed manually nowadays (no that anyone know exactly how)

@fruch
Copy link

fruch commented Jun 6, 2024

@Lorak-mmk result from dtest, show the exactly same SSL related issue we seen before this fix.

so as expected, it didn't told us anything new about if this fix is good enough.

I think one might want to test it with test.py, if you didn't come up with build a specific test for it.

@mykaul
Copy link

mykaul commented Jun 10, 2024

Who owns the action item here? We need to see it solved!

@Lorak-mmk
Copy link
Author

Who owns the action item here? We need to see it solved!

It's not that high priority now - issue in Scylla was solved by publishing ARM wheels, and I think we are going to mark asyncio backend as experimental anyway until #330 is solved.
I'm writing regression test now, we should be able to merge this PR when I'm finished.

@Lorak-mmk Lorak-mmk self-assigned this Jun 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants