Transport interface design #6686
We (@unkcpz, @khsrali and me) discussed this issue in person. We basically concluded that this design provides more flexibility in usage, but this flexibility comes with no additional use case, so the additional complexity introduced by this design is not worth following. To elaborate a bit, one could use the transport more generically in this case:

```python
def foo(transport: Transport):
    """This function can act on sync and async transport plugins."""
    with transport.open():
        ...

# In some part of the code
transport = get_sync_transport()
foo(transport)
...
# In some other part of the code
async_transport = get_async_transport()
foo(async_transport)
```

However, the implementation of a function is typically very different when it is implemented concurrently. Therefore it is unlikely that this additional flexibility will be of any benefit. On the other hand, overwriting the …
This is true. The other small benefit is that having the event loop set up in a single place shows we could possibly make sync SSH jobs run in multi-threading. But this is out of the scope of the current implementation, and it could also be done by duplicating the change for every transport operation.

I think in the end the direct "user" of the API is @khsrali, for supporting asyncssh and firecrest. If he thinks this API is hard to follow, then there is no strong reason to go with it. Thanks for the discussion @agoscinski @khsrali.
@khsrali, @agoscinski and I had a discussion on what the proper interface for the transport plugin should be, so that it supports both sync and async implementations.
The problem with the design of #6626 is that it defines three interfaces:

- `_BaseTransport`, the internal interface that provides some shared methods; this base class is supposed to be inherited only by `BlockingTransport` and `AsyncTransport`.
- `BlockingTransport` and `AsyncTransport`, which both carry two styles of methods to support both async and sync calls. Take `mkdir` as an example: there is also `mkdir_async`, and both exist in `BlockingTransport` and `AsyncTransport`. In `BlockingTransport` the `mkdir_async` methods do not need to be implemented (and are forbidden to be overridden) but are constructed automatically from the sync version `mkdir`. In `AsyncTransport` the `mkdir_async` methods are supposed to be implemented, and the sync `mkdir` is constructed automatically from the async implementation.

A good design should take three groups of people into consideration: transport plugin developers, who implement the interface; `aiida-core` developers, who don't want to deal with three types of interface to dispatch the methods used inside the `aiida-core` code base; and plugin developers outside `aiida-core`, who call transport methods directly.

I can understand the reasons behind having three interfaces for the same type of operations:
- A plugin (e.g. `aiida-quantumespresso`) that already calls methods on the transport directly has no information about which transport it uses. It might be async or sync; if it is async and is called without `await`, the function is not run.
- Inside `aiida-core`, there are places where the transport is used in a sync manner without `await`, and those call sites require the method not to be awaited. Changing all of those is out of the scope of the PR, and not all of the changes are necessary.

Having two different names for the methods makes it possible to distinguish which type of method should be called.
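For illustration, the dual-name scheme can be sketched roughly as follows. This is a hypothetical simplification; the class bodies and the way the derived methods are generated are assumptions, not the actual #6626 code:

```python
# Illustration only: a hypothetical simplification of the dual-name scheme in
# #6626; the real classes auto-generate the derived methods via the base class.
import asyncio


class BlockingTransport:
    """Sync plugin: implements ``mkdir``; ``mkdir_async`` is derived from it."""

    def __init__(self):
        self.created = []

    def mkdir(self, path: str) -> None:
        self.created.append(path)

    async def mkdir_async(self, path: str) -> None:
        # Auto-generated async wrapper; plugins must not override it.
        self.mkdir(path)


class AsyncTransport:
    """Async plugin: implements ``mkdir_async``; ``mkdir`` is derived from it."""

    def __init__(self):
        self.created = []

    async def mkdir_async(self, path: str) -> None:
        self.created.append(path)

    def mkdir(self, path: str) -> None:
        # Auto-generated sync wrapper running the coroutine to completion.
        asyncio.run(self.mkdir_async(path))
```

Either class can then be driven from sync or async call sites, at the cost of every operation existing twice on every class.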
But if the function is an async function, the signature `async def foobar()` already tells us it is async; having `async def foobar_async()` is redundant.

When talking about AiiDA plugins, this is actually the dependency injection pattern we use in `aiida-core`: in the core code base we assume the plugin has certain methods implemented. In `aiida-core` those interface methods are simply called (this is also known as duck typing, the pattern recommended in Python), and the actual implementation is left to the plugin. The interface forms the contract of the class for plugin developers.
Going back to the purposes of having an interface for the plugin type, there are two of them:

1. It tells transport plugin developers which methods need to be implemented.
2. It tells callers (from `aiida-core` or from other plugins) which methods can be called.

For the first purpose, what a transport plugin developer wants is that, no matter whether the plugin is async or sync, they only need to look at one and the same contract to know which methods to provide.
For the second purpose, inside the `aiida-core` daemon, when the transport is called to interact with remote resources, all the functions are async; this is the new feature introduced by aiidateam/plumpy#272, with the corresponding changes applied to `aiida-core` in Ali's PR.

However, a problem arises when the transport is used outside `aiida-core`. If the function name is the same for both async and sync, calling the async version of `transport.mkdir()` without putting it in the event loop won't run the function.

This leads to the question: "Do we expect users outside `aiida-core` to call async functions directly?" My answer is: no.
I remember Martin told me that the original reason for keeping all the complex async/sync conversion back and forth inside plumpy, by introducing the synchronous `RmqThreadCommunicator` that wraps the async `RmqCommunicator`, was that we don't want to make plugin developers deal with any async programming.

Now the transport becomes an exception, because having an async implementation can dramatically improve performance, so we make that move. But apart from it, it is better to keep the other plugins requiring only regular synchronous programming. Thus, I think it is safe to assume that when the transport is called from outside, it is always called in a synchronous manner.
Here comes my proposal:

- There is a single set of method names (e.g. `transport.mkdir`) forming the contract for both async and sync implementations. The asyncssh plugin implements `async def mkdir():` and the paramiko ssh plugin implements `def mkdir():`.
- Inside `aiida-core`, the `execmanager` module needs to call the coroutine. So before calling `transport.mkdir()`, it requires an `is_coroutine` check and a conversion to a coroutine, as done in plumpy. The transport can first be converted to an async transport, and then all methods can be called with `await`.
- Outside `aiida-core`, things are the opposite, since the async function will be called in a synchronous manner. But it is not possible to ask the users who call the function to do the `is_coroutine` check and create an event loop to run the async method. The solution is to have a wrapper that makes a sync transport out of any kind of transport and exposes that interface to the outside world.

Some code snippets of how I think it will work.
For the interface, I provide a protocol so that both the async and the sync class can work with it. The synchronous and async transport plugins will both be transport objects conforming to the `Transport` protocol, which can be checked by `isinstance(trans_obj, Transport)`.
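The original snippet did not survive in this page; a minimal sketch of what such a protocol could look like (the method set and the plugin class names are invented for illustration):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Transport(Protocol):
    """One contract for both sync and async transport plugins."""

    def open(self): ...

    def mkdir(self, path: str): ...  # a plugin may define this as ``async def``


class ParamikoLikeTransport:
    """Hypothetical sync plugin."""

    def open(self):
        return self

    def mkdir(self, path: str):
        return f'sync mkdir {path}'


class AsyncSshLikeTransport:
    """Hypothetical async plugin."""

    def open(self):
        return self

    async def mkdir(self, path: str):
        return f'async mkdir {path}'


# Both conform structurally, so the runtime isinstance check passes for both.
assert isinstance(ParamikoLikeTransport(), Transport)
assert isinstance(AsyncSshLikeTransport(), Transport)
```

Note that a `runtime_checkable` protocol only verifies that the methods exist, not whether they are sync or async; that distinction is exactly what the `is_coroutine` check has to handle at the call site.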
In the `execmanager` module, take the `upload_calculation` function as an example: it can be changed into an async function after the plumpy PR mentioned above. Inside, it has an `await transport.copy()`; to make this callable with a sync transport as well, I introduce an async transport proxy.

In the opposite direction, when the transport is called from outside `aiida-core`, it was always assumed that the methods are called in a synchronous way. Take as an example `aiida-quantumespresso`, where the `pwimmigrant.py` module (in fact the only place where this happens) calls `transport.get()`. To make this work, the transport passed to the call needs to be a synchronous one.
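The original proxy snippet is not preserved in this page; a hypothetical sketch of such an async proxy could look like this. It also uses the `run_in_executor` idea from the closing remark, so a blocking method does not stall the event loop (all names besides `run_in_executor` itself are assumptions):

```python
import asyncio
import inspect


class AsyncProxy:
    """Wrap any transport so that every method can be awaited.

    Hypothetical sketch: async methods pass through unchanged; sync methods
    are wrapped into coroutines that run in the default thread pool.
    """

    def __init__(self, transport):
        self._transport = transport

    def __getattr__(self, name):
        attr = getattr(self._transport, name)
        if not callable(attr):
            return attr
        if inspect.iscoroutinefunction(attr):
            return attr  # already awaitable

        async def wrapper(*args, **kwargs):
            # Run the blocking call in the default executor (a thread pool),
            # so it does not block the event loop.
            loop = asyncio.get_running_loop()
            return await loop.run_in_executor(None, lambda: attr(*args, **kwargs))

        return wrapper


class DemoSyncTransport:
    """Hypothetical paramiko-style sync plugin."""

    def mkdir(self, path):
        return f'mkdir {path}'


async def main():
    transport = AsyncProxy(DemoSyncTransport())
    # The caller can now uniformly ``await`` every transport method.
    return await transport.mkdir('/tmp/x')


print(asyncio.run(main()))  # mkdir /tmp/x
```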
The only API provided in `aiida-core` to obtain a transport is `authinfo.py::AuthInfo.get_transport`; we can then add a `get_sync_transport()` that converts an async transport into a sync one. If we want to keep compatibility for code outside `aiida-core`, we can have `get_transport` always return the sync transport, and add a `get_async_transport` that performs the AsyncProxy conversion. The `SyncProxy` works analogously in the opposite direction.
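The `SyncProxy` snippet is likewise missing from this page; a hypothetical sketch, where the proxy owns the event loop and is therefore the single place that controls which loop runs the async methods (all names are assumptions):

```python
import asyncio
import inspect


class SyncProxy:
    """Expose any transport through a purely synchronous interface.

    Hypothetical sketch: async methods are driven to completion on an event
    loop owned by the proxy; sync methods and plain attributes pass through.
    """

    def __init__(self, transport):
        self._transport = transport
        self._loop = asyncio.new_event_loop()

    def __getattr__(self, name):
        attr = getattr(self._transport, name)
        if not (callable(attr) and inspect.iscoroutinefunction(attr)):
            return attr  # plain attributes and sync methods pass through

        def wrapper(*args, **kwargs):
            # Block the caller until the coroutine completes.
            return self._loop.run_until_complete(attr(*args, **kwargs))

        return wrapper


class DemoAsyncTransport:
    """Hypothetical asyncssh-style async plugin."""

    async def get(self, remote, local):
        return f'copied {remote} -> {local}'


# Outside ``aiida-core`` the caller never sees a coroutine.
transport = SyncProxy(DemoAsyncTransport())
print(transport.get('remote.txt', 'local.txt'))  # copied remote.txt -> local.txt
```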
To summarize, if what I proposed above works, it cuts down the number of changes needed to support both types of transport. It provides a single source of contract to be implemented. It is clear whether the transport in use is async or sync, because it is wrapped in a Proxy class and used in the corresponding manner. And there is a single place that controls which event loop runs the async functions when they need to be run in a synchronous manner.

As a bonus, while writing the pseudo code for the AsyncProxy above, I realized that the sync functions might be fine to run with `loop.run_in_executor`, which uses the thread pool by default; this may solve the thread-blocking problem in transport interactions by letting the operating system manage the scheduling of threads.

Pinging @danielhollas @giovannipizzi, who were involved in the discussion during the coding week. Do you see any problem with this design?