Add a class or object to represent the global config #44

Open
AlanCoding opened this issue Jan 16, 2025 · 7 comments
Labels
refactoring code build-out and clean-up

Comments

@AlanCoding
Member

There are two sides to the dispatcher: the publisher and the runner itself. This has started to become a pain point in tests and demos, where, for example, the psycopg connection string has to be re-constructed on the publisher side.

We should instead have a single config / settings object that lazily pulls values from its sources, so that the publisher and runner read from the same places and database connection details are configured once.

@AlanCoding AlanCoding changed the title Add a global object to represent the global config Add a class or object to represent the global config Jan 16, 2025
@AlanCoding AlanCoding added the refactoring code build-out and clean-up label Jan 16, 2025
@AlanCoding
Member Author

I started on some other changes to the config structure, separate from this in:

main...class_config

And I've realized the important question is: "what would I like the config to look like?" So, borrowing from that and expanding:

# Demo config
---
service:
  AutoScaleForkServer:
    max_workers: 4
brokers:
  PGNotifyBroker:
    config:
      conninfo: dbname=dispatch_db user=dispatch password=dispatching host=localhost port=55777
    channels:
    - test_channel
    - test_channel2
    - test_channel3
  SocketBroker:
    sockfile: /var/run/dispatcher/dispatcher.sock
producers:
  # Brokered producers are automatically created
  ScheduledProducer:
    task_schedule:
      'lambda: __import__("time").sleep(1)':
        schedule: 3
      'lambda: __import__("time").sleep(2)':
        schedule: 3

After struggling with this, I've decided that because brokers apply to both the publisher and the service, they should be their own section.

The publisher doesn't need any configuration beyond what is in the broker. Other details, such as creating a new connection, are all per-submission. The default choice is obvious (don't create a new connection), but creating a new connection must remain configurable on submission, since it depends on the particular context you are submitting from.

I would like to eventually create some type of dispatcherctl entrypoint. This was handled in AWX by awx-manage run_dispatcher --status, which isn't really as front-and-center as one would like. Also, we would like to use more reliable local communication. This might take some configuration, but so far I'm leaning against ctl-specific configuration. It should be self-evident that SocketBroker is preferable to PGNotifyBroker for dispatcherctl commands, and this can be established in the code for those classes directly, with a priority value or something.
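To illustrate the priority idea, a minimal sketch; the ctl_priority attribute and pick_ctl_broker helper are hypothetical names, not anything implemented:

class BaseBroker:
    # Lower value = preferred by dispatcherctl when several brokers are configured
    ctl_priority: int = 100

class SocketBroker(BaseBroker):
    ctl_priority = 10  # local socket is the natural choice for ctl commands

class PGNotifyBroker(BaseBroker):
    ctl_priority = 50

def pick_ctl_broker(brokers):
    """Pick the most-preferred broker from the configured instances."""
    return min(brokers, key=lambda b: b.ctl_priority)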

The least-obvious part is how channels are handled. One could argue that channels should only apply inside of producers as opposed to brokers, because they are only relevant for the service, not the publisher. The task itself needs to specify queue=, or otherwise the submission needs to do so... but I think there's a valid case that we should validate that the channel is active before submission, so that we don't send a message that obviously cannot ever be received. However, this is impossible for AWX because you can't know the CLUSTER_HOST_ID for all nodes, so this idea goes to the garbage bin. Nonetheless, I think this academic argument justifies the location of channels in the config, as a future configurable enhancement.

Finally, and the main goal of what I want to do here: how do we coherently get the connection from Django settings? How would this be configured? My answer is that it would be configured for both the service & the publisher at the same time. Whether the connection is async or not is totally separate from whether we take connection params directly or get them from Django. The logic to get them from Django is relatively well finalized in ansible/django-ansible-base#686. This logic would be copied here, inside of PGNotifyBroker. So instead of specifying config under that key, you can defer to Django. In fact, this would be the "default" settings. This would actually allow a process to publish a message while ignorant of the dispatcher config. This is actually kind of how AWX works; I just needed to put it into a config framework. So that default config might look like:

---
brokers:
  PGNotifyBroker:
    config: __from_Django__

This would be invalid for configuring the service (as it should be), but would still allow publishing a message. Logically, you would turn off the check that you have a valid channel (as AWX does today).

But what's weird is that AWX intentionally wants to set application_name for the service connection but not the publisher connection. I don't have a great answer for that, other than having a durable python integration so that the config is passed in.

@AlanCoding
Member Author

Why? It seems I haven't linked this, so here:

'task': f'dispatcher.brokers.{broker.broker}.publish_message',

What's happening here is a bit ridiculous. If we're not using Django, then we have to do some connection management ourselves. However, this isn't a priority, since we'll probably always be using Django.

The design intent is that both the main process and worker will get the connection from the broker class, which comes from the dispatcher config. In this line, the worker subprocess becomes the publisher for control-and-reply. The change being asked for here is to no longer pass the database config to the worker.

We should be aware that the hazard here is this: when running awx-manage run_dispatcher, or something like it in another project, we programmatically pass the config to the main dispatcher method, and by intent we use a different config for the service vs the publisher. The resolution, thus, is that the worker uses the same config as the publisher, which is to reuse the Django connection; other variations of the config are likely not to work.

Our solution to this hazard, then, is that we have a global config object, and call some kind of config.setup() method pre-fork.

What about some future process model where this is no longer an option? Then it may be possible that the process manager passes the config data to the root worker process as JSON-type data. This would still avoid passing the data for every method... or we could not solve this problem and allow the worker callback post-fork to load the config data. I'm probably overthinking by this point.
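To make the pre-fork idea concrete, a rough sketch of the shape I have in mind (module and function names are hypothetical):

# dispatcher/config.py (hypothetical sketch)
_settings = None

def setup(config_dict):
    """Call once in the parent process, pre-fork, so the service,
    the publisher, and forked workers all see the same resolved config."""
    global _settings
    _settings = config_dict

def settings():
    if _settings is None:
        raise RuntimeError('config.setup() was not called before use')
    return _settings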

@AlanCoding
Member Author

The obvious model to follow for settings would be the Django lazy settings stuff:

https://github.com/django/django/blob/89e28e13ecbf9fbcf235e16d453c08bbf2271244/django/conf/__init__.py#L39

That's overkill for most of what we need. Some simpler object that delays loading until an attribute is accessed should work fine.
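Something along these lines; a minimal sketch, assuming the loader is any callable returning a dict (say, a YAML read):

class LazyConfig:
    def __init__(self, loader):
        self._loader = loader  # callable returning a dict of settings
        self._data = None

    def __getattr__(self, name):
        # Only called for attributes not found normally, so _loader
        # and _data resolve without recursion
        if self._data is None:
            self._data = self._loader()
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)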

@AlanCoding
Member Author

On the topic of "sections" in the config, #8 would add another type. Presumably, this would take the form of:

callbacks:
  worker_start: test_app.tasks.clean_on_worker_start
  pre_fork: test_app.tasks.set_up_django

This, again, is probably most useful for Django or demos, but I expect people would find it handy and may come to prefer it at some point.

Then to add another thing, I'd like to add an alternative to the @task() decorator. So if you didn't want the decorator in your code, you should be able to register methods in your config instead. I don't see why not. I don't have a strong reason to prioritize it, but again, I want to start out with a complete philosophical package.

tasks:
  test_app.tasks.hello_world:
    queue: default
  test_app.tasks.delete_inventory:
    queue: default

This would then register those tasks in the registry, and probably still needs to paint them with apply_async and delay methods.
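A sketch of how that registration might work; resolve() is a stand-in for dotted-path import, and the registry methods here are hypothetical:

from importlib import import_module

def resolve(dotted_path):
    module_path, _, attr = dotted_path.rpartition('.')
    return getattr(import_module(module_path), attr)

def register_tasks_from_config(config, registry):
    for dotted_path, options in config.get('tasks', {}).items():
        fn = resolve(dotted_path)
        registry.register(fn, queue=options.get('queue'))
        # "paint" the callable, like the @task() decorator would
        fn.apply_async = registry.make_apply_async(fn)
        fn.delay = registry.make_delay(fn)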

Next, because we're changing things, should we start putting a version on? Likely.

version: 2

...implying the current version is 1.

Now, collecting even more: what should we do about stuff like ansible/awx#15780, where we have a simplistic dispatch "setting" that says "do this" or "don't do this" based on the user's needs? Should these be top-level? Or should they be nested? I'm going to go with "nested", because they need a naming convention that is meaningful and might correspond to Django settings at some point.

settings:
  mock_publish: true
  validate_queue: false
  validate_task: true
  method_delimiter: "_"

I envision a lot of validate_* settings, since you could require that a task be registered in order to run it, or require something else related to settings consistency. It works without this, but it's probably best practice to have rigid settings for what is allowed to be run.
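For example, a hypothetical submission-time check for validate_task might look like:

def check_task_allowed(task_name, registry, settings):
    # Refuse unregistered tasks when the validate_task setting is on
    if settings.get('validate_task') and task_name not in registry:
        raise ValueError(f'refusing to submit unregistered task: {task_name}')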

@Alex-Izquierdo
Collaborator

I would agree with most of your thoughts. The only thing that seems odd to me is versioning the config. I wonder if the library should just have a class for settings, and let other logic be implemented in a Django app.

@AlanCoding
Member Author

@chrismeyersfsu made a point that we should be able to import django.db.connection or something as an improvement over the string __from_Django__.

I've been doing some work on this topic in ansible/django-ansible-base#686

But I would note that pointing to django.db.connection is a far-from-trivial way to specify the connection parameters.

I would take this in another direction: we should be able to point to a callback which will be called and will return parameters. In this case, assuming my linked implementation, we would reference ansible_base.lib.utils.db.get_pg_notify_params (might need to rename this). So that's a static function that returns a dictionary which otherwise would be passed into the config kwarg of this class's __init__. This is a nice-looking idea.
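Resolving and invoking such a callback could look like this sketch; the constructor shown is hypothetical, not the real PGNotifyBroker signature:

from importlib import import_module

class PGNotifyBroker:
    def __init__(self, config=None, params_callback=None):
        if params_callback:
            # dotted path, e.g. 'ansible_base.lib.utils.db.get_pg_notify_params'
            module_path, _, attr = params_callback.rpartition('.')
            config = getattr(import_module(module_path), attr)()
        self.config = config or {}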

@AlanCoding
Member Author

Further update on my thinking:

We're looking at introducing a DAB app that would use the dispatcher (this repo) as a dependency, and this has implications for the solution to the connection-from-Django problem of this issue.

https://github.com/ansible/django-ansible-base/compare/devel...AlanCoding:dab_dispatcher?expand=1

This heads us in the direction of a clear separation of dependencies, where this repo does not take Django as a dependency, even for the pg_notify python sub-package. That would close #47, because we're drawing a clear line, and Django falls on the DAB side of the line.

But wait! We're not done. If we lose Django, we lose its connection management. Even from code reading, you can tell it initializes connections at import time, using thread-local data structures populated when the connection happens. My only point is that the connection is saved in a global variable. This only truly matters for a problem fairly unique to myself, which is writing integration tests for this repo. I desperately need to avoid two things which exist as structural elements of the code right now, but reveal themselves as a pain point in integration tests.

'kwargs': {'config': broker.config, 'new_connection': True},

I need to avoid:

  • passing connection params to the worker
  • creating a new connection when sending a reply message

Creating a new connection penalizes us with an extremely significant delay, and makes race conditions very difficult to address. Indeed, I believe most "flake" I've seen in testing is due to this. But if this lib is truly lower-level, and we're not going to make a test_app a dependency of all testing, how can we resolve this? To answer this, I have to slightly backtrack on something I said in a prior comment here. My updated answer is that:

  1. Django apps using the dispatcher will be expected to set up the dispatcher global config in the .ready() method. We will just have to document that interacting outside of Django will fail to connect.
  2. Internally to the dispatcher, and in the default config, we will apply a callback in dispatcher.utils that gets a psycopg connection and saves it to a global variable, like what Django does, but re-implemented in a production-unsafe way.

This new config (for testing) would look a little bit different than imagined so far, since it would take both a config and what I'll now call a connection_callback function.

---
brokers:
  PGNotifyBroker:
    config:
      conninfo: dbname=dispatch_db user=dispatch password=dispatching host=localhost port=55777
    connection_callback: dispatcher.brokers.pg_notify.connection_saver

This connection_saver method would be an analog to ansible_base.lib.utils.db.get_pg_notify_params. Both of these methods would take the data from config. The local connection_saver would just save the connection to a global variable, to avoid re-connecting for the next task it runs. But in the case of Django, and generally the DAB task app, it will get nothing from config (or an empty dict, what have you), ignore that, and get the params from Django. To write it out...

---
brokers:
  PGNotifyBroker:
    connection_callback: ansible_base.lib.utils.db.get_pg_notify_params

This feature of connection-from-Django would only be available to users of the Django app (seems reasonable), and users of the lower-level dispatcher directly would need to configure connections on their own somehow.
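For the local case, connection_saver could be as simple as this sketch (the real implementation may differ; psycopg here is psycopg 3):

# dispatcher/brokers/pg_notify.py (sketch)
import psycopg

_connection = None  # module-level global, like Django's saved connection but cruder

def connection_saver(**config):
    """Connect once and reuse the connection for subsequent messages.
    Deliberately production-unsafe; meant for tests and demos."""
    global _connection
    if _connection is None:
        _connection = psycopg.connect(**config)
    return _connection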

I believe this is a solution I can live with. This is important for completing the test suite, and important for completing this issue.
