Add "how it works" documentation #92

Merged 5 commits on Jun 27, 2024
14 changes: 14 additions & 0 deletions CHANGELOG.rst
@@ -3,6 +3,20 @@
Changelog
#########

next
====

Improvements
------------

* Improve documentation. (`#92 <https://github.com/clokep/celery-batches/pull/92>`_)

Maintenance
-----------

* Drop support for Celery < 5.2. (`#92 <https://github.com/clokep/celery-batches/pull/92>`_)


0.9 (2024-06-03)
================

26 changes: 26 additions & 0 deletions docs/how_it_works.rst
@@ -0,0 +1,26 @@
How it works
############

celery-batches makes no changes to how tasks are created or sent to the broker,
but operates on the workers to process multiple tasks at once. Exactly how tasks
are processed depends on the configuration; the explanation below assumes the
default `"prefork" configuration`_ of a celery worker (it does not change
significantly if the gevent, eventlet, or threads worker pools are used, but
the arithmetic is different).

As background, Celery workers have a "main" process which fetches tasks from the
broker. By default it fetches up to :setting:`worker_prefetch_multiplier` x
:setting:`worker_concurrency` tasks (if available). For example, if the prefetch
multiplier is 100 and the concurrency is 4, it attempts to fetch up to 400 items
from the broker's queue. Once in memory, the worker deserializes the messages and
runs whatever their :attr:`~celery.app.task.Task.Strategy` is -- for a normal
celery :class:`~celery.app.task.Task` this passes the tasks to the workers in the
processing pool one at a time. (This is the :func:`~celery.worker.strategy.default` strategy.)
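
For illustration, the arithmetic above maps onto the real Celery settings like
this (a minimal sketch; the values and broker URL are examples only, not
recommendations):

.. code-block:: python

    from celery import Celery

    app = Celery("tasks", broker="amqp://localhost")  # illustrative broker URL

    # Each of the 4 pool processes contributes 100 to the prefetch window,
    # so the "main" process may reserve up to 400 messages at once.
    app.conf.worker_prefetch_multiplier = 100
    app.conf.worker_concurrency = 4

    prefetch_window = app.conf.worker_prefetch_multiplier * app.conf.worker_concurrency
    assert prefetch_window == 400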

The :class:`~celery_batches.Batches` task provides a different strategy which instructs
the "main" celery worker process to queue tasks in memory until either
the :attr:`~celery_batches.Batches.flush_interval` or :attr:`~celery_batches.Batches.flush_every`
threshold is reached, then passes the accumulated list of tasks to a single
worker in the processing pool together.
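
A minimal sketch of such a task (the click-counting body and broker URL are
illustrative only):

.. code-block:: python

    from collections import Counter

    from celery import Celery
    from celery_batches import Batches

    app = Celery("tasks", broker="amqp://localhost")  # illustrative broker URL

    # Runs once per batch: after 100 buffered calls or 10 seconds,
    # whichever comes first.
    @app.task(base=Batches, flush_every=100, flush_interval=10)
    def count_clicks(requests):
        # ``requests`` is a list of SimpleRequest objects, one per original
        # task call, each carrying that call's args and kwargs.
        clicks = Counter(request.kwargs["url"] for request in requests)
        for url, total in clicks.items():
            print(f">>> Clicks: {url} -> {total}")

Each ``count_clicks.delay(url="/some/path")`` call is buffered by the strategy;
the function body then runs once per flushed batch rather than once per call.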

.. _"prefork" configuration: https://docs.celeryq.dev/en/stable/userguide/workers.html#concurrency
15 changes: 9 additions & 6 deletions docs/index.rst
@@ -14,8 +14,9 @@ Some potential use-cases for batching of task calls include:
* Bulk inserting / updating of data.
* Tasks with expensive setup that can run across a range of arguments.

-For the ``Batches`` task to work properly you must configure :setting:`worker_prefetch_multiplier`
-to zero, or some value where the final multiplied value is higher than ``flush_every``.
+For the :class:`~celery_batches.Batches` task to work properly you must configure
+:setting:`worker_prefetch_multiplier` to zero, or some value where the final
+multiplied value is higher than :attr:`~celery_batches.Batches.flush_every`.
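
An illustrative sketch of one compatible configuration, assuming an existing
``app = Celery(...)`` instance, ``flush_every=100``, and four pool processes:

.. code-block:: python

    # 0 removes the prefetch limit entirely; otherwise the multiplier must
    # be at least 25 here, since 4 x 25 = 100 = flush_every.
    app.conf.worker_prefetch_multiplier = 0
    app.conf.worker_concurrency = 4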

.. warning::

@@ -31,8 +32,9 @@ Returning results
#################

It is possible to return a result for each task request by calling ``mark_as_done``
-on your results backend. Returning a value from the ``Batches`` task call is only
-used to provide values to signals and does not populate into the results backend.
+on your results backend. Returning a value from the :class:`~celery_batches.Batches`
+task call is only used to provide values to signals and does not populate into the
+results backend.
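
A minimal sketch of this pattern (the ``rpc://`` backend and the squaring task
are illustrative only):

.. code-block:: python

    from celery import Celery
    from celery_batches import Batches

    app = Celery("tasks", backend="rpc://")  # illustrative backend URL

    @app.task(base=Batches, flush_every=10, flush_interval=5)
    def square(requests):
        for request in requests:
            result = request.args[0] ** 2
            # Record a result for each individual task request in the batch.
            app.backend.mark_as_done(request.id, result, request=request)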

.. note::

Expand All @@ -48,7 +50,7 @@ Retrying tasks
##############

In order to retry a failed task, the task must be re-executed with the original
-``task_id``, see the example below:
+task :attr:`~celery.worker.request.Request.id`, see the example below:

.. code-block:: python

@@ -69,7 +71,7 @@
    else:
        app.backend.mark_as_done(request.id, response, request=request)

-Note that the retried task is still bound by the flush rules of the ``Batches``
+Note that the retried task is still bound by the flush rules of the :class:`~celery_batches.Batches`
task; these act as a lower bound, and the retry will not run *before* that
timeout. In the example above it will run between 10 and 20 seconds from now,
assuming no other tasks are in the queue.
@@ -79,4 +81,5 @@ tasks are in the queue.

examples
api
how_it_works
history
6 changes: 1 addition & 5 deletions tox.ini
@@ -1,8 +1,6 @@
[tox]
envlist =
-{pypy3,3.8,3.9}-celery{50,51,52,main}-unit,
-# Celery 5.2 added support for Python 3.10.
-3.10-celery{52,53,main}-unit,
+{pypy3,3.8,3.9,3.10}-celery{52,main}-unit,
# Celery 5.3 added support for Python 3.11.
3.11-celery{53,main}-unit,
# Celery 5.4 added support for Python 3.12.
@@ -24,8 +22,6 @@ python =
[testenv]
deps=
-r{toxinidir}/requirements/test.txt
-celery50: celery>=5.0,<5.1
-celery51: celery>=5.1,<5.2
celery52: celery>=5.2.0,<5.3
celery53: celery>=5.3.0,<5.4
celery54: celery>=5.4.0,<5.5