Releases: sorentwo/oban
v2.12.1
Bug Fixes
-
[BasicEngine] Never fetch jobs that have reached max attempts
This adds a safeguard to the fetch_jobs function to prevent ever hitting the attempt <= max_attempts check constraint. Hitting the constraint causes the query to fail, which crashes the producer and starts an infinite loop of crashes. The previous commit should prevent this situation from occurring at the "staging" level, but to be absolutely safe this change prevents it at the "fetching" level too.
There is a very minor performance hit from this change because the query can no longer run as an index-only scan. For systems with a modest number of available jobs the performance impact is negligible.
-
[Plugins] Prevent unexpectedly modifying jobs selected by subqueries
Most applications don't run at a serializable isolation level. That allows subqueries to run within a transaction without having their conditions rechecked: only predicates on UPDATE or DELETE are re-checked, not those in subqueries. That allows a race condition where rows may be updated without their conditions being evaluated again.
-
[Repo] Set query_opts in Repo.transaction options to prevent logging begin and commit events in development loggers.
-
[BasicEngine] Remove the ORDER BY clause from unique queries
The previous ORDER BY id DESC significantly hurt unique query performance when there were a lot of potential jobs to check. The ordering was originally added to make test cases predictable and isn't important for the actual behavior of the unique check.
v2.12.0
Oban v2.12 was dedicated to enriching the testing experience and expanding config, plugin, and queue validation across all environments.
Testing Modes
Testing modes bring a new, vastly improved, way to configure Oban for testing. The new testing
option makes it explicit that Oban should operate in a restricted mode for the given environment.
Behind the scenes, the new testing modes rely on layers of validation within Oban's Config
module. Now production configuration is validated automatically during test runs. Even though queues and plugins aren't started in the test environment, their configuration is still validated.
To switch, stop overriding plugins
and queues
and enable a testing mode in your test.exs
config:
config :my_app, Oban, testing: :manual
Testing in :manual
mode is identical to testing in older versions of Oban: jobs won't run automatically so you can use helpers like assert_enqueued
and execute them manually with Oban.drain_queue/2
.
An alternate :inline mode allows Oban to bypass all database interaction and run jobs immediately in the process that enqueued them.
config :my_app, Oban, testing: :inline
Finally, new testing guides cover test setup, unit testing workers, integration testing queues, and testing dynamic configuration.
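As a rough sketch of how :manual mode is typically exercised, the test module below enqueues a job and asserts on it, then runs a worker directly. MyApp.DataCase, MyApp.Repo, and MyApp.Workers.Mailer are hypothetical names from the host application, and the worker is assumed to return :ok.

```elixir
defmodule MyApp.Workers.MailerTest do
  # MyApp.DataCase and MyApp.Repo are assumed to exist in the host application.
  use MyApp.DataCase, async: true
  use Oban.Testing, repo: MyApp.Repo

  # Hypothetical worker; any module that uses Oban.Worker would do.
  alias MyApp.Workers.Mailer

  test "jobs are enqueued but not executed in :manual mode" do
    {:ok, _job} = Oban.insert(Mailer.new(%{email: "user@example.com"}))

    # The job stays in the database until it is drained or performed by hand.
    assert_enqueued worker: Mailer, args: %{email: "user@example.com"}
  end

  test "a worker can be unit tested without inserting a job" do
    # perform_job/2 builds a job from the args and calls perform/1 directly.
    assert :ok = perform_job(Mailer, %{email: "user@example.com"})
  end
end
```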
Global Peer Module
Oban v2.11 introduced centralized leadership via Postgres tables. However, Postgres based leadership isn't always a good fit. For example, an ephemeral leadership mechanism is preferred for integration testing.
In that case, you can make use of the new :global
powered peer module for leadership:
config :my_app, Oban,
peer: Oban.Peers.Global,
...
2.12.0 — 2022-04-21
Enhancements
-
[Oban] Replace queue, plugin, and peer test configuration with a single :testing option. Now configuring Oban for testing only requires one change: setting the test mode to either :inline or :manual.
- :inline mode: jobs execute immediately within the calling process and without touching the database. This mode is simple and may not be suitable for apps with complex jobs.
- :manual mode: jobs are inserted into the database where they can be verified and executed when desired. This mode is more advanced and trades simplicity for flexibility.
-
[Testing] Add with_testing_mode/2 to temporarily change testing modes within the context of a function.
Once the application starts in a particular testing mode it can't be changed. That's inconvenient if you're running in :inline mode and don't want a particular job to execute inline.
-
[Config] Add validate/1 to aid in testing dynamic Oban configuration.
-
[Config] Validate full plugin and queue options on init, without the need to start plugins or queues.
-
[Peers.Global] Add an alternate :global powered peer module.
-
[Plugin] A new Oban.Plugin behaviour formalizes starting and validating plugins. The behaviour is implemented by all plugins and is the foundation of enhanced config validation.
-
[Plugin] Emit
[:oban, :plugin, :init]
event on init from every plugin.
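The two testing additions above might be used along these lines. This is a sketch under assumptions: the config file path, the :my_app key, MyApp.DataCase, MyApp.Repo, and MyApp.Workers.Mailer are all hypothetical, and the second test presumes the suite otherwise runs in :inline mode.

```elixir
defmodule MyApp.ObanConfigTest do
  # MyApp.DataCase and MyApp.Repo are assumed to exist in the host application.
  use MyApp.DataCase, async: true
  use Oban.Testing, repo: MyApp.Repo

  test "production configuration is valid" do
    # Read the :prod config without starting Oban; the path and the
    # :my_app key are assumptions about the host application.
    config =
      "config/config.exs"
      |> Config.Reader.read!(env: :prod)
      |> get_in([:my_app, Oban])

    assert :ok = Oban.Config.validate(config)
  end

  test "a job can be held back while the suite runs in :inline mode" do
    Oban.Testing.with_testing_mode(:manual, fn ->
      # Inside the function the job is inserted rather than executed inline.
      Oban.insert(MyApp.Workers.Mailer.new(%{id: 123}))

      assert_enqueued worker: MyApp.Workers.Mailer, args: %{id: 123}
    end)
  end
end
```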
Bug Fixes
-
[Executor] Skip timeout check with an unknown worker
When the worker can't be resolved we don't need to check the timeout. Doing so prevents returning a helpful "unknown worker" message and instead causes a function error for nil.timeout/1.
-
[Testing] Include log and prefix in generated conf for perform_job.
The opts, and subsequent conf, built for perform_job didn't include the prefix or log options. That prevented functions that depend on a job's conf within perform/1 from running with the correct options.
-
[Drainer] Retain the currently configured engine while draining a queue.
-
[Watchman] Skip pausing queues when shutdown is immediate. This prevents queues from interacting with the database during short test runs.
v2.11.0
Oban v2.11 focused on reducing database load, bolstering telemetry-powered introspection, and improving the production experience for all users. To that end, we've extracted functionality from Oban Pro and switched to a new global coordination model.
Leadership
Coordination between nodes running Oban is crucial to how many plugins operate. Staging jobs once a second from multiple nodes is wasteful, as is pruning, rescuing, or scheduling cron jobs. Prior Oban versions used transactional advisory locks to prevent plugins from running concurrently, but there were some issues:
-
Plugins don't know if they'll take the advisory lock, so they still need to run a query periodically.
-
Nodes don't usually start simultaneously, and time drifts between machines. There's no guarantee that the top of the minute for one node is the same as another's—chances are, they don't match.
Oban 2.11 introduces a table-based leadership mechanism that guarantees only one node in a cluster, where "cluster" means a bunch of nodes connected to the same Postgres database, will run plugins. Leadership is transparent and designed for resiliency with minimum chatter between nodes.
See the Upgrade Guide for instructions on how to create the peers table and get started with leadership. If you're curious about the implementation details or want to use leadership in your application, take a look at the docs for Oban.Peer.
Alternative PG (Process Groups) Notifier
Oban relies heavily on PubSub, and until now it only provided a Postgres adapter. Postgres is amazing, and has a highly performant PubSub option, but it doesn't work in every environment (we're looking at you, PgBouncer).
Fortunately, many Elixir applications run in a cluster connected by distributed Erlang. That means Process Groups, aka PG, is available for many applications.
So, we pulled Oban Pro's PG notifier into Oban to make it available for everyone! If your app runs in a proper cluster, you can switch over to the PG notifier:
config :my_app, Oban,
notifier: Oban.Notifiers.PG,
...
Now there are two notifiers to choose from, each with their own strengths and weaknesses:
-
Oban.Notifiers.Postgres. Pros: doesn't require distributed Erlang, publishes insert events to trigger queues. Cons: doesn't work with PgBouncer in transaction mode, doesn't work in tests because of the sandbox.
-
Oban.Notifiers.PG. Pros: works with PgBouncer in transaction mode, works in tests. Cons: requires distributed Erlang, doesn't publish insert events.
Basic Lifeline Plugin
When a queue's producer crashes or a node shuts down before a job finishes executing, the job may be left in an executing
state. The worst part is that these jobs—which we call "orphans"—are completely invisible until you go searching through the jobs table.
Oban Pro has always had a "Lifeline" plugin for just this occasion, and now we've brought a basic Lifeline plugin to Oban.
To automatically rescue orphaned jobs that are still in the executing state, include Oban.Plugins.Lifeline in your configuration:
config :my_app, Oban,
plugins: [Oban.Plugins.Lifeline],
...
Now the plugin will search and rescue orphans after they've lingered for 60 minutes.
🌟 Note: The Lifeline
plugin may transition jobs that are genuinely executing
and cause duplicate execution. For more accurate rescuing or to rescue jobs that have exhausted retry attempts see the DynamicLifeline
plugin in Oban Pro.
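If 60 minutes is too long for your workload, the rescue window can be shortened with the plugin's rescue_after option. The fragment below is a sketch; MyApp.Repo and the queue list are placeholders for your own configuration.

```elixir
# config/config.exs
import Config

config :my_app, Oban,
  # MyApp.Repo and the queue list are placeholders for your own values.
  repo: MyApp.Repo,
  queues: [default: 10],
  plugins: [
    # Rescue orphans after 30 minutes instead of the default 60.
    {Oban.Plugins.Lifeline, rescue_after: :timer.minutes(30)}
  ]
```

A shorter window rescues orphans sooner, but it also raises the chance of re-running a job that is genuinely still executing, so it should stay comfortably above your longest expected job duration.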
Reindexer Plugin
Over time various Oban indexes (heck, any indexes) may grow without VACUUM
cleaning them up properly. When this happens, rebuilding the indexes will release bloat and free up space in your Postgres instance.
The new Reindexer
plugin makes index maintenance painless and automatic by periodically rebuilding all of your Oban indexes concurrently, without any locks.
By default, reindexing happens once a day at midnight UTC, but it's configurable with a standard cron expression (and timezone).
config :my_app, Oban,
plugins: [Oban.Plugins.Reindexer],
...
See Oban.Plugins.Reindexer
for complete options and implementation details.
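For instance, a weekly rebuild could be configured roughly as follows. This is a sketch: MyApp.Repo is a placeholder, and "@weekly" is one of the cron nicknames accepted in place of a five-field expression.

```elixir
# config/config.exs
import Config

config :my_app, Oban,
  # MyApp.Repo is a placeholder for your application's repo.
  repo: MyApp.Repo,
  plugins: [
    # Rebuild indexes weekly rather than daily at midnight.
    {Oban.Plugins.Reindexer, schedule: "@weekly"}
  ]
```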
Improved Telemetry and Logging
The default telemetry-backed logger includes more job fields and metadata about execution. Most notably, it now includes the execution state and formatted error reports when jobs fail.
Here's an example of the default output for a successful job:
{
"args":{"action":"OK","ref":1},
"attempt":1,
"duration":4327295,
"event":"job:stop",
"id":123,
"max_attempts":20,
"meta":{},
"queue":"alpha",
"queue_time":3127905,
"source":"oban",
"state":"success",
"tags":[],
"worker":"Oban.Integration.Worker"
}
Now, here's a sample where the job has encountered an error:
{
"attempt": 1,
"duration": 5432,
"error": "** (Oban.PerformError) Oban.Integration.Worker failed with {:error, \"ERROR\"}",
"event": "job:exception",
"state": "failure",
"worker": "Oban.Integration.Worker"
}
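Output like the samples above is only produced once the default logger is attached. One common place to do that is the application's start callback; the sketch below assumes hypothetical MyApp.Application, MyApp.Repo, and MyApp.Supervisor modules.

```elixir
defmodule MyApp.Application do
  use Application

  @impl Application
  def start(_type, _args) do
    # Attach Oban's telemetry-backed logger before the supervision tree starts.
    Oban.Telemetry.attach_default_logger(:info)

    children = [
      # MyApp.Repo is a placeholder for your application's repo.
      MyApp.Repo,
      {Oban, Application.fetch_env!(:my_app, Oban)}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```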
2.11.0 — 2022-02-13
Enhancements
-
[Migration] Change the order of fields in the base index used for the primary Oban queries.
The new order is much faster for frequent queries such as scheduled job staging. Check the v2.11 upgrade guide for instructions on swapping the index in existing applications.
-
[Worker] Avoid spawning a separate task for workers that use timeouts.
-
[Engine] Add insert_job, insert_all_jobs, retry_job, and retry_all_jobs as required callbacks for all engines.
-
[Oban] Raise more informative error messages for missing or malformed plugins.
Now missing plugins have a different error from invalid plugins or invalid options.
-
[Telemetry] Normalize telemetry metadata for all engine operations:
- Include changeset for insert
- Include changesets for insert_all
- Include job for complete_job, discard_job, etc.
-
[Repo] Include [oban_conf: conf] in telemetry_options for all Repo operations.
With this change it's possible to differentiate between database calls made by Oban versus the rest of your application.
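One way to take advantage of the telemetry_options change is to inspect the options carried in Ecto's query event metadata. The module below is a hypothetical helper, not part of Oban or Ecto, and it assumes the metadata exposes those options under the :options key.

```elixir
defmodule MyApp.QueryOrigin do
  # Hypothetical helper, not part of Oban or Ecto.

  @doc "Classify an Ecto query event by the telemetry options it carries."
  def classify(%{options: options}) when is_list(options) do
    # Oban tags its own Repo calls with an :oban_conf entry.
    if Keyword.has_key?(options, :oban_conf), do: :oban, else: :app
  end

  # Events without an options list are treated as application queries.
  def classify(_metadata), do: :app
end
```

A telemetry handler attached to the repo's query event (by default [:my_app, :repo, :query] for a repo named MyApp.Repo) could call classify/1 on the event metadata and tag its metrics accordingly.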
Bug Fixes
-
[Telemetry] Emit discard rather than error events when a job exhausts all retries.
Previously discard_job was only called for manual discards, i.e., when a job returned :discard or {:discard, reason}. Discarding for exhausted attempts was done within error_job in error cases.
-
[Cron] Respect the current timezone for @reboot jobs. Previously, @reboot expressions were evaluated on boot without the timezone applied. In that case the expression may not match the calculated time and jobs wouldn't trigger.
-
[Cron] Delay CRON evaluation until the next minute after initialization. Now all cron scheduling occurs reliably at the top of the minute.
-
[Drainer] Introduce discard accumulator for draining results. Now exhausted jobs along with manual discards count as a discard rather than a failure or success.
-
[Oban] Expand changeset wrapper within multi function.
Previously, Oban.insert_all could handle a list of changesets, a wrapper map with a :changesets key, or a function. However, the function had to return a list of changesets rather than a changeset wrapper. This was unexpected and made some multis awkward.
-
[Testing] Preserve attempted_at/scheduled_at in perform_job/3 rather than overwriting them with the current time.
-
[Oban] Include false as a viable queue or plugin option in typespecs
Deprecations
- [Telemetry] Hard deprecate Telemetry.span/3; previously it was soft-deprecated.
Removals
- [Telemetry] Remove circuit breaker event documentation because
:circuit
events aren't emitted anymore.
v2.10.1
The previous release, v2.10.0, was immediately retired in favor of this version.
Removed
- [Oban.Telemetry] Remove the customizable prefix for telemetry events in favor of workarounds such as
keep/drop
in Telemetry Metrics.
v2.10.0
Added
-
[Oban.Telemetry] Add customizable prefix for all telemetry events.
For example, a telemetry prefix of [:my_app, :oban] would emit job start telemetry events as [:my_app, :oban, :job, :start]. The default is [:oban], which matches the existing functionality.
Fixed
-
[Oban.Plugins.Stager] Use the notifier to broadcast inserted and available jobs rather than inlining them into a Postgres query.
With this change the notifier is entirely swappable and there isn't any reason to use the Repeater plugin in production.
-
[Oban.Plugins.Cron] Validate job options on init.
Providing invalid job options in the crontab, e.g. priority: 5 or unique: [], wasn't caught until runtime. At that point each insert attempt would fail, crashing the plugin.
-
[Oban.Queue.Producer] Prevent crashing on exception formatting when a job exits without a stacktrace, most notably with {:EXIT, pid}.
-
[Oban.Testing] Return invalid results from perform_job, rather than always returning nil.
-
[Oban] Validate that a queue exists when controlling or checking locally, e.g. calls to Oban.check_queue or Oban.scale_queue.
-
[Oban.Telemetry] Use module capture for telemetry logging to prevent warnings.
v2.9.2
v2.9.1
v2.9.0
Optionally Use Meta for Unique Jobs
It's now possible to use the meta
field for unique jobs. Unique jobs have always supported worker
, queue
, and args
fields. That was flexible, but forced applications to put ad-hoc unique values in args
when they should really be in meta
.
The meta
field supports keys
, just like args
. That makes it possible to use highly efficient fingerprint style uniqueness (and possibly drop the index on args
, if desired).
Here's an example of using a single "fingerprint" key in meta
for uniqueness:
defmodule MyApp.FingerprintWorker do
use Oban.Worker, unique: [fields: [:worker, :meta], keys: [:fingerprint]]
@impl Worker
def new(args, opts) do
fingerprint = :erlang.phash2(args)
super(args, Keyword.put(opts, :meta, %{fingerprint: fingerprint}))
end
end
For backward compatibility, meta isn't included in unique fields by default.
Expanded Start and Scale Options
After extensive refactoring to queue option management and validation, now it's possible to start and scale queues with all supported options. Previously start/stop functions only supported the limit
option for dynamic scaling, reducing runtime flexibility considerably.
Now it's possible to start a queue in the paused state:
Oban.start_queue(queue: :dynamic, paused: true)
Even better, for apps that use an alternative engine like the SmartEngine from Oban Pro, it's possible to start a dynamic queue with options like global concurrency or rate limiting:
Oban.start_queue(queue: :dynamic, local_limit: 10, global_limit: 50)
All options are also passed through scale_queue
, locally or globally, even allowing you to reconfigure a feature like rate limiting at runtime:
Oban.scale_queue(queue: :dynamic, rate_limit: [allowed: 50, period: 60])
Added
-
[Oban] Add Oban.cancel_all_jobs/1,2 to cancel multiple jobs at once, within an atomic transaction. The function accepts a Job query for complete control over which jobs are cancelled.
-
[Oban] Add Oban.retry_all_jobs/1,2 to retry multiple jobs at once, within an atomic transaction. Like cancel_all_jobs, it accepts a query for fine-grained control.
-
[Oban] Add with_limit option to drain_queue/2, which controls the number of jobs that are fetched and executed concurrently. When paired with with_recursion this can drastically speed up interdependent job draining, i.e. workflows.
[Oban.Telemetry] Add telemetry span events for all engine and notifier actions. Now all database operations are covered by spans.
-
[Oban.Migrations] Add
create_schema
option to prevent automatic schema creation in migrations.
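The bulk functions and the new draining option listed above might be combined as in the sketch below. The worker and queue names are hypothetical, and the {:ok, count} return shape matched here is an assumption rather than something stated in these notes.

```elixir
import Ecto.Query, only: [where: 3]

# Cancel every job for one (hypothetical) worker in a single transaction.
{:ok, _cancelled_count} =
  Oban.Job
  |> where([j], j.worker == "MyApp.Workers.Mailer")
  |> Oban.cancel_all_jobs()

# Retry every discarded job in the (hypothetical) mailers queue.
{:ok, _retried_count} =
  Oban.Job
  |> where([j], j.queue == "mailers" and j.state == "discarded")
  |> Oban.retry_all_jobs()

# In tests, drain the queue ten jobs at a time until no jobs remain.
Oban.drain_queue(queue: :mailers, with_limit: 10, with_recursion: true)
```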
Changed
-
[Oban] Consistently include a :snoozed count in drain_queue/2 output. Previously the count was only included when there was at least one snoozed job.
-
[Oban.Testing] Default to attempt: 1 for perform_job/3, as a worker's perform/1 would never be called with attempt: 0.
Fixed
-
[Oban.Queue.Supervisor] Change supervisor strategy to :one_for_all.
Queue supervisors used a :rest_for_one strategy, which allowed the task supervisor to keep running when a producer crashed. That allowed duplicate long-lived jobs to run simultaneously, which is a bug in itself, but could also cause attempt > max_attempts violations.
-
[Oban.Plugins.Cron] Start step ranges from the minimum value, rather than for the entire set. Now the range 8-23/4 correctly includes [8, 12, 16, 20].
-
[Oban.Plugins.Cron] Correctly parse step ranges with a single value, e.g. 0 1/2 * * *
-
[Oban.Telemetry] Comply with :telemetry.span/3 by exposing errors as reason in metadata
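The step range arithmetic from the Cron fix above can be checked with a plain Elixir stepped range. The module below is an illustration only and is not Oban's cron parser; the name CronStepSketch is made up, and the `//` step syntax requires Elixir 1.12 or later.

```elixir
defmodule CronStepSketch do
  # Illustration only; this is not Oban's cron parser.

  @doc "Expand a `first-last/step` range into the values it matches."
  def expand(first, last, step) when step > 0 and first <= last do
    # Stepping starts at the range minimum, so 8-23/4 begins at 8.
    Enum.to_list(first..last//step)
  end
end

CronStepSketch.expand(8, 23, 4)
# => [8, 12, 16, 20]
```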
v2.8.0
Time Unit Scheduling
It's now possible to specify a unit for :schedule_in
, rather than always assuming seconds. This makes it possible to schedule a job using clearer minutes, hours, days, or weeks, e.g. schedule_in: {1, :minute}
or schedule_in: {3, :days}
.
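A short sketch of both forms side by side, assuming a hypothetical MyApp.Workers.Mailer module that uses Oban.Worker:

```elixir
# MyApp.Workers.Mailer is a hypothetical module that uses Oban.Worker.
alias MyApp.Workers.Mailer

# Schedule a job to run roughly five minutes from now.
%{id: 1} |> Mailer.new(schedule_in: {5, :minutes}) |> Oban.insert()

# Plain integers are still treated as seconds.
%{id: 2} |> Mailer.new(schedule_in: 300) |> Oban.insert()
```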
Changed
-
[Oban.Testing] Accept non-map args to perform_job/3 for compatibility with overridden Worker.new/2 callbacks.
-
[Oban.Queue.Producer] Include some jitter when scheduling queue refreshes to prevent queues from refreshing simultaneously. In extreme cases, refresh contention could cause producers to crash.
Fixed
-
[Oban.Queue.Executor] Restore logged warnings for unexpected job results by retaining the safe flag during normal execution.
-
[Oban.Plugins.Gossip] Catch and discard unexpected messages rather than crashing the plugin.
-
[Oban.Testing] Eliminate dialyzer warnings by including the repo option in the Oban.Testing.perform_job/3 spec.
v2.7.2
Fixed
-
[Oban.Plugins.Pruner] Consider cancelled_at or discarded_at timestamps when querying prunable jobs. The previous query required an attempted_at value, even for cancelled or discarded jobs. If a job was cancelled before it was attempted then it wouldn't ever be pruned.
-
[Oban.Plugins.Gossip] Correct exit handling during safe checks. Occasionally, producer checks time out and the previous catch block didn't handle exits properly.