Releases: sorentwo/oban

v2.12.1

26 May 12:22

Bug Fixes

  • [BasicEngine] Never fetch jobs that have reached max attempts

    This adds a safeguard to the fetch_jobs function to prevent ever hitting the attempt <= max_attempts check constraint. Hitting the constraint causes the query to fail, which crashes the producer and starts an infinite loop of crashes. The previous commit should prevent this situation from occurring at the "staging" level, but to be absolutely safe this change prevents it at the "fetching" level too.

    There is a very minor performance hit from this change because the query can no longer run as an index only scan. For systems with a modest number of available jobs the performance impact is indistinguishable.

  • [Plugins] Prevent unexpectedly modifying jobs selected by subqueries

    Most applications don't run at a serializable isolation level. That allows subqueries to run within a transaction without having the conditions rechecked—only predicates on UPDATE or DELETE are re-checked, not on subqueries. That allows a race condition where rows may be updated without another evaluation.

  • [Repo] Set query_opts in Repo.transaction options to prevent logging begin and commit events in development loggers.

  • [BasicEngine] Remove the ORDER BY clause from unique queries

    The previous ORDER BY id DESC significantly hurts unique query performance when there are a lot of potential jobs to check. The ordering was originally added to make test cases predictable and isn't important for the actual behavior of the unique check.
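
The fetch-level safeguard described above can be pictured as one extra predicate on the query that selects available jobs. The following is only a sketch of the idea, not the query BasicEngine actually runs: the module name MyApp.JobQueries and the function fetchable/2 are hypothetical, and the ordering shown is an assumption made for illustration.

```elixir
defmodule MyApp.JobQueries do
  # Hypothetical helper module; it is not part of Oban.
  import Ecto.Query

  # Build a query for available jobs in a queue that still have attempts left.
  def fetchable(queue, limit) when is_binary(queue) and is_integer(limit) do
    from j in Oban.Job,
      where: j.state == "available",
      where: j.queue == ^queue,
      # The safeguard: skip jobs that have already used every attempt, so a
      # later increment cannot violate the attempt <= max_attempts constraint.
      where: j.attempt < j.max_attempts,
      order_by: [asc: j.priority, asc: j.scheduled_at, asc: j.id],
      limit: ^limit
  end
end
```

Because the predicate compares two columns of the row, the database has to read the row itself, which is consistent with the note above that the query can no longer run as an index-only scan.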

v2.12.0

21 Apr 22:20

Oban v2.12 was dedicated to enriching the testing experience and expanding config, plugin, and queue validation across all environments.

Testing Modes

Testing modes bring a new, vastly improved way to configure Oban for testing. The new testing option makes it explicit that Oban should operate in a restricted mode for the given environment.

Behind the scenes, the new testing modes rely on layers of validation within Oban's Config module. Now production configuration is validated automatically during test runs. Even though queues and plugins aren't started in the test environment, their configuration is still validated.

To switch, stop overriding plugins and queues and enable a testing mode in your test.exs config:

config :my_app, Oban, testing: :manual

Testing in :manual mode is identical to testing in older versions of Oban: jobs won't run automatically so you can use helpers like assert_enqueued and execute them manually with Oban.drain_queue/2.
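
As a rough illustration of :manual mode, a test might insert a job, assert that it was enqueued, and then drain the queue by hand. This sketch assumes a hypothetical MyApp.Repo, a hypothetical MyApp.DataCase that checks out the Ecto sandbox, and a hypothetical MyApp.WelcomeWorker; none of those names come from the release notes.

```elixir
defmodule MyApp.WelcomeWorker do
  # Hypothetical worker used only for this example.
  use Oban.Worker, queue: :default

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"user_id" => _user_id}}), do: :ok
end

defmodule MyApp.WelcomeWorkerTest do
  # MyApp.DataCase is assumed to set up the Ecto SQL sandbox for each test.
  use MyApp.DataCase, async: true
  use Oban.Testing, repo: MyApp.Repo

  test "the job is enqueued and only runs when drained" do
    {:ok, _job} = Oban.insert(MyApp.WelcomeWorker.new(%{user_id: 1}))

    # In :manual mode the job waits in the database until it is drained.
    assert_enqueued worker: MyApp.WelcomeWorker, args: %{user_id: 1}

    # Draining executes the job in the test process and returns counts.
    assert %{success: 1, failure: 0} = Oban.drain_queue(queue: :default)
  end
end
```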

An alternate :inline mode allows Oban to bypass all database interaction and run jobs immediately in the process that enqueued them.

config :my_app, Oban, testing: :inline

Finally, new testing guides cover test setup, unit testing workers, integration testing queues, and testing dynamic configuration.

Global Peer Module

Oban v2.11 introduced centralized leadership via Postgres tables. However, Postgres based leadership isn't always a good fit. For example, an ephemeral leadership mechanism is preferred for integration testing.

In that case, you can make use of the new :global powered peer module for leadership:

config :my_app, Oban,
  peer: Oban.Peers.Global,
  ...

2.12.0 — 2022-04-21

Enhancements

  • [Oban] Replace queue, plugin, and peer test configuration with a single :testing option. Now configuring Oban for testing only requires one change, setting the test mode to either :inline or :manual.

    • :inline—jobs execute immediately within the calling process and without touching the database. This mode is simple and may not be suitable for apps with complex jobs.
    • :manual—jobs are inserted into the database where they can be verified and executed when desired. This mode is more advanced and trades simplicity for flexibility.
  • [Testing] Add with_testing_mode/2 to temporarily change testing modes within the context of a function.

    Once the application starts in a particular testing mode it can't be changed. That's inconvenient if you're running in :inline mode and don't want a particular job to execute inline.

  • [Config] Add validate/1 to aid in testing dynamic Oban configuration.

  • [Config] Validate full plugin and queue options on init, without the need to start plugins or queues.

  • [Peers.Global] Add an alternate :global powered peer module.

  • [Plugin] A new Oban.Plugin behaviour formalizes starting and validating plugins. The behaviour is implemented by all plugins and is the foundation of enhanced config validation.

  • [Plugin] Emit [:oban, :plugin, :init] event on init from every plugin.
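
For the with_testing_mode/2 helper listed above, a minimal sketch might look like the following. It assumes the application is configured with testing: :inline, and that MyApp.Repo, MyApp.DataCase, and MyApp.ReportWorker exist; all three names are hypothetical.

```elixir
defmodule MyApp.ReportWorker do
  # Hypothetical worker used only for this example.
  use Oban.Worker, queue: :reports

  @impl Oban.Worker
  def perform(%Oban.Job{}), do: :ok
end

defmodule MyApp.ReportWorkerTest do
  # MyApp.DataCase is assumed to set up the Ecto SQL sandbox for each test.
  use MyApp.DataCase, async: true
  use Oban.Testing, repo: MyApp.Repo

  test "the job is enqueued instead of running inline" do
    # Switch to :manual only for the duration of the function.
    Oban.Testing.with_testing_mode(:manual, fn ->
      {:ok, _job} = Oban.insert(MyApp.ReportWorker.new(%{id: 123}))

      assert_enqueued worker: MyApp.ReportWorker, args: %{id: 123}
    end)
  end
end
```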

Bug Fixes

  • [Executor] Skip timeout check with an unknown worker

    When the worker can't be resolved we don't need to check the timeout. Doing so prevents returning a helpful "unknown worker" message and instead causes a function error for nil.timeout/1.

  • [Testing] Include log and prefix in generated conf for perform_job.

    The opts, and subsequent conf, built for perform_job didn't include the prefix or log options. That prevented functions that depend on a job's conf within perform/1 from running with the correct options.

  • [Drainer] Retain the currently configured engine while draining a queue.

  • [Watchman] Skip pausing queues when shutdown is immediate. This prevents queues from interacting with the database during short test runs.

v2.11.0

13 Feb 16:09

Oban v2.11 Upgrade Guide

⚠️📓 Oban v2.11 requires a v11 migration, Elixir v1.11+ and Postgres v10.0+

Oban v2.11 focused on reducing database load, bolstering telemetry-powered introspection, and improving the production experience for all users. To that end, we've extracted functionality from Oban Pro and switched to a new global coordination model.

Leadership

Coordination between nodes running Oban is crucial to how many plugins operate. Staging jobs once a second from multiple nodes is wasteful, as is pruning, rescuing, or scheduling cron jobs. Prior Oban versions used transactional advisory locks to prevent plugins from running concurrently, but there were some issues:

  • Plugins don't know if they'll take the advisory lock, so they still need to run a query periodically.

  • Nodes don't usually start simultaneously, and time drifts between machines. There's no guarantee that the top of the minute for one node is the same as another's—chances are, they don't match.

Oban 2.11 introduces a table-based leadership mechanism that guarantees only one node in a cluster, where "cluster" means a bunch of nodes connected to the same Postgres database, will run plugins. Leadership is transparent and designed for resiliency with minimum chatter between nodes.

See the Upgrade Guide for instructions on how to create the peers table and get started with leadership. If you're curious about the implementation details or want to use leadership in your application, take a look at the docs for Oban.Peer.

Alternative PG (Process Groups) Notifier

Oban relies heavily on PubSub, and until now it only provided a Postgres adapter. Postgres is amazing, and has a highly performant PubSub option, but it doesn't work in every environment (we're looking at you, PgBouncer).

Fortunately, many Elixir applications run in a cluster connected by distributed Erlang. That means Process Groups, aka PG, is available for many applications.

So, we pulled Oban Pro's PG notifier into Oban to make it available for everyone! If your app runs in a proper cluster, you can switch over to the PG notifier:

config :my_app, Oban,
  notifier: Oban.Notifiers.PG,
  ...

Now there are two notifiers to choose from, each with their own strengths and weaknesses:

  • Oban.Notifiers.Postgres - Pros: Doesn't require distributed Erlang, Publishes insert events to trigger queues; Cons: Doesn't work with PgBouncer in transaction mode, Doesn't work in tests because of the sandbox.

  • Oban.Notifiers.PG - Pros: Works with PgBouncer in transaction mode, Works in tests; Cons: Requires distributed Erlang, Doesn't publish insert events.

Basic Lifeline Plugin

When a queue's producer crashes or a node shuts down before a job finishes executing, the job may be left in an executing state. The worst part is that these jobs—which we call "orphans"—are completely invisible until you go searching through the jobs table.

Oban Pro has always had a "Lifeline" plugin for just this occasion, and now we've brought a basic Lifeline plugin to Oban.

To automatically rescue orphaned jobs that are still executing, include the Oban.Plugins.Lifeline in your configuration:

config :my_app, Oban,
  plugins: [Oban.Plugins.Lifeline],
  ...

Now the plugin will search and rescue orphans after they've lingered for 60 minutes.

🌟 Note: The Lifeline plugin may transition jobs that are genuinely executing and cause duplicate execution. For more accurate rescuing or to rescue jobs that have exhausted retry attempts see the DynamicLifeline plugin in Oban Pro.
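
If 60 minutes is too long for your workload, the plugin's rescue_after option (a duration in milliseconds) can shorten the window. The sketch below assumes a hypothetical MyApp.Repo and a single default queue; the 30 minute value is only an example.

```elixir
import Config

config :my_app, Oban,
  # MyApp.Repo and the queue list are placeholders for your own settings.
  repo: MyApp.Repo,
  queues: [default: 10],
  plugins: [
    # Rescue jobs that have been stuck in the executing state for 30 minutes.
    {Oban.Plugins.Lifeline, rescue_after: :timer.minutes(30)}
  ]
```

Keep the window longer than your slowest legitimate job; otherwise, as the note above warns, a job that is still running may be rescued and executed twice.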

Reindexer Plugin

Over time various Oban indexes (heck, any indexes) may grow without VACUUM cleaning them up properly. When this happens, rebuilding the indexes will release bloat and free up space in your Postgres instance.

The new Reindexer plugin makes index maintenance painless and automatic by periodically rebuilding all of your Oban indexes concurrently, without any locks.

By default, reindexing happens once a day at midnight UTC, but it's configurable with a standard cron expression (and timezone).

config :my_app, Oban,
  plugins: [Oban.Plugins.Reindexer],
  ...

See Oban.Plugins.Reindexer for complete options and implementation details.
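
As an example of customizing the schedule, the following sketch moves reindexing to 03:00 UTC. The schedule and timezone options belong to the plugin; MyApp.Repo and the queue list are placeholders.

```elixir
import Config

config :my_app, Oban,
  # MyApp.Repo and the queue list are placeholders for your own settings.
  repo: MyApp.Repo,
  queues: [default: 10],
  plugins: [
    # Standard five-field cron expression: minute 0 of hour 3, every day.
    {Oban.Plugins.Reindexer, schedule: "0 3 * * *", timezone: "Etc/UTC"}
  ]
```

A timezone other than Etc/UTC would additionally require a time zone database to be configured for your application.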

Improved Telemetry and Logging

The default telemetry-backed logger includes more job fields and metadata about execution. Most notably, it now includes the execution state and formatted error reports when jobs fail.

Here's an example of the default output for a successful job:

{
  "args":{"action":"OK","ref":1},
  "attempt":1,
  "duration":4327295,
  "event":"job:stop",
  "id":123,
  "max_attempts":20,
  "meta":{},
  "queue":"alpha",
  "queue_time":3127905,
  "source":"oban",
  "state":"success",
  "tags":[],
  "worker":"Oban.Integration.Worker"
}

Now, here's a sample where the job has encountered an error:

{
  "attempt": 1,
  "duration": 5432,
  "error": "** (Oban.PerformError) Oban.Integration.Worker failed with {:error, \"ERROR\"}",
  "event": "job:exception",
  "state": "failure",
  "worker": "Oban.Integration.Worker"
}
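
Output like the samples above comes from the default logger, which has to be attached explicitly. A typical place to do that is the application's start callback, sketched below. MyApp.Application, MyApp.Repo, and MyApp.Supervisor are hypothetical names.

```elixir
defmodule MyApp.Application do
  # Hypothetical application module; adapt the children to your own app.
  use Application

  @impl Application
  def start(_type, _args) do
    # Attach Oban's telemetry-backed logger at the :info level before any
    # queues start, so that the first job events are logged as well.
    Oban.Telemetry.attach_default_logger(:info)

    children = [
      MyApp.Repo,
      {Oban, Application.fetch_env!(:my_app, Oban)}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```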

2.11.0 — 2022-02-13

Enhancements

  • [Migration] Change the order of fields in the base index used for the primary Oban queries.

    The new order is much faster for frequent queries such as scheduled job staging. Check the v2.11 upgrade guide for instructions on swapping the index in existing applications.

  • [Worker] Avoid spawning a separate task for workers that use timeouts.

  • [Engine] Add insert_job, insert_all_jobs, retry_job, and retry_all_jobs as required callbacks for all engines.

  • [Oban] Raise more informative error messages for missing or malformed plugins.

    Now missing plugins have a different error from invalid plugins or invalid options.

  • [Telemetry] Normalize telemetry metadata for all engine operations:

    • Include changeset for insert
    • Include changesets for insert_all
    • Include job for complete_job, discard_job, etc
  • [Repo] Include [oban_conf: conf] in telemetry_options for all Repo operations.

    With this change it's possible to differentiate between database calls made by Oban versus the rest of your application.
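
One way to use the oban_conf marker is inside an Ecto query telemetry handler. The sketch below assumes a repo named MyApp.Repo (so the query event is [:my_app, :repo, :query]) and relies on Ecto exposing telemetry_options under the :options key of the event metadata. The module name MyApp.QueryTelemetry and the handler id are hypothetical.

```elixir
defmodule MyApp.QueryTelemetry do
  # Hypothetical module; it is not part of Oban or Ecto.
  require Logger

  # Attach the handler to the repo's query event.
  def attach do
    :telemetry.attach(
      "my-app-oban-query-logger",
      [:my_app, :repo, :query],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Log only the queries that were issued by Oban.
  def handle_event(_event, measurements, metadata, _config) do
    if oban_query?(metadata) do
      Logger.debug("Oban query total_time=#{inspect(measurements[:total_time])}")
    end

    :ok
  end

  # A query came from Oban when its telemetry options carry :oban_conf.
  def oban_query?(metadata) do
    metadata
    |> Map.get(:options, [])
    |> Keyword.has_key?(:oban_conf)
  end
end
```

The same predicate can be inverted to exclude Oban's polling queries from application-level query metrics.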

Bug Fixes

  • [Telemetry] Emit discard rather than error events when a job exhausts all retries.

    Previously discard_job was only called for manual discards, i.e., when a job returned :discard or {:discard, reason}. Discarding for exhausted attempts was done within error_job in error cases.

  • [Cron] Respect the current timezone for @reboot jobs. Previously, @reboot expressions were evaluated on boot without the timezone applied. In that case the expression may not match the calculated time and jobs wouldn't trigger.

  • [Cron] Delay CRON evaluation until the next minute after initialization. Now all cron scheduling occurs reliably at the top of the minute.

  • [Drainer] Introduce discard accumulator for draining results. Now exhausted jobs along with manual discards count as a discard rather than a failure or success.

  • [Oban] Expand changeset wrapper within multi function.

    Previously, Oban.insert_all could handle a list of changesets, a wrapper map with a :changesets key, or a function. However, the function had to return a list of changesets rather than a changeset wrapper. This was unexpected and made some multis awkward.

  • [Testing] Preserve attempted_at/scheduled_at in perform_job/3 rather than overwriting them with the current time.

  • [Oban] Include false as a viable queue or plugin option in typespecs

Deprecations

  • [Telemetry] Hard deprecate Telemetry.span/3, which was previously soft-deprecated.

Removals

  • [Telemetry] Remove circuit breaker event documentation because :circuit events aren't emitted anymore.

v2.10.1

09 Nov 22:17

The previous release, v2.10.0, was immediately retired in favor of this version.

Removed

  • [Oban.Telemetry] Remove the customizable prefix for telemetry events in favor of workarounds such as keep/drop in Telemetry Metrics.

v2.10.0

09 Nov 21:22

Added

  • [Oban.Telemetry] Add customizable prefix for all telemetry events.

    For example, a telemetry prefix of [:my_app, :oban] would span job start telemetry events as [:my_app, :oban, :job, :start]. The default is [:oban], which matches the existing functionality.

Fixed

  • [Oban.Plugins.Stager] Use the notifier to broadcast inserted and available jobs rather than inlining them into a Postgres query.

    With this change the notifier is entirely swappable and there isn't any reason to use the Repeater plugin in production.

  • [Oban.Plugins.Cron] Validate job options on init.

    Providing invalid job options in the crontab, e.g. priority: 5 or unique: [], wasn't caught until runtime. At that point each insert attempt would fail, crashing the plugin.

  • [Oban.Queue.Producer] Prevent crashing on exception formatting when a job exits without a stacktrace, most notably with {:EXIT, pid}.

  • [Oban.Testing] Return invalid results from perform_job, rather than always returning nil.

  • [Oban] Validate that a queue exists when controlling or checking locally, e.g. calls to Oban.check_queue or Oban.scale_queue.

  • [Oban.Telemetry] Use module capture for telemetry logging to prevent warnings.

v2.9.2

28 Sep 01:25
  • [Oban] Loosen telemetry requirement to allow either 0.4 or 1.0 without forcing apps to use an override.

v2.9.1

28 Sep 01:25

Fixed

  • [Oban] Correctly handle prefix in cancel_job and cancel_all_jobs.

  • [Oban] Safely guard against empty changeset lists passed to insert_all/2,4.

v2.9.0

28 Sep 01:24

Optionally Use Meta for Unique Jobs

It's now possible to use the meta field for unique jobs. Unique jobs have always supported worker, queue, and args fields. That was flexible, but forced applications to put ad-hoc unique values in args when they should really be in meta.

The meta field supports keys, just like args. That makes it possible to use highly efficient fingerprint style uniqueness (and possibly drop the index on args, if desired).

Here's an example of using a single "fingerprint" key in meta for uniqueness:

defmodule MyApp.FingerprintWorker do
  use Oban.Worker, unique: [fields: [:worker, :meta], keys: [:fingerprint]]

  @impl Worker
  def new(args, opts) do
    fingerprint = :erlang.phash2(args)

    super(args, Keyword.put(opts, :meta, %{fingerprint: fingerprint}))
  end
end

For backward compatibility, meta isn't included in unique fields by default.

Expanded Start and Scale Options

After extensive refactoring to queue option management and validation, now it's possible to start and scale queues with all supported options. Previously start/stop functions only supported the limit option for dynamic scaling, reducing runtime flexibility considerably.

Now it's possible to start a queue in the paused state:

Oban.start_queue(queue: :dynamic, paused: true)

Even better, for apps that use an alternative engine like the SmartEngine from Oban Pro, it's possible to start a dynamic queue with options like global concurrency or rate limiting:

Oban.start_queue(queue: :dynamic, local_limit: 10, global_limit: 50)

All options are also passed through scale_queue, locally or globally, even allowing you to reconfigure a feature like rate limiting at runtime:

Oban.scale_queue(queue: :dynamic, rate_limit: [allowed: 50, period: 60])

Added

  • [Oban] Add Oban.cancel_all_jobs/1,2 to cancel multiple jobs at once, within an atomic transaction. The function accepts a Job query for complete control over which jobs are cancelled.

  • [Oban] Add Oban.retry_all_jobs/1,2 to retry multiple jobs at once, within an atomic transaction. Like cancel_all_jobs, it accepts a query for fine-grained control.

  • [Oban] Add with_limit option to drain_queue/2, which controls the number of jobs that are fetched and executed concurrently. When paired with with_recursion this can drastically speed up interdependent job draining, i.e. workflows.

  • [Oban.Telemetry] Add telemetry span events for all engine and notifier actions. Now all database operations are covered by spans.

  • [Oban.Migrations] Add create_schema option to prevent automatic schema creation in migrations.
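
To illustrate the query-based bulk operations listed above, the sketch below cancels every job for one worker and retries every job in one queue. MyApp.JobAdmin and its function names are hypothetical, and it assumes an Oban instance running under the default name.

```elixir
defmodule MyApp.JobAdmin do
  # Hypothetical helper module; it is not part of Oban.
  import Ecto.Query

  # Cancel every cancellable job for the given worker name, e.g. "MyApp.Worker".
  # Returns {:ok, count} with the number of jobs affected.
  def cancel_worker_jobs(worker) when is_binary(worker) do
    Oban.Job
    |> where([j], j.worker == ^worker)
    |> Oban.cancel_all_jobs()
  end

  # Retry every retryable job in the given queue, e.g. "default".
  # Returns {:ok, count} with the number of jobs affected.
  def retry_queue_jobs(queue) when is_binary(queue) do
    Oban.Job
    |> where([j], j.queue == ^queue)
    |> Oban.retry_all_jobs()
  end
end
```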

Changed

  • [Oban] Consistently include a :snoozed count in drain_queue/2 output. Previously the count was only included when there was at least one snoozed job.

  • [Oban.Testing] Default to attempt: 1 for perform_job/3, as a worker's perform/1 would never be called with attempt: 0.

Fixed

  • [Oban.Queue.Supervisor] Change supervisor strategy to :one_for_all.

    Queue supervisors used a :rest_for_one strategy, which allowed the task supervisor to keep running when a producer crashed. That allowed duplicate long-lived jobs to run simultaneously, which is a bug in itself, but could also cause attempt > max_attempts violations.

  • [Oban.Plugins.Cron] Start step ranges from the minimum value, rather than for the entire set. Now the range 8-23/4 correctly includes [8, 12, 16, 20].

  • [Oban.Plugins.Cron] Correctly parse step ranges with a single value, e.g. 0 1/2 * * *

  • [Oban.Telemetry] Comply with :telemetry.span/3 by exposing errors as reason in metadata
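
The step range fix above can be checked with a few lines of plain Elixir. This is only an illustration of the intended semantics using a stepped range (Elixir v1.12+), not the Cron plugin's parser; the CronStepSketch module is hypothetical.

```elixir
defmodule CronStepSketch do
  # Hypothetical module that mirrors the "min-max/step" semantics.
  # Stepping starts at the range minimum rather than at the start of the
  # whole set of allowed values.
  def expand(min, max, step) when step > 0 and min <= max do
    Enum.to_list(min..max//step)
  end
end

# The hour field 8-23/4 from the changelog entry:
CronStepSketch.expand(8, 23, 4)
# => [8, 12, 16, 20]
```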

v2.8.0

03 Aug 19:50

Time Unit Scheduling

It's now possible to specify a unit for :schedule_in, rather than always assuming seconds. This makes it possible to schedule a job using clearer minutes, hours, days, or weeks, e.g. schedule_in: {1, :minute} or schedule_in: {3, :days}.
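
A short usage sketch follows. MyApp.ReminderWorker and the remind_later/1 helper are hypothetical, and the helper assumes an Oban instance running under the default name.

```elixir
defmodule MyApp.ReminderWorker do
  # Hypothetical worker used only for this example.
  use Oban.Worker, queue: :default

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"user_id" => _user_id}}), do: :ok

  # Schedule a reminder five minutes from now using a time unit tuple
  # instead of a raw number of seconds (previously schedule_in: 300).
  def remind_later(user_id) do
    %{user_id: user_id}
    |> new(schedule_in: {5, :minutes})
    |> Oban.insert()
  end
end
```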

Changed

  • [Oban.Testing] Accept non-map args to perform_job/3 for compatibility with overridden Worker.new/2 callbacks.

  • [Oban.Queue.Producer] Include some jitter when scheduling queue refreshes to prevent queues from refreshing simultaneously. In extreme cases, refresh contention could cause producers to crash.

Fixed

  • [Oban.Queue.Executor] Restore logged warnings for unexpected job results by retaining the safe flag during normal execution.

  • [Oban.Plugins.Gossip] Catch and discard unexpected messages rather than crashing the plugin.

  • [Oban.Testing] Eliminate dialyzer warnings by including repo option in the Oban.Testing.perform_job/3 spec.

v2.7.2

03 Aug 19:51

Fixed

  • [Oban.Plugins.Pruner] Consider cancelled_at or discarded_at timestamps when querying prunable jobs. The previous query required an attempted_at value, even for cancelled or discarded jobs. If a job was cancelled before it was attempted then it wouldn't ever be pruned.

  • [Oban.Plugins.Gossip] Correct exit handling during safe checks. Occasionally, producer checks time out and the previous catch block didn't handle exits properly.