
Releases: sorentwo/oban

v2.15.0

13 Apr 12:33

🗜️ Notification Compression

Oban uses notifications across most core functionality, from job staging to cancellation. Some notifications, such as gossip, contain massive redundancy that compresses nicely. For example, this table breaks down the compression ratios for a fairly standard gossip payload containing data from ten queues:

Mode        Bytes   % of Original
Original    4720    100%
Gzip        307     7%
Encode 64   412     9%

Minimizing notification payloads is especially important for Postgres because it applies an 8kb limit to all messages. Now all pub/sub notifications are compressed automatically, with a safety mechanism for compatibility with external notifiers, namely Postgres triggers.
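As a rough sketch of the idea, the snippet below gzips a redundant payload and Base64-encodes it so it stays safe to send as text, then reverses both steps. The payload shape is made up for illustration and isn't Oban's real gossip format; only :zlib and Base from the standard library are used.

```elixir
# Hypothetical gossip-like payload: ten queues reporting nearly identical fields.
payload =
  1..10
  |> Enum.map_join(",", fn n ->
    ~s({"queue":"queue_#{n}","node":"worker.1","limit":10,"paused":false,"running":[]})
  end)
  |> then(&"[#{&1}]")

# Gzip first, then Base64 encode so the result is plain text.
encoded = payload |> :zlib.gzip() |> Base.encode64()

# Reverse both steps to recover the original payload.
decoded = encoded |> Base.decode64!() |> :zlib.gunzip()

IO.puts("original: #{byte_size(payload)} bytes")
IO.puts("encoded:  #{byte_size(encoded)} bytes")
IO.puts("round trip ok? #{decoded == payload}")
```

The Base64 step gives back some of the savings, which is why the "Encode 64" row in the table is larger than the raw Gzip row.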

🗃️ Query Improvements

There has been an ongoing issue with systems recording a job attempt twice, when it only executed once. While that sounds minor, it could break an entire queue when the attempt exceeded max attempts because it would violate a database constraint.

Apparently, the Postgres planner may choose to generate a plan that executes a nested loop over the LIMITing subquery, causing more UPDATEs than LIMIT. That could cause unexpected updates, including attempts > max_attempts in some cases. The solution is to use a CTE as an "optimization fence" that forces Postgres not to optimize the query.
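To illustrate the fence, here is a sketch of the general shape of such a query, held in an Elixir heredoc. It is not Oban's actual query: the columns, ordering, and parameter positions are assumptions, and AS MATERIALIZED requires PostgreSQL 12 or later.

```elixir
# Hypothetical query: the limited, locking SELECT lives in a materialized CTE,
# so it is evaluated once and the UPDATE only joins against those rows.
fenced_fetch_sql = """
WITH subset AS MATERIALIZED (
  SELECT id
  FROM oban_jobs
  WHERE state = 'available' AND queue = $1
  ORDER BY priority, scheduled_at, id
  LIMIT $2
  FOR UPDATE SKIP LOCKED
)
UPDATE oban_jobs AS jobs
SET state = 'executing', attempt = jobs.attempt + 1
FROM subset
WHERE jobs.id = subset.id
RETURNING jobs.id
"""

IO.puts(fenced_fetch_sql)
```

Because the CTE is computed a single time, the number of updated rows can't exceed the LIMIT, whichever join strategy the planner picks for the outer UPDATE.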

We also worked in a few additional query improvements:

  • Use an index only scan for job staging to safely handle tables with millions of scheduled jobs.
  • Remove unnecessary row locking from staging and pruning queries.

🪶 New Engine Callbacks for SQL Compatibility

We're pleased to share improvements in Oban's SQLite integration. A few SQLite pioneers identified pruning and staging compatibility bugs, and instead of simply patching around the issues with conditional logic, we tackled them with new engine callbacks: stage_jobs/3 and prune_jobs/3. The result is safer, optimized queries for each specific database.

Introducing new engine callbacks with database-specific queries paves the way for working with other databases. There's even an open issue for MySQL support...

v2.15.0 — 2023-04-13

Enhancements

  • [Oban] Use DynamicSupervisor to supervise queues for optimal shutdown

    Standard supervisors shut down in a fixed order, which could make shutting down queues with active jobs and a lengthy grace period very slow. This switches to a DynamicSupervisor for queue supervision so queues can shut down simultaneously while still respecting the grace period.

  • [Executor] Retry acking infinitely after job execution

    After jobs execute the producer must record their status in the database. Previously, if acking failed due to a connection error after 10 retries it would orphan the job. Now, acking retries infinitely (with backoff) until the function succeeds. The result is stronger execution guarantees with backpressure during periods of database fragility.

  • [Oban] Accept a Job struct as well as a job id for cancel_job/1 and retry_job/1

    Now it's possible to write Oban.cancel_job(job) directly rather than Oban.cancel_job(job.id).

  • [Worker] Allow snoozing jobs for zero seconds.

    Returning {:snooze, 0} immediately reschedules a job without any delay.

  • [Notifier] Accept arbitrary channel names for notifications, e.g. "my-channel"

  • [Telemetry] Add detach_default_logger/0 to programmatically disable an attached logger.

  • [Testing] Avoid unnecessary query for "happy path" assertion errors in assert_enqueued/2

  • [Testing] Inspect charlists as lists in testing assertions

    Args frequently contain lists of integers like [123], which was curiously displayed as '{'.
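The charlist rendering mentioned in the last item can be reproduced with plain inspect/2. This is a small standard-library sketch and doesn't use Oban's testing helpers:

```elixir
args = [123]

# Default inspection may render a printable list of integers as a charlist.
default_output = inspect(args)

# Forcing :as_lists always prints the integers themselves.
list_output = inspect(args, charlists: :as_lists)

IO.puts(default_output)
IO.puts(list_output)
```

The exact charlist syntax printed by the first call depends on the Elixir version, while the second call always prints the list of integers.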

Bug Fixes

  • [Executor] Correctly raise "unknown worker" errors.

    Unknown workers triggered an unknown case error rather than the appropriate "unknown worker" runtime error.

  • [Testing] Allow assert_enqueued with a scheduled_at time for available jobs

    The use of Job.new to normalize query fields would change assertions with a "scheduled_at" date to only check scheduled jobs, never "available" ones.

  • [Telemetry] Remove :worker from engine and plugin query meta.

    The worker isn't part of any query indexes and prevents optimal index usage.

  • [Job] Correct priority type to cover default of 0

For changes prior to v2.15 see the v2.14 docs.

v2.14.2

17 Feb 18:43

Bug Fixes

  • [Oban] Always disable peering with plugins: false. There's no reason to enable peering when plugins are fully disabled.

  • [Notifier] Notify Global peers when the leader terminates.

    Now the Global leader sends a down message to all connected nodes when the process terminates cleanly. This behaviour prevents up to 30s of downtime without a leader and matches how the Postgres peer operates.

  • [Notifier] Allow compilation in a SQLite application when the postgrex package isn't available.

  • [Engine] Include jobs in fetch_jobs event metadata

Changes

  • [Notifier] Pass pid in instead of relying on from for Postgres notifications.

    This prepares Oban for the upcoming Postgrex.SimpleConnection switch to use gen_statem.

v2.14.1

17 Feb 18:43

Bug Fixes

  • [Repo] Prevent logging SQL queries by correctly handling default opts

    The query dispatch call included opts in the args list, rather than separately. That passed options to Repo.query correctly, but it missed any default options such as log: false, which made for noisy development logs.

v2.14.0

26 Jan 14:18

Time marches on, and we minimally support Elixir 1.12+, PostgreSQL 12+, and SQLite 3.37.0+.

🪶 SQLite3 Support with the Lite Engine

Increasingly, developers are choosing SQLite for small to medium-sized projects, not just in the
embedded space where it's had utility for many years. Many of Oban's features, such as isolated
queues, scheduling, cron, unique jobs, and observability, are valuable in smaller or embedded
environments. That's why we've added a new SQLite3 storage engine to bring Oban to smaller,
stand-alone, or embedded environments where PostgreSQL isn't ideal (or possible).

There's frighteningly little configuration needed to run with SQLite3. Migrations, queues, and
plugins all "Just Work™".

To get started, add the ecto_sqlite3 package to your deps and configure Oban to use the
Oban.Engines.Lite engine:

config :my_app, Oban,
  engine: Oban.Engines.Lite,
  queues: [default: 10],
  repo: MyApp.Repo

Presto! Run the migrations, include Oban in your application's supervision tree, and then start
inserting and executing jobs as normal.
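For reference, the remaining pieces might look like the sketch below. The module names (MyApp.Repo, MyApp.Application, the migration module) are placeholders for your own application, and the files belong in a project that already depends on oban and ecto_sqlite3.

```elixir
# lib/my_app/repo.ex: an Ecto repo backed by the SQLite3 adapter
defmodule MyApp.Repo do
  use Ecto.Repo, otp_app: :my_app, adapter: Ecto.Adapters.SQLite3
end

# priv/repo/migrations/<timestamp>_add_oban_jobs_table.exs
defmodule MyApp.Repo.Migrations.AddObanJobsTable do
  use Ecto.Migration

  def up, do: Oban.Migration.up()

  def down, do: Oban.Migration.down()
end

# lib/my_app/application.ex: start Oban after the repo
defmodule MyApp.Application do
  use Application

  @impl Application
  def start(_type, _args) do
    children = [
      MyApp.Repo,
      {Oban, Application.fetch_env!(:my_app, Oban)}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```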

⚠️ SQLite3 support is new, and while not experimental, there may be sharp edges. Please report any
issues or gaps in documentation.

👩‍🔬 Smarter Job Fetching

The most common cause of "jobs not processing" is when PubSub isn't available. Our troubleshooting
section instructed people to investigate their PubSub and optionally include the Repeater
plugin. That kind of manual remediation isn't necessary now! Instead, we automatically switch back
to local polling mode when PubSub isn't available—if it is a temporary glitch, then fetching
returns to the optimized global mode after the next health check.

Along with smarter fetching, Stager is no longer a plugin. It wasn't ever really a plugin, as
it's core to Oban's operation, but it was treated as a plugin to simplify configuration and
testing. If you're in the minority that tweaked the staging interval, don't worry, the existing
plugin configuration is automatically translated for backward compatibility. However, if you're a
stickler for avoiding deprecated options, you can switch to the top-level stage_interval:

config :my_app, Oban,
  queues: [default: 10],
- plugins: [{Stager, interval: 5_000}]
+ stage_interval: 5_000

📡 Comprehensive Telemetry Data

Oban has exposed telemetry data that allows you to collect and track metrics about jobs and queues
since the very beginning. Telemetry events followed a job's lifecycle from insertion through
execution. Still, there were holes in the data—it wasn't possible to track the exact state of your
entire Oban system through telemetry data.

Now that's changed. All operations that change job state, whether inserting, deleting, scheduling,
or processing jobs, report complete state-change events for every job, including queue, state,
and worker details. Even bulk operations such as insert_all_jobs, cancel_all_jobs, and
retry_all_jobs return a subset of fields for all modified jobs, rather than a simple count.

See the 2.14 upgrade guide for step-by-step instructions (all two of them).

Enhancements

  • [Oban] Store a {:cancel, :shutdown} error and emit [:oban, :job, :stop] telemetry when jobs
    are manually cancelled with cancel_job/1 or cancel_all_jobs/1.

  • [Oban] Include "did you mean" suggestions for Oban.start_link/1 and all nested plugins when a
    similar option is available.

    Oban.start_link(rep: MyApp.Repo, queues: [default: 10])
    ** (ArgumentError) unknown option :rep, did you mean :repo?
        (oban 2.14.0-dev) lib/oban/validation.ex:46: Oban.Validation.validate!/2
        (oban 2.14.0-dev) lib/oban/config.ex:88: Oban.Config.new/1
        (oban 2.14.0-dev) lib/oban.ex:227: Oban.start_link/1
        iex:1: (file)
    
  • [Oban] Support scoping queue actions to a particular node.

    In addition to scoping to the current node with :local_only, it is now possible to scope
    pause, resume, scale, start, and stop queues on a single node using the :node
    option.

    Oban.scale_queue(queue: :default, node: "worker.123")

  • [Oban] Remove retry_job/1 and retry_all_jobs/1 restriction around retrying scheduled jobs.

  • [Job] Restrict replace option to specific states when unique jobs have a conflict.

    # Replace the scheduled time only if the job is still scheduled
    SomeWorker.new(args, replace: [scheduled: [:schedule_in]], schedule_in: 60)
    
    # Change the args only if the job is still available
    SomeWorker.new(args, replace: [available: [:args]])

  • [Job] Introduce format_attempt/1 helper to standardize error and attempt formatting
    across engines

  • [Repo] Wrap nearly all Ecto.Repo callbacks.

    Now every Ecto.Repo callback, aside from a handful that are only used to manage a Repo
    instance, is wrapped with code generation that omits any typespecs. Slight inconsistencies
    between the wrapper's specs and Ecto.Repo's own specs caused dialyzer failures when nothing
    was genuinely broken. Furthermore, many functions were missing because it was tedious to
    manually define every wrapper function.

  • [Peer] Emit telemetry events for peer leadership elections.

    Both peer modules, Postgres and Global, now emit [:oban, :peer, :election] events during
    leader election. The telemetry meta includes a leader? field for start and stop events to
    indicate if a leadership change took place.

  • [Notifier] Allow passing a single channel to listen/2 rather than a list.

  • [Registry] Add lookup/2 for conveniently fetching registered {pid, value} pairs.

Bug Fixes

  • [Basic] Capture StaleEntryError on unique replace.

    Replacing while a job is updated externally, e.g. it starts executing, could occasionally raise
    an Ecto.StaleEntryError within the Basic engine. Now, that exception is translated into an
    error tuple and bubbles up to the insert call site.

  • [Job] Update t:Oban.Job/0 to indicate timestamp fields are nullable.

Deprecations

  • [Stager] Deprecate the Stager plugin as it's part of the core supervision tree and may be
    configured with the top-level stage_interval option.

  • [Repeater] Deprecate the Repeater plugin as it's no longer necessary with hybrid staging.

  • [Migration] Rename Migrations to Migration, but continue delegating functions for backward
    compatibility.

v2.13.6

28 Nov 18:03

Bug Fixes

  • [Testing] Put default timestamps directly in changeset.

    Workers that override new/2 and don't pass options through would end up without necessary timestamps, causing a CaseClauseError during execution when timestamps couldn't be compared.

v2.13.4

23 Sep 14:11

Bug Fixes

  • [Oban] Fix dialyzer ambiguity for insert_all/2 when using a custom name rather than options.

  • [Testing] Increment attempt when executing with :inline testing mode

    Inline testing mode neglected incrementing the attempt and left it at 0. That caused jobs with a single attempt to erroneously emit a failure telemetry event rather than a discard event.

  • [Reindexer] Correct namespace reference in reindexer query.

v2.13.3

09 Sep 13:04

Bug Fixes

  • [Oban] Fix dialyzer for insert/2 and insert_all/2, again.

    The recent addition of a @spec for Oban.insert/2 broke dialyzer in some situations. To prevent this regression in the future, we now include a compiled module that exercises all Oban.insert function clauses for dialyzer.

v2.13.2

19 Aug 15:15

Bug Fixes

  • [Oban] Fix insert/3 and insert_all/3 when using options.

    Multiple default arguments caused a conflict for function calls with options but without an Oban instance name, e.g. Oban.insert(changeset, timeout: 500)

  • [Reindexer] Fix the unused index repair query and correctly report errors.

    Reindexing and deindexing would fail silently because the results weren't checked, and no exceptions were raised.
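The insert/3 conflict above follows from how Elixir expands default arguments. The module below is a hypothetical stand-in rather than Oban's code; it shows how a two-argument call binds when optional arguments sit on both sides of a required one.

```elixir
defmodule DefaultsDemo do
  # Hypothetical stand-in: optional arguments on both sides of a required one.
  def insert(name \\ :default_instance, changeset, opts \\ []) do
    %{name: name, changeset: changeset, opts: opts}
  end
end

# With two arguments, Elixir fills defaults left to right, so the first
# argument is bound to the instance name rather than the changeset.
result = DefaultsDemo.insert(:my_changeset, timeout: 500)

IO.inspect(result)
```

Here the intended changeset ends up as the name and the options end up as the changeset, which is the kind of mix-up a call like Oban.insert(changeset, timeout: 500) would hit without extra disambiguation.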

v2.13.1

10 Aug 00:41

Bug Fixes

  • [Oban] Expand insert/insert_all typespecs for multi arity

    This fixes dialyzer issues from the introduction of opts to Oban.insert and Oban.insert_all functions.

  • [Reindexer] Allow specifying timeouts for all queries

    In some cases, applying REINDEX INDEX CONCURRENTLY on the indexes oban_jobs_args_index and oban_jobs_meta_index takes more than the default value (15 seconds). This new option allows clients to specify other values than the default.
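A configuration sketch for raising that limit, assuming the option is named timeout and takes milliseconds; the 60 second value and the application and repo names are only examples:

```elixir
config :my_app, Oban,
  repo: MyApp.Repo,
  plugins: [
    # Allow each reindex query up to 60 seconds instead of the 15 second default.
    {Oban.Plugins.Reindexer, timeout: 60_000}
  ]
```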

v2.13.0

21 Jul 16:38

Cancel Directly from Job Execution

Discard was initially intended to mean "a job exhausted all retries." Later, it was added as a return type for perform/1, and it came to mean either "stop retrying" or "exhausted retries" ambiguously, with no clear way to differentiate. Even later, we introduced cancel with a cancelled state as a way to stop jobs at runtime.

To repair this dichotomy, we're introducing a new {:cancel, reason} return type that transitions jobs to the cancelled state:

case do_some_work(job) do
  {:ok, _result} = ok ->
    ok

  {:error, :invalid} ->
-   {:discard, :invalid}
+   {:cancel, :invalid}

  {:error, _reason} = error ->
    error
end

With this change we're also deprecating the use of discard from perform/1 entirely! The meaning of each action/state is now:

  • cancel: this job was purposefully stopped from retrying, either from a return value or the cancel command triggered by a human

  • discard: this job has exhausted all retries and was transitioned by the system

You're encouraged to replace usage of :discard with :cancel throughout your application's workers, but :discard is only soft-deprecated and undocumented now.

Public Engine Behaviour

Engines are responsible for all non-plugin database interaction, from inserting through executing jobs. They're also the intermediate layer that makes Pro's SmartEngine possible.

Along with documenting the Engine this also flattens its name for parity with other "extension" modules. For the sake of consistency with notifiers and peers, the Basic and Inline engines are now Oban.Engines.Basic and Oban.Engines.Inline, respectively.

v2.13.0 — 2022-07-22

Enhancements

  • [Telemetry] Add encode option to make JSON encoding optional for attach_default_logger/1.

    Now it's possible to use the default logger in applications that prefer structured logging or use a standard JSON log formatter.

  • [Oban] Accept a DateTime for the :with_scheduled option when draining.

    When a DateTime is provided, drains all jobs scheduled up to and including that point in time.

  • [Oban] Accept extra options for insert/2,4 and insert_all/2,4.

    These are typically Ecto's standard "Shared Options" such as log and timeout. Other engines, such as Pro's SmartEngine, may support additional options.

  • [Repo] Add aggregate/4 wrapper to facilitate aggregates from plugins or other extensions that use Oban.Repo.
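Two of the enhancements above might be used as sketched below, inside an application that already runs Oban. The queue name and the one hour cutoff are arbitrary choices for illustration.

```elixir
# Attach the default logger without JSON encoding, leaving formatting to a
# structured log formatter.
Oban.Telemetry.attach_default_logger(encode: false)

# Drain jobs that are available now or scheduled up to and including one
# hour from now (typically done in tests).
cutoff = DateTime.add(DateTime.utc_now(), 3600, :second)
Oban.drain_queue(queue: :default, with_scheduled: cutoff)
```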

Bug Fixes

  • [Oban] Prevent empty maps from matching non-empty maps during uniqueness checks.

  • [Oban] Handle discarded and exhausted states for inline testing mode.

    Previously, returning a :discard tuple or exhausting attempts would cause an error.

  • [Peer] Default leader? check to false on peer timeout.

    Timeouts should be rare, as they're symptoms of application/database overload. If leadership can't be established it's safe to assume an instance isn't leader and log a warning.

  • [Peer] Use node-specific lock requester id for Global peers.

    Occasionally a peer module may hang while establishing leadership. In this case the peer isn't yet a leader, and we can fall back to false.

  • [Config] Validate options only after applying normalizations.

  • [Migrations] Allow any viable prefix in migrations.

  • [Reindexer] Drop invalid Oban indexes before reindexing again.

    Table contention that occurs during concurrent reindexing may leave indexes in an invalid and unusable state. Those indexes aren't used by Postgres and they take up disk space. Now the Reindexer will drop any invalid indexes before attempting to reindex.

  • [Reindexer] Only concurrently rebuild args and meta GIN indexes.

    The new indexes option overrides the default set of indexes to rebuild.

    The other two standard indexes (primary key and compound fields) are BTREE based and not as subject to bloat.

  • [Testing] Fix testing mode for perform_job and alt engines, e.g. Inline

    A couple of changes enabled this compound fix:

    1. Removing the engine override within config and exposing a centralized engine lookup instead.
    2. Controlling post-execution db interaction with a new ack option for the Executor module.

Deprecations

  • [Oban] Soft replace discard with cancel return value (#730) [Parker Selbert]