Skip to content

Releases: sorentwo/oban

v2.17.11

25 Jun 15:25
Compare
Choose a tag to compare

Bug Fixes

  • [Oban] Handle deprecation warnings from Elixir 1.17

  • [Notifier] Prevent noisy logging about switching between modes.

    There's an apparent race condition in Sonar between pruning stale nodes on :ping and updating the status after a notification. This primarily happens in development for two reasons:

    1. Development laptops are most prone to time warp because of system sleep.
    2. Apps only run a single node in development.

    Using monotonic_time/1 instead of system_time/1 guards against clock drift/time warp effects.

  • [Stager] Prevent notification status timeouts from bubbling into the Stager.

    A clogged Ecto pool could cause cascading errors on startup due to a sequence of calls between the Notifier, Sonar, and Stager.

    1. Sonar sends a notification in handle_continue on startup.
    2. The notification is blocked while the Notifier waits for a connection from the Ecto pool.
    3. Stager checks for the connection status on startup, which would eventually time out because the Sonar hadn't finished initializing.
    4. The Stager crashes from the timeout error.

    This makes the following changes to prevent this sequence of events:

    1. The Stager no longer gets the sonar status during startup.
    2. The Notifier catches timeout errors from Sonar checks, warns about it, then returns an :unknown status.
  • [Engine] Defensively check the process dictionary during inline testing.

    Not all processes are guaranteed to return a value for the process dictionary. Sometimes a value was missing during inline testing, which would crash the test.

  • [Basic] Set conflict? flag when encountering a unique advisory lock.

    The conflict? flag wasn't set when inserting a unique job was blocked by an advisory lock. Now the flag is set on either a fetched duplicate, or when the advisory lock is set.

  • [Job] Correct replace_by_state_option type by switching from keyword to tuples.

  • [Config] Correctly type shutdown_grace_period as an integer rather than a timeout.

v2.17.10

25 Jun 15:26
Compare
Choose a tag to compare

Enhancements

  • [Oban] Make all generated functions from use Oban overridable.

    Now the functions generated by use Oban are all marked with defoverridable for extensibility.

Bug Fixes

  • [Testing] Use $callers rather than $ancestors for ancestry tree check.

    We care about Tasks for inline testing checks, not normal supervision tree ancestry. The $callers entry is the appropriate mechanism to find the trail of calling processes:

v2.17.9

20 Apr 11:18
Compare
Choose a tag to compare

Enhancements

  • [Testing] Check process ancestry tree for with_testing_mode override.

    Cascade the with_testing_mode block to nested processes that make use of :$ancestry in the process dictionary, i.e. tasks. Now enqueuing a job within spawned processes like Task.async or Task.async_stream will honor the testing mode specified in with_testing_mode/2.

  • [PG] Support alternative namespacing in PG notifier

    By default, all Oban instances using the same prefix option would receive notifications from each other. Now you can use the namespace option to separate instances that are in the same cluster without changing the prefix.

Bug Fixes

  • [Oban] Restore zero arity version of pause_all_queues/0

    Both pause and resume variants lost their default argument in a refactor that shifted around guard clauses.

  • [Oban] Add :oban_draining to process dict while draining

    The flag marks the test process while draining to give hints to the executor and engines. It fixes an incompatibility between Oban.drain_queue/2 and Pro's Testing.drain_jobs/2.

v2.17.8

08 Apr 17:31
Compare
Choose a tag to compare

Enhancements

  • [Backoff] Backoff retry on DBConnection and Postgrex errors from GenServer calls.

    GenServer calls that result in a ConnectionError or Postgrex.Error should also be caught and retried rather than crashing on the first attempt.

Bug Fixes

  • [Notifier] Check for a live notifier process and propagate notify errors.

    The Notifier.notify/1 spec showed it would always return :ok, but that wasn't the case when the notifier was disconnected or the process was no longer running. Now an error tuple is returned when a notifier process isn't running.

    This situation happened most frequently during shutdown, particularly from external usage of the Notifier like an application or the oban_met package.

    In addition, the errors bubble up through top level Oban functions like scale_queue/1, pause_queue/1, etc. to indicate that the operation can't actually succeed.

  • [Peers.Postgres] Rescue DBConnection.ConnectionError in peer leadership check.

    Previously, only Postgrex.Error exceptions were rescued and other standard connection errors were ignored, crashing the Peer. Because leadership is checked immediately after the peer initializes, any connection issues would trigger a crash loop that could bring down the rest of the supervision tree.

v2.17.7

27 Mar 13:16
Compare
Choose a tag to compare

Bug Fixes

  • [Notifier] Prevent Sonar from running in :testing modes.

    Sonar has no purpose during tests, and it can cause sandbox issues when tests run with the Postgres notifier.

  • [Oban] Correctly handle pause and resume all with opts.

    The primary clause had two default arguments and it was impossible to call pause_all_queues/1 or resume_all_queues/1 with opts and no name.

v2.17.6

02 Mar 10:49
Compare
Choose a tag to compare

Enhancements

  • [Cron] Include cron indicator and original cron expression in meta.

    When the Cron plugin inserts jobs the original, unnormalized cron expression is now stored in a job's meta under the cron_expr key. For compatibility with Pro's DynamicCron, meta also has cron: true injected.

  • [Worker] Change backoff/1 spec to allow immediate rescheduling by returning 0

    The callback now specifies a non_neg_integer to allow retrying 0 seconds into the future. This matches the abilitiy to use {:snooze, 0}.

Bug Fixes

  • [Notifier] Revert using single connection to deliver notices.

    Production use of the single notifier connection in various environments showed timeouts from bottlenecks in the single connection could crash the connection and start new ones, leaving extra idle processes. In addition, Use interpolated NOTIFY instead of pg_notify with a JSON argument added parsing load because it could no longer use prepared statements.

  • [Worker] Apply custom backoff on timeout and unhandled exit

    A worker's custom backoff/1 wasn't applied after a TimeoutError or unhandled exit. That's because the producer stores the executor struct before resolving the worker module. Now the module is resolved again to ensure custom backoff/1 are used.

  • [Job] Revert setting the default priority key in Job schema.

    The Ecto default was removed to allow overriding the default in the database and it was purposefully removed.

v2.17.5

25 Feb 22:50
Compare
Choose a tag to compare

Enhancements

  • [Notifier] Use the Postgres notifier's connection to deliver notifications.

    The Postgres notifier holds a single connection for listening and relaying messages. However, it wasn't used to dispatch messages; that was left to queries through the Ecto pool. Those queries were noisy and put unnecessary load on the pool, particularly from insert notifications.

    Now notifications are delivered through the notifier's connection鈥攖hey don't require a pool checkout, and they won't clutter Ecto logs or telemetry.

  • [Engine] Emit insert trigger notification directly from Engine callbacks.

    Notifications are now sent from the engine, within the insert_* telemetry block, so the timing impact is visible. In addition, notifications aren't emitted for scheduled jobs, as there's nothing ready for producers to fetch.

Bug Fixes

  • [Notifier] Track and compare sonar pings using the correct time unit.

    The notifier's status tracker pruned stale nodes using mismatched time units, causing constant status change events despite nothing changing. This ensures the recorded and compared times are both milliseconds, not a mixture of seconds and native time.

  • [Cron] Retain @reboot cron entries until node becomes leader.

    With rolling deploys it is frequent that a node isn't the leader the first time cron evaluates. However, @reboot expressions were discarded after the first run, which prevented reboots from being inserted when the node acquired leadership.

  • [Oban] Require Ecto v3.10 to support materialized flag added in the previous patch.

    The materialized option wasn't supported by Ecto until v3.10. Compiling with an earlier version causes a compilation error.

v2.17.4

16 Feb 14:40
Compare
Choose a tag to compare

Enhancements

  • [Notifier] Add Notifier.status/1 to expose cluster connectivty status for diagnostics.

    Notifier gains status/1, backed by an internal Sonar process, for access to cluster diagnostics. This will help identify whether the current node can communicate with other nodes for vital functionality such as staging jobs.

    A new [:oban, :notifier, :switch] telemetry event is emitted for diagnostic instrumentation and switch events are displayed by the default logger.

  • [Stager] Use notifier status and leadership checks to switch between global and local mode.

    The sonar-backed notifier status is more accurate than the ad-hoc stager checks previously used to check connectivity. Now the staging mode is determined by a combination of sonar and leadership鈥攊solated follower nodes will correctly switch to :local (polling) mode.

    The new detection mechanism prevents an infrequent and difficult to spot bug where isolated follower nodes wouldn't process jobs.

  • [Backoff] Catch GenServer timeouts in with_retry/2.

    Backoff's with_retry/2 function previously rescued and retried known database exceptions, and it didn't catch exits. Now it also catches timeout errors to accommodate synchronized call based acks like those used in Pro's Smart engine in sync mode.

  • [Stager] Avoid transactions on follower nodes.

    Only the leader needs to execute a transaction while staging. This change prevents pointless empty transactions from each follower node every second. Now there are 60 staging transactions per minute across the whole cluster, rather than 60 from each node.

  • [Notifier] Unify public function type signatures and accept a name or conf in all public functions.

  • [Plugins] Always validate and build state in start_link/1

    Error messages and stack traces are clearer when errors come from start_link/1 rather than init/1, and it avoids starting processes that will immediately crash.

    This also hides child_spec/1 in all public modules to keep it out of documentation.

  • [Basic] Skip taking advisory locks in testing modes.

    Advisory locks are global and apply across transactions. That can break async tests with overlapping unique jobs because the lock is held in a concurrent, sandboxed test.

Bug Fixes

  • [Engine] Ensure uniqueness across args when no keys are specified regardless of insertion order.

  • [Basic] Force materializing the CTE when fetching jobs.

    The CTE used to prevent optimizations in the Basic engine's fetch query is only referenced once, which may allow the Postgres optimizer to inline it. Inlining can negate the CTE "optimization fence", so we force the CTE to be materialized.

  • [Job] Restore default priority in the schema.

    A default was still set in the database, but it was erroneously removed from the Job schema.

  • [Job] Fix unique typespec by unwrapping the field type.

v2.17.3

23 Jan 20:46
Compare
Choose a tag to compare

Enhancements

  • [Stager] Rescue and report staging errors with telemetry

    Staging errors from queue contention or other database issues would cause the top level stager process to crash. Eventually that could shut down the entire Oban supervision tree. Now we rescue standard database connectivity issues instead and report them as errors in telemetry.

  • [Telemetry] Include result in job exception telemetry

    Returning an {:error, reason} tuple triggers an :exception telemetry event, but there's still a return value. Exception events for crashes, raises, timeouts, and kills will have a nil result value.

  • [Queue] Add producer handle_call clause for engine put_meta calls.

    Previously, the only way to put meta was through a non-blocking notification to handle_info.

v2.17.2

11 Jan 16:52
Compare
Choose a tag to compare

Enhancements

  • [Oban] Support passing changeset streams to insert_all.

    Accepting streams makes Oban.insert_all more flexible and may, in some circumstances, make it possible to reduce memory usage for streams of large resources.

Bug Fixes

  • [Config] Validate :repo option without checking for Ecto.Repo behaviour.

    Repo wrappers that don't implement all functions of the Ecto.Repo behaviour are still viable and shouldn't be validated with a behaviour check. This changes repo validation back to the way it was done in older versions, by checking that it's a valid module that exports config/0.

  • [Peer] Handle rollback during Oban.Peers.Postgres peer election

    Infrequently, the postgres peer election transaction returns {:error, :rollback}. Now that return value is handled to prevent a match error.

    The peer maintains its current leader? status on rollback鈥攖his may cause inconsistency if the leader encounters an error and multiple rollbacks happen in sequence. That tradeoff is acceptable because the situation is unlikely and less of an issue than crashing the peer.

  • [Oban] Skip queue existence check for pause_all_queues and resume_all_queues when the local_only option is passed.