Batch Jobs

akshat edited this page Sep 18, 2024 · 4 revisions

Sometimes a task involves processing many independent units of work. Executing those units in parallel, rather than sequentially, can speed up the task.
Goose provides Batch Jobs to enqueue a collection of Jobs, execute them in parallel, and track them as a single entity.
Batching reduces the time needed to complete a large task. It can also help build complex workflows through Callbacks, which can trigger other tasks or batches when a batch completes.

Creating a Batch

A batch is configured with the following parameters:

  • execute-fn-sym: The function to be executed in parallel, and tracked upon completion.
  • args-coll: A sequential collection in which each element is itself a sequential collection holding the args for one Job.
    • Goose iterates over this collection to create Batch-Jobs.
    • Number of Jobs in a Batch is equal to the number of elements in args-coll.
    • Example: [[1] [2] [:foo :bar] [{:some :map}]]
  • :callback-fn-sym: A fully-qualified function symbol to report status of a batch.
    • Takes batch-id and status as input.
  • :linger-sec: Number of seconds for which batch metadata is preserved in the message broker after a batch has reached completion.
  • client-opts: Job queue, retry-opts and broker must be configured here.
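The shape of args-coll described above can be sketched in plain Clojure, without Goose on the classpath. The names `describe-args` and `valid-args-coll?` are hypothetical helpers introduced only for this illustration, and the `apply` call is a local simulation of "one function call per element", not Goose's actual execution path.

```clojure
;; Hypothetical function standing in for the one named by execute-fn-sym.
(defn describe-args
  [& args]
  (str "called with " (count args) " arg(s)"))

;; The example args-coll from the list above: 4 elements, so 4 Jobs.
(def args-coll [[1] [2] [:foo :bar] [{:some :map}]])

;; Hypothetical helper: the outer collection and every element
;; must be sequential.
(defn valid-args-coll?
  [coll]
  (and (sequential? coll)
       (every? sequential? coll)))

;; Local simulation only: each element is spread into one call.
(def simulated-results
  (mapv #(apply describe-args %) args-coll))
```

A flat collection such as `[1 2 3]` fails the check because its elements are not themselves sequential; wrapping each element, as in `[[1] [2] [3]]`, gives the expected shape.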

Nuances

  1. The order of Job execution within a batch cannot be guaranteed, since Jobs are executed in parallel on different workers.
  2. The Batch-callback is enqueued, executed and retried in the same way as a Batch-Job.
  3. If the retry queue is different from the execution queue, a worker instance must subscribe to the retry queue to execute failed Jobs.
  4. The Batch-Jobs API can fetch the status of a batch and delete a batch.
  5. When a Batch is deleted, all enqueued and retrying Jobs are deleted. However, some Jobs might be executed by a worker before they can be deleted. Users must account for such edge cases.
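One way to account for the deletion edge case in the last point is to make each Job check whether its work is still wanted before doing anything. The following is a minimal sketch under stated assumptions: `cancelled-campaigns`, `cancel-campaign!` and `send-campaign-email` are hypothetical names, and the in-memory atom stands in for a store that all workers could read in a real deployment.

```clojure
;; Hypothetical registry of cancelled work. An atom is only visible
;; inside one process; real workers would need a shared store.
(def cancelled-campaigns (atom #{}))

(defn cancel-campaign!
  "Marks a campaign as cancelled, e.g. just before deleting its batch."
  [campaign-id]
  (swap! cancelled-campaigns conj campaign-id))

(defn send-campaign-email
  "Hypothetical Job function: skips its work if the campaign was cancelled."
  [campaign-id email-id]
  (if (contains? @cancelled-campaigns campaign-id)
    :skipped
    (do
      (println "Sending email to:" email-id)
      :sent)))
```

With this guard, a Job that a worker picks up after the batch was deleted returns `:skipped` instead of performing its side effect.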

Usage

(ns batch-jobs
  (:require [goose.batch :as batch]
            [goose.client :as c]

            [clojure.tools.logging :as log]))

(defn send-emails
  [email-id]
  (log/infof "Sending email to: %s" email-id))

(defn multi-arity-fn
  [arg1 arg2 & args]
  (log/info "Received args:" arg1 arg2 args))

(defn my-callback
  [batch-id status]
  (condp = status
    batch/status-success (log/infof "Batch: %s successful." batch-id)
    batch/status-dead (log/infof "Batch: %s dead." batch-id)
    batch/status-partial-success (log/infof "Batch: %s partially successful." batch-id)))

(let [batch-opts {:callback-fn-sym `my-callback
                  :linger-sec      86400}
      ;; For single-arity functions
      email-ids ["foo@example.com" "bar@example.com" "baz@example.com"] ; placeholder addresses
      email-args-coll (map list email-ids)
      ;; Use Goose's utility function to construct args-coll
      ;; for multi-arity or variadic functions.
      multi-args-coll (-> []
                          (batch/construct-args :foo :bar :baz)
                          (batch/construct-args :fizz :buzz))]

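  ;; client-opts is assumed to be defined elsewhere,
  ;; with the Job queue, retry-opts and broker configured.
  (c/perform-batch client-opts batch-opts `send-emails email-args-coll)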
  (c/perform-batch client-opts batch-opts `multi-arity-fn multi-args-coll))
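To make the two args-coll shapes in the usage above concrete, here is a plain-Clojure sketch that needs no Goose dependency. The email addresses are placeholders, and the hand-built `multi-args-coll` is only assumed to match the shape that the utility function produces; that equivalence is not confirmed here.

```clojure
;; Single-arity case: wrap every value in its own list.
(def email-ids ["a@example.com" "b@example.com"])

(def email-args-coll (map list email-ids))
;; => (("a@example.com") ("b@example.com"))

;; Multi-arity case, built by hand: one inner vector per Job.
(def multi-args-coll
  (-> []
      (conj [:foo :bar :baz])
      (conj [:fizz :buzz])))
;; => [[:foo :bar :baz] [:fizz :buzz]]
```

In both cases the number of inner collections, here 2, is the number of Jobs the batch would contain.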

Previous: Scheduled Jobs        Next: Cron Jobs
