Adds background job scheduling and execution #10906

Closed
wants to merge 217 commits into from

Conversation

@cannikin (Member) commented Jul 2, 2024

This new package provides scheduling and processing of background jobs. We want everything needed to run a modern web application to be included in Redwood itself: you shouldn't need any third-party integrations if you don't want them. Background jobs have been sorely missed, but the time has come! (If you do want to use a third-party service, we have had an integration with Inngest since May of 2023!)

What's Included

  • A base RedwoodJob class that your own custom jobs extend. You only need to fill out the details of a single perform() method, accepting whatever arguments you want, and the underlying RedwoodJob code will take care of scheduling, delaying, running, and, if your job fails, recording the error and rescheduling it for a future retry.
  • Backend adapters for storing your jobs. Today we're shipping with a PrismaAdapter, but we also provide a BaseAdapter you can extend to build your own.
  • A persistent process to watch for new jobs and execute them. It can be run in dev mode, which stays attached to your console so you can monitor and execute jobs in development, or in daemon mode, which detaches from the console and runs in the background forever (you'll use this mode in production).

Decoupling the jobs from their backends means you can swap out backends as your app grows, or even use different backends for different jobs!

The actual Worker and Executor classes that know how to find a job and work on it are self-contained, so you can write your own runner if you want.

Features

  • Named queues: you can schedule jobs in separate named queues and have a different number of workers monitoring each one—makes it much easier to scale your background processing
  • Priority: give your jobs a priority from 1 (highest) to 100 (lowest). Workers will sort available jobs by priority, working the most important ones first.
  • Configurable delay: run your job as soon as possible (default), wait a number of seconds before running, or run at a specific time in the future
  • Run inline: instead of scheduling to run in the background, run immediately
  • Auto-retries with backoff: if your job fails it will back off at a rate of attempts ** 4, for a default of 24 tries; the wait between the last two attempts is a little over three days. The maximum number of retries is configurable per job (see the sketch after this list).
  • Integrates with Redwood's logger: use your existing one in api/src/lib/logger or create a new one just for job logging
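
For illustration, combining several of these options when scheduling might look like the sketch below. The option names queue, priority, and maxAttempts inside set() are assumptions made for this example (only wait and waitUntil appear elsewhere in this description), and ProcessPaymentJob is a hypothetical job:

// Hypothetical sketch: queue, priority, and maxAttempts are assumed option names
ProcessPaymentJob.set({
  queue: 'critical',  // a named queue other than the default one
  priority: 1,        // 1 is highest, 100 is lowest; workers run higher priorities first
  wait: 60,           // wait 60 seconds before the job is eligible to run
  maxAttempts: 10,    // assumed name for the per-job retry limit
}).performLater(payment.id)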

How it Works

Using the PrismaAdapter means your jobs are stored in your database. The yarn rw setup jobs script will add a BackgroundJob model to your schema.prisma file. Any job that is invoked with .performLater() will add a row to this table:

WelcomeEmailJob.performLater(user.email)

If using the PrismaAdapter, any arguments you want to give to your job must be serializable as JSON since the values will be stored in the database as text.
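
For example, prefer plain, JSON-serializable values (an email string, an ID, a simple object) over rich objects. This is only a sketch using the job from this PR's examples; anything beyond "the values are stored as text" is an assumption:

// Fine: plain values and plain objects serialize cleanly
WelcomeEmailJob.performLater(user.email)
WelcomeEmailJob.performLater({ email: user.email, locale: 'en' })

// Riskier: values like Date instances won't round-trip unchanged through JSON
// (a Date comes back as a string), so pass an ISO string or timestamp instead
WelcomeEmailJob.performLater({ email: user.email, signedUpAt: user.createdAt.toISOString() })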

The persistent job workers (started in dev with yarn rw jobs work or detached to run in the background with yarn rw jobs start) will periodically check the database for any jobs that are qualified to run: not already locked by another worker and with a runAt time before or equal to right now. They'll lock the record, instantiate your job class and call perform() on it, passing in the arguments you gave when scheduling it.

  • If the job succeeds it is removed from the database
  • If the job fails the error is recorded, the job is rescheduled to try again, and the lock is removed

Repeat until the queue is empty!
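
In pseudocode, the loop each worker runs looks roughly like the sketch below. This only illustrates the behavior described above, not the actual Worker/Executor source; the adapter method names and the loadJobClass helper are assumptions:

// Sketch of a single worker's loop; adapter method names are illustrative only
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

const workLoop = async ({ adapter, queue, logger }) => {
  while (true) {
    // find an unlocked job in this queue whose runAt is now or earlier, and lock it
    const job = await adapter.findAndLockNextJob({ queue, now: new Date() })

    if (!job) {
      await sleep(5000) // nothing to run; poll again shortly (interval is a guess)
      continue
    }

    try {
      // import the job class from the stored name/path and run it with the stored args
      const JobClass = await loadJobClass(job) // assumed helper
      await new JobClass().perform(...job.args)
      await adapter.delete({ job }) // success: remove the row from the database
    } catch (error) {
      logger.error(`Error in job: ${error.message}`)
      // failure: record the error, reschedule with backoff, and release the lock
      await adapter.reschedule({ job, error, wait: job.attempts ** 4 })
    }
  }
}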

Usage

Setup

To simplify the setup, run the included setup script:

yarn rw setup jobs

This creates api/src/lib/jobs with the basic config needed to get up and running, and adds the BackgroundJob model to your schema.prisma file.
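
The exact contents of the generated config aren't shown here, but conceptually it wires an adapter up to your database and logger. A rough sketch, where the package import path and export name are assumptions:

// api/src/lib/jobs.js (sketch only; the real generated file may differ)
import { PrismaAdapter } from '@redwoodjs/jobs' // package path is an assumption
import { db } from 'src/lib/db'
import { logger } from 'src/lib/logger'

// store jobs in the database via the BackgroundJob model added by the setup script
export const adapter = new PrismaAdapter({ db, logger })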

You can generate a job with the shell ready to go:

yarn rw g job WelcomeEmail

This creates a file at api/src/jobs/WelcomeEmailJob.js along with the shell of your job. All you need to do is fill out the perform() function:

// api/src/jobs/WelcomeEmailJob.js

export class WelcomeEmailJob extends RedwoodJob {
  perform(email) {
    // send email...
  }
}

Scheduling

A typical place you'd use this job would be in a service. In this case, let's add it to the users service after creating a user:

// api/src/services/users/users.js

export const createUser = async ({ input }) => {
  const user = await db.user.create({ data: input })
  await WelcomeEmailJob.performLater(user.email)
  return user
}

With the above syntax your job will run as soon as possible, in the queue named "default" and with a priority of 50. You can also delay your job for, say, 5 minutes:

OnboardingJob.set({ wait: 300 }).performLater(user.email)

Or run it at a specific time in the future:

MilleniumReminderJob.set({ waitUntil: new Date(2999, 11, 31, 12, 0, 0) }).performLater(user.email)

There are lots of ways to customize the scheduling and worker processes. Check out the docs for the full list!

Execution

To run your jobs, start up the runner:

yarn rw jobs work

This process will stay attached to the console and continually look for new jobs, executing them as they're found. To work through whatever outstanding jobs there are and then exit, use the workoff mode instead.

To run the worker(s) in the background, use the start mode:

yarn rw jobs start

To stop them:

yarn rw jobs stop

You can start more than one worker by passing the -n flag:

yarn rw jobs start -n 4

If you want to specify that some workers only work on certain named queues:

yarn rw jobs start -n default:2,email:1

Make sure you pass the same flags to the stop command as you did to start so it knows which workers to stop. You can restart your workers as well.
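
For example, if you started workers with the queue flags shown above, pass the same flags when stopping:

yarn rw jobs stop -n default:2,email:1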

In production you'll want to hook the workers up to a process monitor as, just like with any other process, they could die unexpectedly. More on this in the docs.

The Future

  • More adapters (maybe the community wants to get involved?): Redis, SQS, RabbitMQ
  • Studio integration: monitor the state of your outstanding jobs
  • Baremetal integration: if jobs are enabled, monitor the workers
  • Recurring jobs (performEvery?)
  • Lifecycle hooks: beforePerform(), afterPerform(), afterSuccess(), afterFailure()

The review thread below refers to this excerpt from the job runner code:

    await this.adapter.success({
      job: this.job,
      deleteJob: DEFAULT_DELETE_SUCCESSFUL_JOBS,
    })
  } catch (error: any) {
    this.logger.error(`Error in job ${this.job.id}: ${error.message}`)
    // TODO(jgmw): Handle the error 'any' better
    this.logger.error(`Error in job ${this.jobId}: ${error.message}`)

Member Author

I wanted this to be the job's actual ID so you can find it in the database...it was very common for me to see an error in the logs and then go find the job to get more info.

I do like the idea of having the name/path output in the logs though. I'd probably use a different name than jobId because I'd assume that's just the integer ID...jobName or jobPath or something? Kind of weird that we have job.name and job.path as well. fullJobName, jobResourceId, jobLocation ...

Collaborator

The reason I made this change was because we haven't enforced that the general BaseJob must have an ID. That's only available from the Prisma job right now. I agree this is less actionable when you see it in the terminal now, but I didn't want to go and add the constraint that any job must have a numeric ID without discussing it with you.

Collaborator

Kind of weird that we have job.name and job.path as well.

Not sure I follow what you mean by weird that these are here?

Member Author

Not sure I follow what you mean by weird that these are here?

haha Sorry, I meant that if we added this.jobName, but there's also this.job.name, it could be confusing, just like seeing this.jobId I assumed it would be the same value as this.job.id.

Member Author

The reason I made this change was because we haven't enforced that the general BaseJob must have an ID. That's only available from the Prisma job right now. I agree this is less actionable when you see it in the terminal now, but I didn't want to go and add the constraint that any job must have a numeric ID without discussing it with you.

Good point! I should have added id to the required fields in BaseJob I think, otherwise there isn't enough info between name, path, and args to uniquely identify it.

I don't think we can force it to be only numeric though; some people like UUIDs and GUIDs for their id fields. GROSS

@cannikin closed this Aug 13, 2024
@cannikin (Member Author)

Closing this one as most of the conversation is now irrelevant after the refactor. Opened a new one: #11238

@Josh-Walker-GM removed this from the next-release milestone Sep 4, 2024
Labels
fixture-ok (Override the test project fixture check), release:feature (This PR introduces a new feature)