Improvements to the Retry Layer #399

itsyaasir · 2024-08-24T08:52:40Z

Hi, I saw that tower v0.5 finally added the Backoff/ExponentialBackoff policy, which can be used with the retry layer.

It would be great if this library could integrate it.

Thanks

geofmureithi · 2024-08-25T05:51:00Z

Technically, we have first-class support for anything from tower.
That said, we need a robust solution that integrates well with the different backends, because the current approach handles retries in memory.

reneklacan · 2024-12-05T15:51:27Z

In case somebody is looking for a working solution for apalis v0.5 that they can just copy-paste and easily modify:

use anyhow::Result;
use apalis::prelude::*;
use std::time::Duration;
use tokio::time::{sleep, Sleep};
use tower::retry::Policy;

type Req<T> = Request<T>;
type Err = Error;

#[derive(Clone, Debug)]
pub struct BackoffRetryPolicy {
    pub retries: usize,
    pub initial_backoff: Duration,
    pub multiplier: f64,
    pub max_backoff: Duration,
}

impl Default for BackoffRetryPolicy {
    fn default() -> Self {
        Self {
            retries: 25,
            initial_backoff: Duration::from_millis(1000),
            multiplier: 1.5,
            max_backoff: Duration::from_secs(60),
        }
    }
}

impl BackoffRetryPolicy {
    fn backoff_duration(&self, attempt: usize) -> Duration {
        let backoff = self.initial_backoff.as_millis() as f64 * self.multiplier.powi(attempt as i32);
        Duration::from_millis(backoff.min(self.max_backoff.as_millis() as f64) as u64)
    }
}

impl<T, Res> Policy<Req<T>, Res, Err> for BackoffRetryPolicy
where
    T: Clone,
{
    type Future = Sleep;

    fn retry(&mut self, req: &mut Req<T>, result: &mut Result<Res, Err>) -> Option<Self::Future> {
        let ctx = req.get::<Attempt>().cloned().unwrap_or_default();

        match result {
            Ok(_) => None,
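            // Compare instead of subtracting: `self.retries - attempt` underflows on usize
            // (and panics in debug builds) once the attempt count exceeds `retries`.
            Err(_) if ctx.current() < self.retries => {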
                let backoff_duration = self.backoff_duration(ctx.current());
                Some(sleep(backoff_duration))
            }
            Err(_) => None,
        }
    }

    fn clone_request(&mut self, req: &Req<T>) -> Option<Req<T>> {
        let mut req = req.clone();
        let value = req
            .get::<Attempt>()
            .cloned()
            .map(|attempt| {
                attempt.increment();
                attempt
            })
            .unwrap_or_default();
        req.insert(value);
        Some(req)
    }
}

and then:

// ...

.layer(RetryLayer::new(BackoffRetryPolicy::default()))

// or 

.layer(RetryLayer::new(BackoffRetryPolicy {
    retries: 10,
    initial_backoff: std::time::Duration::from_millis(1000),
    multiplier: 4.0,
    max_backoff: std::time::Duration::from_secs(60),
}))
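
To see what these settings mean in practice, here is a minimal standalone sketch of the same capped exponential formula, using only the standard library. The free function and its parameter order are illustrative assumptions; it is not part of apalis or tower.

```rust
use std::time::Duration;

/// Capped exponential backoff: initial * multiplier^attempt, limited to `max`.
/// Hypothetical free-function version of the `backoff_duration` method above.
fn backoff_duration(initial: Duration, multiplier: f64, max: Duration, attempt: usize) -> Duration {
    let raw_ms = initial.as_millis() as f64 * multiplier.powi(attempt as i32);
    // Clamp to the configured ceiling before converting back to an integer.
    let capped_ms = raw_ms.min(max.as_millis() as f64);
    Duration::from_millis(capped_ms as u64)
}

fn main() {
    let initial = Duration::from_millis(1000);
    let max = Duration::from_secs(60);
    // With multiplier 4.0 the raw delays are 1s, 4s, 16s, 64s, ...;
    // everything from the fourth attempt on is clamped to the 60s ceiling.
    for attempt in 0..5 {
        let delay = backoff_duration(initial, 4.0, max, attempt);
        println!("attempt {attempt}: wait {} ms", delay.as_millis());
    }
}
```

With a multiplier of 4.0 the ceiling is reached after only three retries, so most of a 10-retry budget is spent waiting the full 60 seconds; the default multiplier of 1.5 ramps up more gradually.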

geofmureithi · 2024-12-06T17:10:46Z

@reneklacan The example provided is outdated; it only works for v0.5.
The other problem that needs to be resolved is that this happens in memory, meaning that it cannot be shut down gracefully and it does not update the backend.

reneklacan · 2024-12-07T10:06:35Z

@geofmureithi I updated my comment to mention v0.5 (when I get to upgrading Apalis, I will make sure to add a version for v0.6 as well).

> @reneklacan The example provided is outdated; it only works for v0.5.
> The other problem that needs to be resolved is that this happens in memory, meaning that it cannot be shut down gracefully and it does not update the backend.

I realized that from your first comment in this thread, but this is still better than nothing.

Either way, thanks a lot for all the effort you are putting into Apalis.

geofmureithi · 2024-12-07T10:40:41Z

Yeah it's better than nothing. From your approach I got some ideas. Let me try something and see if it works.

reneklacan · 2024-12-30T12:14:43Z

Here is a working version of the previously shared retry policy for Apalis v0.6:

use anyhow::Result;
use apalis::prelude::*;
use std::time::Duration;
use tokio::time::{sleep, Sleep};
use tower::retry::Policy;

type Req<T, Ctx> = Request<T, Ctx>;
type Err = Error;

#[derive(Clone, Debug)]
pub struct BackoffRetryPolicy {
    pub retries: usize,
    pub initial_backoff: Duration,
    pub multiplier: f64,
    pub max_backoff: Duration,
}

impl Default for BackoffRetryPolicy {
    fn default() -> Self {
        Self {
            retries: 25,
            initial_backoff: Duration::from_millis(1000),
            multiplier: 1.5,
            max_backoff: Duration::from_secs(60),
        }
    }
}

impl BackoffRetryPolicy {
    fn backoff_duration(&self, attempt: usize) -> Duration {
        let backoff = self.initial_backoff.as_millis() as f64 * self.multiplier.powi(attempt as i32);
        Duration::from_millis(backoff.min(self.max_backoff.as_millis() as f64) as u64)
    }
}

impl<T, Res, Ctx> Policy<Req<T, Ctx>, Res, Err> for BackoffRetryPolicy
where
    T: Clone,
    Ctx: Clone,
{
    type Future = Sleep;

    fn retry(&mut self, req: &mut Req<T, Ctx>, result: &mut Result<Res, Err>) -> Option<Self::Future> {
        let attempt = req.parts.attempt.current();

        match result {
            Ok(_) => None,
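            // Compare instead of subtracting: `self.retries - attempt` underflows on usize
            // (and panics in debug builds) once the attempt count exceeds `retries`.
            Err(_) if attempt < self.retries => Some(sleep(self.backoff_duration(attempt))),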
            Err(_) => None,
        }
    }

    fn clone_request(&mut self, req: &Req<T, Ctx>) -> Option<Req<T, Ctx>> {
        let req = req.clone();
        req.parts.attempt.increment();
        Some(req)
    }
}

Usage is the same:

// ...

.layer(RetryLayer::new(BackoffRetryPolicy::default()))

// or 

.layer(RetryLayer::new(BackoffRetryPolicy {
    retries: 10,
    initial_backoff: std::time::Duration::from_millis(1000),
    multiplier: 4.0,
    max_backoff: std::time::Duration::from_secs(60),
}))
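
One detail worth checking in any policy like this is the retry-budget guard. Both `retries` and the attempt count are `usize`, so a guard written as an unsigned subtraction such as `retries - attempt` would underflow if the attempt count ever exceeded the budget (a panic in debug builds, a wrap-around in release builds). A minimal standalone sketch of a guard that avoids this, with the hypothetical helper name `should_retry`:

```rust
/// Returns true while the attempt count is still below the retry budget.
/// `should_retry` is an illustrative helper, not an apalis or tower API.
fn should_retry(retries: usize, attempt: usize) -> bool {
    // A plain comparison cannot underflow, unlike `retries - attempt > 0`.
    attempt < retries
}

fn main() {
    assert!(should_retry(3, 0));
    assert!(should_retry(3, 2));
    assert!(!should_retry(3, 3));

    // An attempt count above the budget: checked_sub reports the underflow
    // that a bare subtraction would hit, while the comparison simply says no.
    assert_eq!(3usize.checked_sub(5), None);
    assert!(!should_retry(3, 5));
    println!("retry guard behaves as expected");
}
```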

geofmureithi · 2024-12-30T12:50:47Z

@reneklacan I would recommend using req.parts.attempt rather than .get(), as the latter is not supported from v0.6 onward. It may still work for backward compatibility.

reneklacan · 2024-12-30T13:26:07Z

@geofmureithi Thanks, I updated and refactored my example, and it feels much cleaner now.

geofmureithi · 2024-12-30T14:34:56Z

Awesome!

You can also replace .layer(RetryLayer::new(...)) with just .retry(...).

geofmureithi · 2025-01-01T09:40:35Z

Just another reminder that this does not synchronize the number of attempts with the backend.
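
To illustrate why the count stays local, here is a minimal sketch of an in-memory attempt counter. `AttemptCounter` is a hypothetical stand-in, not the apalis `Attempt` type; it assumes the counter is a shared atomic, which is what calling `increment()` on a non-`mut` cloned request in the policy above suggests.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a per-job attempt counter kept in process memory.
#[derive(Clone, Debug, Default)]
struct AttemptCounter(Arc<AtomicUsize>);

impl AttemptCounter {
    /// Number of attempts recorded so far.
    fn current(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }

    /// Records one more attempt; works through a shared reference.
    fn increment(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let original = AttemptCounter::default();
    // Cloning copies the Arc, so both handles point at the same counter.
    let retried = original.clone();
    retried.increment();
    println!("original sees {} attempt(s)", original.current());
    // Nothing here writes to a database or queue: if the process stops,
    // the count is gone and the backend never learns about the retries.
}
```

Under that assumption, every retry performed by the layer is visible to clones of the request inside the worker process, but persisting it would need an explicit write to the storage backend.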

geofmureithi mentioned this issue Dec 26, 2024

retry with fallback timer #485

Closed

geofmureithi mentioned this issue Jan 1, 2025

Dynamic Job Restoration on Application Restart #448

Closed

geofmureithi mentioned this issue Jan 11, 2025

Add retry jobs to the end of the queue #497

Closed
