Conversation

UnknownPlatypus (Collaborator) commented Nov 16, 2025

This is mostly me playing around with criterion, but it gives a 2-5x speed improvement on the slugify util by avoiding regexes and iterating over the characters instead.
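For context, here is roughly the shape of a character-iterator version (illustration only, not necessarily the exact code in this PR; it assumes the same `Cow<str>` signature used in the benchmark script below and may differ in edge-case trimming):

```rust
use std::borrow::Cow;
use unicode_normalization::UnicodeNormalization;

// Illustration only: a regex-free slugify that walks the NFKD-decomposed
// characters once, keeping ASCII word characters and collapsing runs of
// whitespace/hyphens into a single '-'.
fn slugify_chars(content: Cow<str>) -> Cow<str> {
    let mut out = String::with_capacity(content.len());
    let mut pending_separator = false;
    for c in content.nfkd() {
        if !c.is_ascii() {
            // Mirrors the old `.filter(|c| c.is_ascii())`: drop the
            // non-ASCII code points (e.g. combining marks from NFKD).
            continue;
        }
        if c.is_ascii_alphanumeric() || c == '_' {
            if pending_separator && !out.is_empty() {
                out.push('-');
            }
            pending_separator = false;
            out.push(c.to_ascii_lowercase());
        } else if c.is_ascii_whitespace() || c == '-' {
            // Defer emitting '-' so consecutive separators collapse and
            // leading/trailing separators are dropped.
            pending_separator = true;
        }
        // Anything else (punctuation, symbols) is discarded, like the
        // old `[^\w\s-]` removal.
    }
    Cow::Owned(out)
}
```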

I had to enable `crate-type = [..., "rlib"]` to be able to import the function in the benchmark script.
This has a minimal cost on dev builds and is ignored in the pyo3 wheel, so it should be fine.
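Roughly, the Cargo side of that looks like this (hypothetical excerpt; the crate-type list, criterion version and bench name are illustrative):

```toml
# Hypothetical Cargo.toml excerpt, for illustration only.
[lib]
# "cdylib" for the pyo3 wheel, "rlib" so benches/tests can `use` the crate.
crate-type = ["cdylib", "rlib"]

[dev-dependencies]
criterion = "0.5"

[[bench]]
name = "slugify_bench"
harness = false  # let criterion provide its own main
```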

You can run `cargo bench --bench slugify_bench` to try it. The table summary was generated by passing the raw output to Claude. There is also a built-in HTML report, but I wanted something more minimal.

Benchmark results

//! Run with: `cargo bench --bench slugify_bench`
//!
//! ## Latest Results (AMD Ryzen 9 7950X)
//! The new version is 1.9x to 5.1x faster than the regex-based implementation
//!
//! | Test Case         | Old (regex) | New (char-iterator) | Speedup  |
//! |-------------------|-------------|---------------------|----------|
//! | Simple ASCII      | 211.09 ns   | 77.85 ns            | 2.71x    |
//! | With Numbers      | 274.38 ns   | 119.71 ns           | 2.29x    |
//! | Mixed Case        | 271.69 ns   | 95.61 ns            | 2.84x    |
//! | With Punctuation  | 469.55 ns   | 166.36 ns           | 2.82x    |
//! | Multiple Spaces   | 328.72 ns   | 140.45 ns           | 2.34x    |
//! | With Hyphens      | 263.26 ns   | 107.48 ns           | 2.45x    |
//! | Unicode Accents   | 291.46 ns   | 130.20 ns           | 2.24x    |
//! | Long Text         | 1,271.2 ns  | 660.34 ns           | 1.93x    |
//! | Special Chars     | 946.20 ns   | 185.09 ns           | 5.11x    |
//! | Mixed Unicode     | 340.86 ns   | 141.90 ns           | 2.40x    |
Benchmark script
use criterion::{Criterion, criterion_group, criterion_main};
use django_rusty_templates::render::filters::slugify;
use regex::Regex;
use std::borrow::Cow;
use std::hint::black_box;
use std::sync::LazyLock;
use unicode_normalization::UnicodeNormalization;

// Old regex-based implementation
static NON_WORD_RE: LazyLock<Regex> =
    LazyLock::new(|| Regex::new(r"[^\w\s-]").expect("Static string will never panic"));

static WHITESPACE_RE: LazyLock<Regex> =
    LazyLock::new(|| Regex::new(r"[-\s]+").expect("Static string will never panic"));

fn slugify_old(content: Cow<str>) -> Cow<str> {
    let content = content
        .nfkd()
        // first decomposing characters, then only keeping
        // the ascii ones, filtering out diacritics for example.
        .filter(|c| c.is_ascii())
        .collect::<String>()
        .to_lowercase();
    let content = NON_WORD_RE.replace_all(&content, "");
    let content = content.trim();
    let content = WHITESPACE_RE.replace_all(content, "-");
    Cow::Owned(content.to_string())
}

fn benchmark_slugify(c: &mut Criterion) {
    let test_cases = vec![
        ("Simple ASCII", "Hello World"),
        ("With Numbers", "Test123 Example456"),
        ("Mixed Case", "ThIs Is A TeSt"),
        ("With Punctuation", "Hello, World! How are you?"),
        ("Multiple Spaces", "Hello    World    Test"),
        ("With Hyphens", "Hello-World-Test"),
        ("Unicode Accents", "Héllo Wörld Tëst"),
        (
            "Long Text",
            "This is a much longer text that contains multiple words and should test the performance with larger inputs",
        ),
        ("Special Chars", "Test@#$%^&*()_+={}[]|\\:;\"'<>,.?/"),
        ("Mixed Unicode", "Café résumé naïve"),
    ];

    let mut group = c.benchmark_group("slugify");

    for (name, input) in test_cases.iter() {
        group.bench_function(format!("old/{}", name), |b| {
            b.iter(|| slugify_old(black_box(Cow::Borrowed(input))))
        });

        group.bench_function(format!("new/{}", name), |b| {
            b.iter(|| slugify(black_box(Cow::Borrowed(input))))
        });
    }

    group.finish();
}

criterion_group!(benches, benchmark_slugify);
criterion_main!(benches);

codecov bot commented Nov 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


@github-actions

Django test suite conformance

✅ No changes detected running the Django test suite.

Django test suite passing: 33.21%
1188 ERROR / 97 FAIL / 639 OK

LilyFirefly (Owner) commented Nov 16, 2025

Thanks for looking into benchmarking - this is definitely something I want to have and to do well. I've not been prioritising it because I want to get feature complete first and want to avoid premature optimisation, but I think this is a good place to optimise anyway.

I would like us to use codspeed for tracking benchmarks in CI - it's used by PyO3 and I've used it well at a previous job too.

Some relevant docs:

Comment on lines +29 to +48
Owner

If we get benchmarking set up nicely in CI, I don't think there's any value in keeping the old implementation around.

Collaborator Author

Yes, agreed (and we need to drop it to remove the regex dep).

@UnknownPlatypus
Collaborator Author

Thanks for looking into benchmarking - this is definitely something I want to have and to do well. I've not been prioritising it because I want to get feature complete first and want to avoid premature optimisation, but I think this is a good place to optimise anyway.
I would like us to use codspeed for tracking benchmarks in CI - it's used by PyO3 and I've used it well at a previous job too.

Yes, I'm also interested in codspeed, but I agree this is probably not the priority right now.

And the benchmark for this specific function is maybe too niche.
It was a great help while I was optimizing this routine, but I'm not sure it's worth running continuously in CI. It's more of a one-off benchmark to check that this change makes sense.

I've added the bench results and script to the PR description. Given that codspeed seems to recommend divan instead of criterion, maybe I can drop the bench script from the PR code and we can revisit when we add codspeed?
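
For the record, a divan flavour of this one-off benchmark could look something like the sketch below (assuming divan's `#[divan::bench]` attribute with `args` and a `[[bench]]` entry with `harness = false`; I haven't wired this up to codspeed, and the argument list is just a sample):

```rust
use std::borrow::Cow;

use django_rusty_templates::render::filters::slugify;

fn main() {
    // divan discovers the #[divan::bench] functions and runs them.
    divan::main();
}

// Returning the Cow keeps the result observed so the call isn't optimised away.
#[divan::bench(args = ["Hello World", "Héllo Wörld Tëst", "Test@#$%^&*()"])]
fn bench_slugify(input: &str) -> Cow<'_, str> {
    slugify(Cow::Borrowed(input))
}
```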
