-
Notifications
You must be signed in to change notification settings - Fork 17
slugify no regex #191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
slugify no regex #191
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Django test suite conformance✅ no changes detected running Django test suite. Django test suite passing: 33.21%
1188 ERROR / 97 FAIL / 639 OK |
|
Thanks for looking into benchmarking - this is definitely something I want to have and to do well. I've not been prioritising it because I want to get feature complete first and want to avoid premature optimisation, but I think this is a good place to optimise anyway. I would like us to use codspeed for tracking benchmarks in CI - it's used by PyO3 and I've used it well at a previous job too. Some relevant docs: |
| // Old regex-based implementation | ||
| static NON_WORD_RE: LazyLock<Regex> = | ||
| LazyLock::new(|| Regex::new(r"[^\w\s-]").expect("Static string will never panic")); | ||
|
|
||
| static WHITESPACE_RE: LazyLock<Regex> = | ||
| LazyLock::new(|| Regex::new(r"[-\s]+").expect("Static string will never panic")); | ||
|
|
||
| fn slugify_old(content: Cow<str>) -> Cow<str> { | ||
| let content = content | ||
| .nfkd() | ||
| // first decomposing characters, then only keeping | ||
| // the ascii ones, filtering out diacritics for example. | ||
| .filter(|c| c.is_ascii()) | ||
| .collect::<String>() | ||
| .to_lowercase(); | ||
| let content = NON_WORD_RE.replace_all(&content, ""); | ||
| let content = content.trim(); | ||
| let content = WHITESPACE_RE.replace_all(content, "-"); | ||
| Cow::Owned(content.to_string()) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we get benchmarking set up nicely in CI, I don't think there's any value in keeping the old implementation around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes agree (and we need to drop it to remove the regex dep)
Yes I'm also interested in codspeed but I agree this is probably not the priority right now. And the benchmark for this specific function is maybe too niche. I've added the bench result and script in the PR description, given that codespeed seem to recommend divan instead of criterion, maybe I can drop the bench script from the PR code and we can revisit when we add codespeed ? |
This is mostly me babbling around with criterion but is a 2-5x speed improvement on the
slugifyutil by avoiding regexes and iterating over the characters.i had to enable
crate-type = [..., "rlib"]to be able to import the function in the benchmark script.This has a minimal cost on dev builds and is ignored in the pyo3 wheel so probably fine.
You can run
cargo bench --bench slugify_benchto try it. The table summary was generated by passing the raw output to claude. There are also builtin html report but I wanted something more minimalBenchmark results
Benchmark script