Performance Improvement 5 - Cache compiled regexes #995

Open
wants to merge 14 commits into
base: master

JCWasmx86
Contributor

Even though the regex module has a cache, its access is not that fast. E.g. re.sub is a combination of re._compile and pattern.sub. re._compile checks the flags for e.g. DEBUG values or the verbosity. It uses an enum for the flags. enum.__and__ is quite slow, so even for a cache hit we have around two to three enum.__and__ calls. This patch caches commonly used regexes. The naming probably has to be adjusted.

Commonly used is defined as: pattern._compile is hit with this pattern+flags combination more than 1000 times in the netbox repo. That somewhat balances out the time needed for compilation, the runtime speed and the maintenance effort.
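
To make the overhead described above concrete, here is a minimal sketch that compares a module-level call with a call on a precompiled pattern. It uses the standard-library re module as a stand-in for regex (an assumption, so the snippet runs without extra dependencies). The pattern, the sample text, and the function names are made up for illustration, and absolute timings will vary by machine and Python version.

```python
import re
import timeit

# Illustrative pattern and input; not taken from djLint.
PATTERN = r"\s+"
TEXT = "a  b \t c\n d" * 10

# Compiled once at import time, so later calls skip the cache lookup
# and the flag handling done by the module-level helpers.
COMPILED = re.compile(PATTERN)


def module_level() -> str:
    # Goes through re's internal compile cache on every call.
    return re.sub(PATTERN, " ", TEXT)


def precompiled() -> str:
    # Calls the method on the already compiled pattern object.
    return COMPILED.sub(" ", TEXT)


if __name__ == "__main__":
    for fn in (module_level, precompiled):
        elapsed = timeit.timeit(fn, number=20_000)
        print(f"{fn.__name__}: {elapsed:.3f}s")
```

Both functions return the same string; only the path taken to reach the compiled pattern differs.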

Before:
image

regex._compile is called 392,678 times, taking 40% of the time
enum.__and__ is called 1,038,531 times, taking 20% of the time

After:

image

regex._compile is called 7,226 times, taking 20% of the time
enum.__and__ is called 267,691 times, taking 8-9% of the time. (That's still a lot)

This can probably be improved for more regexes, but I think it's somewhat balanced at this point. The compilation time on my PC for the regexes is around 0.1s.

Timings:
Netbox, parallel: 2.5-2.8s
EDX Platform, parallel: 19s

This is the final patch of the performance improvement patch series. There are probably more improvements to be made, but they are not as low-hanging as the ones in the last 6 patches.

@JCWasmx86 JCWasmx86 changed the title from "perf: Cache compiled regexes" to "Performance Improvement 5 - Cache compiled regexes" on Nov 1, 2024

netlify bot commented Nov 1, 2024

Deploy Preview for djlint ready!

Name Link
🔨 Latest commit 9139b22
🔍 Latest deploy log https://app.netlify.com/sites/djlint/deploys/6773c95cb88d11000861033f
😎 Deploy Preview https://deploy-preview-995--djlint.netlify.app

@oliverhaas
Contributor

oliverhaas commented Nov 2, 2024

Hi. I came here mainly to apologize that I broke your branch with my PR. Sorry for that!
I appreciate the work you've put into djlint in your last PRs; I just ran some quick tests earlier and you got some really good numbers out of djlint!

I'm not sure if I understand all the changes, but I naively thought that your approach seems a little bit complicated, and I was gonna suggest to just use lru_cache (with what I assumed was a small performance penalty), basically like this:

# Call this module something like `regex.py` or `regex_custom_wrappers.py` 

import re
from functools import lru_cache


def search(regex, text, use_cache: bool = True, flags: int = 0, **kwargs):
    # flags defaults to 0 because re.search and re.compile reject flags=None
    if use_cache:
        re_compiled = _compile_cached(regex, flags=flags)
        return re_compiled.search(text, **kwargs)
    return re.search(regex, text, flags=flags, **kwargs)

# ... more regex functions

@lru_cache(maxsize=256)
def _compile_cached(regex, flags: int = 0) -> re.Pattern:
    # Compiled patterns are memoized per (regex, flags) pair
    return re.compile(regex, flags=flags)

and then it's basically just find & replace. I've got about 20% faster formatting, I think, but I didn't want to spend too much time before I check in with you.
Do you think your approach is worth explicitly storing the compiled regexes? Maybe it's nicer to be more explicit, but I'm quite fond of lru_cache, so just let me know.
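
As a rough illustration of the lru_cache idea above, the following sketch wraps sub the same way and inspects the cache statistics. It relies on the standard-library re module rather than regex (an assumption, to keep it dependency-free). The wrapper names mirror the suggestion, but the sample inputs are hypothetical.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=256)
def _compile_cached(pattern: str, flags: int = 0) -> re.Pattern:
    # One compilation per distinct (pattern, flags) pair.
    return re.compile(pattern, flags)


def sub(pattern: str, repl: str, text: str, flags: int = 0, **kwargs) -> str:
    # Always pass flags positionally so lru_cache sees identical keys.
    return _compile_cached(pattern, flags).sub(repl, text, **kwargs)


_compile_cached.cache_clear()
for line in ["a  b", "c   d", "e f"]:
    sub(r"\s+", " ", line)

# Three calls with the same pattern: one miss, then two hits.
info = _compile_cached.cache_info()
print(info)
```

cache_info() is a cheap way to check whether the chosen maxsize is large enough for the number of distinct patterns in use.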

@JCWasmx86
Contributor Author

JCWasmx86 commented Nov 2, 2024

> Hi. I came here mainly to apologize that I broke your branch with my PR. Sorry for that!

No worries :)

> I appreciate the work you've put into djlint in your last PRs; I just ran some quick tests earlier and you got some really good numbers out of djlint!

Thanks, I hope I made your workflows faster :p

> I'm not sure if I understand all the changes, but I naively thought that your approach seems a little bit complicated, and I was gonna suggest to just use lru_cache (with what I assumed was a small performance penalty), basically like this:

I have an even better idea combining both approaches:

import regex as re

# Pattern type assumed to be str; flags default to 0 (no flags)
def search(regex: str, text: str, flags: int = 0, **kwargs):
  return _compile_cached(regex, flags=flags).search(text, **kwargs)


# --- Somewhere
old_search = re.search
re.search = search  # swap in the cached wrapper
format_it()
re.search = old_search  # restore the original afterwards

(Just the signatures have to match.) I would make the cache as big as possible (so, I think @cache?). If we monkeypatch the regex module, it should be quite trivial and shouldn't matter, because djLint is first and foremost a CLI tool and not a library, so it's legal to do that.
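
One way to sketch the monkeypatching idea so that the original function is always restored is a context manager. This is only an illustration under stated assumptions: it patches the standard-library re module instead of regex, and the names cached_search and patched_search are hypothetical, not part of djLint.

```python
import re
from contextlib import contextmanager
from functools import cache


@cache
def _compile_cached(pattern: str, flags: int = 0) -> re.Pattern:
    # Unbounded cache, as proposed in the comment above.
    return re.compile(pattern, flags)


def cached_search(pattern: str, string: str, flags: int = 0):
    # Same positional signature as re.search.
    return _compile_cached(pattern, flags).search(string)


@contextmanager
def patched_search():
    # Temporarily replace re.search, restoring it even if an error occurs.
    original = re.search
    re.search = cached_search
    try:
        yield
    finally:
        re.search = original


with patched_search():
    match = re.search(r"\d+", "ab12")
    print(match.group())
```

The try/finally block matters here: without it, an exception raised while formatting would leave the module patched for the rest of the process.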

I have no clear preference, I just did it like this because I was first like:

  • Ok, this regex is used a lot (21k+ times) => Let's cache it
  • Ok, this regex is used a lot => Let's cache it

And so on.

So we have three things we could do:

  • Stay with the current code
  • Use your approach
  • Use the approach I proposed here

That's probably on monosans to decide :)

@JCWasmx86
Contributor Author

After thinking a bit more, I think your solution @oliverhaas is superior in every manner. I've implemented it like you suggested (I added you as co-author as you made substantial improvements)

@JCWasmx86
Contributor Author

I've added a few extra commits that optimize stuff that wasn't visible earlier. Based on the results of edx-platform:

clear&&git stash; time djlint . --reformat --lint >/dev/null 2>&1
1.35.2:
real	3m16.714s
user	8m42.154s
sys	0m0.378s
1.35.3:
real	0m21.490s
user	1m54.587s
sys	0m0.701s
1.35.4:
real	0m15.429s
user	1m38.236s
sys	0m0.875s
HEAD+patch:
real	0m11.467s
user	0m51.634s
sys	0m0.659s
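
For readers who want the relative speedups from the wall-clock ("real") numbers above, here is a small helper sketch. The parsing function and the dictionary are illustrative additions; only the time strings are taken from the output above.

```python
import re

# Matches the format printed by the shell's `time`, e.g. "3m16.714s".
_TIME_RE = re.compile(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s")


def to_seconds(value: str) -> float:
    match = _TIME_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + float(seconds)


# "real" times copied from the benchmark output above.
REAL_TIMES = {
    "1.35.2": "3m16.714s",
    "1.35.3": "0m21.490s",
    "1.35.4": "0m15.429s",
    "HEAD+patch": "0m11.467s",
}

baseline = to_seconds(REAL_TIMES["1.35.2"])
speedups = {
    version: baseline / to_seconds(real)
    for version, real in REAL_TIMES.items()
}

for version, factor in speedups.items():
    print(f"{version}: {factor:.1f}x faster than 1.35.2")
```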

Performance

@oliverhaas
Contributor

oliverhaas commented Nov 2, 2024

(Ignore my previous comment if you had seen it)

Here is some stuff I've tried (average of 10 runs reformatting edx-platform):

  • Current master: 7.80s
  • This PR: 5.81s
  • Use re module instead of regex: 4.89s
  • Remove "our" manual caching/@cache: 4.90s
  • Revert using .compile() and just let re do the caching: 4.75s

So as far as I can see it would probably be best to use re and revert most of the manual caching, but let me know if I missed something or if you see different behavior. I would be happy to take care of a PR as well :).

@JCWasmx86
Contributor Author

JCWasmx86 commented Nov 2, 2024

It seems there is an issue with the re module: (BTW how many cores do you have? 6 cores /12 threads?)

concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib64/python3.12/concurrent/futures/process.py", line 263, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.tmp/djLint/djlint/__init__.py", line 459, in process
    output["format_message"] = reformat_file(config, this_file)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.tmp/djLint/djlint/reformat.py", line 64, in reformat_file
    beautified_code = formatter(config, rawcode)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.tmp/djLint/djlint/reformat.py", line 32, in formatter
    expanded = expand_html(compressed, config)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.tmp/djLint/djlint/formatter/expand.py", line 62, in expand_html
    html = regex_utils.sub(
           ^^^^^^^^^^^^^^^^
  File "/home/user/.tmp/djLint/djlint/regex_utils.py", line 35, in sub
    return _compile_cached(regex, flags=flags).sub(repl, text, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.tmp/djLint/djlint/regex_utils.py", line 46, in _compile_cached
    return re_.compile(regex, flags=flags or 0)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/re/__init__.py", line 228, in compile
    return _compile(pattern, flags)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/re/__init__.py", line 307, in _compile
    p = _compiler.compile(pattern, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/re/_compiler.py", line 745, in compile
    p = _parser.parse(p, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/re/_parser.py", line 979, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/re/_parser.py", line 460, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/re/_parser.py", line 544, in _parse
    code = _escape(source, this, state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/re/_parser.py", line 443, in _escape
    raise source.error("bad escape %s" % escape, len(escape))
re.error: bad escape \K at position 14 (line 1, column 15)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/.local/bin/djlint", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/user/.tmp/djLint/djlint/__init__.py", line 414, in main
    file_errors.append(future.result())
                       ^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
re.error: bad escape \K at position 14 (line 1, column 15)

Or if I "fix" it with ChatGPT (My regex skills are basic):
re.error: look-behind requires fixed-width pattern

You would probably have to port the regexes that don't work. And if I understand correctly, re is the old stuff and regex is the new stuff, so I'm not sure how much sense it makes to use an older library with fewer features. Sure, more performance (if somebody can get it working) is nice, but going back depends on the opinions of the maintainers. Furthermore, regex has "more thorough Unicode support" (to quote https://docs.python.org/3/library/re.html), so I think re may not make sense if it breaks some Unicode stuff.
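
The two errors above can be reproduced with a few lines of standard-library code. The patterns below are made-up minimal examples, not the actual djLint regexes. The helper name is hypothetical.

```python
import re


def compiles_with_re(pattern: str) -> bool:
    # True if the standard-library re module accepts the pattern.
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# \K ("keep out") is rejected by re as a bad escape.
print(compiles_with_re(r"foo\Kbar"))

# Variable-width look-behind is rejected by re as well.
print(compiles_with_re(r"(?<=a+)b"))

# A fixed-width look-behind is accepted.
print(compiles_with_re(r"(?<=a)b"))
```

A check like this could be run over all patterns in a rule set to estimate how many would need porting before a switch to re is even considered.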

But I think the best way would be to first merge this PR and then build stuff upon it (imo)

@oliverhaas
Contributor

Weird, I did not get an error, but I only reformatted the edx-platform repo on one system and haven't tested anything else.

For me it's hard to keep track of whether regex is really still the "new stuff", or whether the important stuff got ported to re, especially since both have existed for so long and are (basically?) fully compatible.

From me it's definitely a thumbs up for merging.

(Off-topic: I just got a 16-core/32-threads CPU for my old retired workstation for fairly cheap. Haven't benchmarked djlint specifically, but compiling or running tests is literally more than 4 times faster compared to my 4-core laptop, which actually is noticeably making my workflows easier.)

@monosans
Member

monosans commented Nov 3, 2024

We will not replace regex with re, as re lacks some of the features of regex and this would be a breaking change for those who create their own rules in djlint_rules.yaml.

@JCWasmx86
Contributor Author

@oliverhaas If you want to do more optimizations, there are a few things that may be worth looking into (out of scope for this PR):

  • Modify child_of_unformatted_block to be smarter. After this patch is applied, it takes 40+% of the entire runtime. Maybe you could do fancy stuff with interval trees or bisection.
  • Check whether replacing (x.start(0), x.end()) for x in matches with x.span() could make sense (you could probably shave off up to 2-3%, if you really want to do micro-optimizations)
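
A minimal sketch of both ideas, assuming the unformatted blocks can be represented as sorted, non-overlapping (start, end) spans. This is not how child_of_unformatted_block is actually implemented; the function names, the pattern, and the sample text are hypothetical, and the standard-library re module is used as a stand-in for regex.

```python
import re
from bisect import bisect_right


def build_spans(pattern: re.Pattern, text: str) -> list[tuple[int, int]]:
    # finditer yields non-overlapping matches in order, so the spans are
    # already sorted. m.span() equals (m.start(0), m.end()).
    return [m.span() for m in pattern.finditer(text)]


def inside_any(spans: list[tuple[int, int]], starts: list[int], pos: int) -> bool:
    # Binary search for the last span starting at or before pos,
    # then check whether pos falls before that span's (exclusive) end.
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < spans[i][1]


text = "a <pre>x</pre> b <pre>y</pre>"
spans = build_spans(re.compile(r"<pre>.*?</pre>"), text)
starts = [start for start, _ in spans]

print(spans)
print(inside_any(spans, starts, 5))
print(inside_any(spans, starts, 15))
```

With the spans precomputed once per file, each lookup costs O(log n) instead of a scan over all blocks.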

@monosans
Member

monosans commented Nov 5, 2024

Hey, @JCWasmx86! There are some conflicts because of the changes I've made, sorry. Could you please resolve them and do some profiling to see how much this PR improves the performance now? Thanks!

@JCWasmx86
Contributor Author

> There are some conflicts because of the changes I've made, sorry. Could you please resolve them

I've resolved them with a lot of hard work :/, sadly the git history had to suffer for that

HEAD: (4 runs summed up)

real   0m49.220s => 12.305s/run
user   4m15.192s => 64s/run
sys    0m3.022s

This PR: (4 runs summed up)

real   0m42.939s => 10.735s/run
user   3m25.135s => 51.28375s/run
sys    0m2.984s

So there are still improvements

@JCWasmx86
Contributor Author

Hey @monosans, what is needed for this PR to get merged? Can I assist in any way?

@monosans
Member

monosans commented Nov 7, 2024

> Hey @monosans, what is needed for this PR to get merged? Can I assist in any way?

Please see review comments

@JCWasmx86
Contributor Author

@monosans I'm sorry, there are none. Did you maybe forget to finalize the review?

djlint/lint.py (outdated)
@@ -118,6 +121,7 @@ def linter(
"match": match.group().strip()[:20],
"message": rule["message"],
})
build_flags.cache_clear()
Contributor Author


Just one quick question: why was this added? Flags are just constants, so the cache shouldn't grow that big.
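
To illustrate the point about cache size, here is a sketch of a cached flag builder. The real build_flags in djLint may look different; the flag map, the "|"-separated input format, and the body below are assumptions made for the example.

```python
import re
from functools import cache

# Hypothetical mapping from flag names to re flag values.
_FLAG_MAP = {"re.I": re.I, "re.M": re.M, "re.S": re.S, "re.X": re.X}


@cache
def build_flags(flag_list: str) -> int:
    # Combine a "|"-separated list of flag names into a single int.
    combined = 0
    for name in flag_list.split("|"):
        combined |= int(_FLAG_MAP[name.strip()])
    return combined


build_flags.cache_clear()
for _ in range(1000):
    build_flags("re.I|re.M")
    build_flags("re.S")

# Only two distinct inputs were seen, so the cache holds two entries.
info = build_flags.cache_info()
print(info)
```

Under these assumptions the cache size is bounded by the number of distinct flag strings in the rule set, which is why clearing it after each run should not be needed to limit memory.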

Contributor Author


@monosans Could you please check this comment? Furthermore, I've rebased this PR.

@oliverhaas
Contributor

@JCWasmx86 Thanks again for the hard work. I'm at 1.2s for edx-platform with this branch on my desktop, which is crazy to think about where djlint was just a while ago. I basically only have one feature/bug left on my pain points, and just a month ago I was looking for alternatives to djlint...

If you could share your profiling script in a gist or something, that would be awesome. I can't seem to get the profiler output quite as readable as your images.

@JCWasmx86
Contributor Author

JCWasmx86 commented Nov 10, 2024

@oliverhaas I just used gprof2dot and snakeviz and temporarily modified the code as described here: #986 (comment) (Parallel execution => Serial execution, as otherwise the profiler gives stupid results)
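
For anyone who wants a starting point without the linked modification, a serial profiling run can be sketched with the standard-library cProfile module. The workload function below is a made-up stand-in, not djLint code, and profile is a hypothetical helper name.

```python
import cProfile
import io
import pstats
import re


def workload() -> int:
    # Stand-in for a serial formatting run: many small regex calls.
    total = 0
    for i in range(2000):
        total += len(re.sub(r"\s+", " ", f"line   {i}\twith  spaces"))
    return total


def profile(func) -> str:
    # Profile a single call in the current process and return a text report.
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


report = profile(workload)
print(report)
```

Calling profiler.dump_stats with a file name instead of printing writes a stats file, which, as far as I know, is the kind of input that viewers such as snakeviz and gprof2dot expect.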

I've profiled again with clear&&git stash; time hyperfine -i --warmup 3 --runs 10 'djlint . --lint --reformat' --output null for edx-platform

master:
Benchmark 1: djlint . --lint --reformat
  Time (mean ± σ):      5.886 s ±  0.037 s    [User: 37.871 s, System: 0.475 s]
  Range (min … max):    5.856 s …  5.981 s    10 runs
 
  Warning: Ignoring non-zero exit code.
 

real	1m16.701s
user	8m13.952s
sys	0m6.147s

PR:
Benchmark 1: djlint . --lint --reformat
  Time (mean ± σ):      4.450 s ±  0.178 s    [User: 28.246 s, System: 0.435 s]
  Range (min … max):    4.182 s …  4.587 s    10 runs
 
  Warning: Ignoring non-zero exit code.
 

real	0m57.471s
user	6m7.896s
sys	0m5.683s
