chore(issues): Improve FileIOMainThreadDetector performance #82903
base: master
FileIOMainThreadDetector is our most time-consuming detector: https://sentry.sentry.io/traces/?groupBy=span.description&interval=15m&mode=aggregate&project=1&query=span.op%3Afunction%20span.description%3Arun_detector_on_data.%2A&statsPeriod=24h&visualize=%7B%22chartType%22%3A1%2C%22yAxes%22%3A%5B%22sum%28span.duration%29%22%5D%7D

This eliminates the need to recompile these glob patterns, potentially for every transaction event, and adds a synthetic benchmark to measure the (~4x) improvement:

Before:

```
$ ./bin/benchmark_detectors
1,000,000 ops 7.005 s 142,758.90 ops/s
```

After:

```
$ ./bin/benchmark_detectors
1,000,000 ops 1.806 s 553,602.94 ops/s
```
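The `./bin/benchmark_detectors` script itself is not shown in this thread. The sketch below is a minimal, hypothetical harness (`run_benchmark` is an assumed name) that reports throughput in the same `N ops  S s  R ops/s` shape as the output quoted above. It calls a zero-argument callable repeatedly and times the loop with `time.perf_counter`.

```python
import time
from typing import Callable


def run_benchmark(fn: Callable[[], object], ops: int = 1_000_000) -> str:
    """Hypothetical harness: call fn() `ops` times and report throughput.

    The output only mimics the "1,000,000 ops 7.005 s 142,758.90 ops/s"
    line from the PR description; it is not the real benchmark_detectors script.
    """
    if ops <= 0:
        raise ValueError("ops must be positive")
    start = time.perf_counter()
    for _ in range(ops):
        fn()
    elapsed = time.perf_counter() - start
    return f"{ops:,} ops {elapsed:.3f} s {ops / elapsed:,.2f} ops/s"


if __name__ == "__main__":
    # Example workload (an assumption, not the detector): a single suffix check.
    print(run_benchmark(lambda: "/tmp/Foo.nib".endswith(".nib"), ops=100_000))
```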
```python
def main():
    settings = get_detection_settings()

    # 10 events: 1 ignored, 1 matching, and 8 ignored
```
These all came directly from `tests/sentry/utils/performance_issues/test_file_io_on_main_thread_detector.py`.
```diff
@@ -117,7 +118,10 @@ class FileIOMainThreadDetector(BaseIOMainThreadDetector):
 
     __slots__ = ("stored_problems",)
 
-    IGNORED_LIST = {"*.nib", "*.plist", "*kblayout_iphone.dat"}
+    IGNORED_LIST = {
+        re.compile(fnmatch.translate(pattern))
```
`sentry.utils.glob.glob_match` is documented as "A beefed up version of fnmatch.fnmatch", but I'd appreciate any insight into ways the two might diverge.
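A minimal, self-contained sketch of the precompilation idea in the diff above. Each glob from the original `IGNORED_LIST` is translated with `fnmatch.translate` and compiled once at import time. The names `IGNORED_GLOBS`, `IGNORED_PATTERNS`, and `is_ignored` are hypothetical, and a tuple stands in for the set used in the diff; this is not the detector's actual matching code. It reproduces only `fnmatch` semantics (case-sensitive, like `fnmatch.fnmatchcase`), so it does not settle how `glob_match` might differ.

```python
import fnmatch
import re

# Glob patterns taken from the original IGNORED_LIST.
IGNORED_GLOBS = ("*.nib", "*.plist", "*kblayout_iphone.dat")

# Translate and compile each glob once, at import time, instead of per event.
IGNORED_PATTERNS = tuple(re.compile(fnmatch.translate(glob)) for glob in IGNORED_GLOBS)


def is_ignored(file_path: str) -> bool:
    """Return True if file_path matches any of the precompiled glob patterns."""
    # translate() anchors the end of the regex and match() anchors the start,
    # so a pattern must cover the whole path, as with fnmatch.fnmatchcase().
    return any(pattern.match(file_path) for pattern in IGNORED_PATTERNS)


if __name__ == "__main__":
    print(is_ignored("/var/mobile/Main.nib"))  # True
    print(is_ignored("/var/mobile/Main.nib.bak"))  # False
```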
Nice fix! Since you have the benchmark, it could be worth seeing whether compiling into a single regex would be faster, like:

```python
"|".join(fnmatch.translate(pattern) for pattern in patterns)
```

Alternatively, these are just suffix matches on files, so would it be even faster to just do the following?

```python
IGNORED_SUFFIX_LIST = [".nib", ".plist", "kblayout_iphone.dat"]
any(file_path.endswith(suffix) for suffix in IGNORED_SUFFIX_LIST)
```
This is ~10-15% faster and, more importantly, it's a lot simpler to understand and reason about.

Before (with pre-compiled multiple regexes):

```
$ ./bin/benchmark_detectors
1,000,000 ops 1.782 s 561,138.73 ops/s
```

After:

```
$ ./bin/benchmark_detectors
1,000,000 ops 1.556 s 642,830.58 ops/s
```
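A sketch of the two alternatives suggested in review, side by side: one alternation regex built by joining `fnmatch.translate` outputs with `"|"`, and a plain suffix check. This assumes two things about these particular patterns. First, every glob here is `*` followed by a literal suffix, and `fnmatch`'s `*` also matches `/`, so the suffix test should give the same answers. Second, the joined regex relies on each `translate()` result being a self-contained, end-anchored expression. `str.endswith` accepts a tuple of suffixes, which avoids the generator inside `any(...)`. All names below are hypothetical.

```python
import fnmatch
import re

IGNORED_GLOBS = ("*.nib", "*.plist", "*kblayout_iphone.dat")
# Each glob above is "*" plus a literal suffix, so these suffixes are equivalent.
IGNORED_SUFFIXES = (".nib", ".plist", "kblayout_iphone.dat")

# Option 1: one compiled alternation; each translate() result is end-anchored.
IGNORED_REGEX = re.compile("|".join(fnmatch.translate(glob) for glob in IGNORED_GLOBS))


def is_ignored_regex(file_path: str) -> bool:
    return IGNORED_REGEX.match(file_path) is not None


def is_ignored_suffix(file_path: str) -> bool:
    # Option 2: str.endswith() accepts a tuple and is True if any suffix matches,
    # the same result as any(file_path.endswith(s) for s in IGNORED_SUFFIXES).
    return file_path.endswith(IGNORED_SUFFIXES)


if __name__ == "__main__":
    for path in ("/a/Main.nib", "/a/Info.plist.bak", "/b/kblayout_iphone.dat"):
        print(path, is_ignored_regex(path), is_ignored_suffix(path))
```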
Great idea; we're probably reaching the limits of this benchmark's accuracy, but I do like the idea of moving to a suffix check since it's also much simpler to reason about:
Pre-compiled single regex:
File path ends with:
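The timings for these three variants are not preserved in this thread. One way to rerun the comparison is with `timeit`, after first checking that all three variants agree on a set of sample paths. `SAMPLE_PATHS`, `APPROACHES`, and `compare` are hypothetical, the sample paths are invented for illustration, and no particular speedup is implied.

```python
import fnmatch
import re
import timeit

GLOBS = ("*.nib", "*.plist", "*kblayout_iphone.dat")
SUFFIXES = (".nib", ".plist", "kblayout_iphone.dat")
MULTI = tuple(re.compile(fnmatch.translate(g)) for g in GLOBS)
SINGLE = re.compile("|".join(fnmatch.translate(g) for g in GLOBS))

APPROACHES = {
    "Pre-compiled multiple regexes": lambda path: any(p.match(path) for p in MULTI),
    "Pre-compiled single regex": lambda path: SINGLE.match(path) is not None,
    "File path ends with": lambda path: path.endswith(SUFFIXES),
}

# Hypothetical sample: mostly non-matching paths plus one hit per suffix.
SAMPLE_PATHS = [
    "/var/mobile/Containers/Data/db.sqlite",
    "/var/mobile/Library/Caches/image.png",
    "/private/var/tmp/log.txt",
    "/Applications/App.app/Main.nib",
    "/Applications/App.app/Info.plist",
    "/System/Library/kblayout_iphone.dat",
]


def compare(number: int = 10_000) -> dict[str, float]:
    """Return total seconds per approach for `number` passes over SAMPLE_PATHS."""
    expected = [path.endswith(SUFFIXES) for path in SAMPLE_PATHS]
    results = {}
    for name, check in APPROACHES.items():
        # Timings only matter if every approach gives the same answers.
        if [check(path) for path in SAMPLE_PATHS] != expected:
            raise AssertionError(f"{name} disagrees with the suffix check")

        def run(check=check):
            for path in SAMPLE_PATHS:
                check(path)

        results[name] = timeit.timeit(run, number=number)
    return results


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.3f} s")
```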
❌ 3 Tests Failed: