Skip to content

Conversation

pfultz2
Copy link
Collaborator

@pfultz2 pfultz2 commented Sep 24, 2025

Motivation

Refactor find_matches to help improve performance by moving many things outside of the loop.

Technical Details

The find_matches function would do many things inside of the loop even when tracing is not enabled:

  • Checking if the trace filter matches
  • Run time on the apply function
  • Call to matcher() to construct the matcher, which is usually a no-op but in some cases we are constructing maps, etc, so we can benefit from constructing it once

This also separates the checking with and without tracing enabled, so the tracing wont interfere with optimizations the compiler might do.

Changelog Category

    • Added: New functionality.
    • Changed: Changes to existing functionality.
    • Removed: Functionality or support that has been removed. (Compared to a previous release)
    • Optimized: Component performance that has been optimized or improved.
    • Resolved Issues: Known issues from a previous version that have been resolved.
    • Not Applicable: This PR is not to be included in the changelog.

@pfultz2 pfultz2 requested a review from causten as a code owner September 24, 2025 16:03
@causten causten changed the title Streamline find_matches funtion Streamline find_matches function Sep 24, 2025
@causten causten requested a review from Copilot September 24, 2025 16:24
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the find_matches function to improve performance by moving computationally expensive operations outside of the main matching loop. The key optimization separates tracing-enabled and tracing-disabled execution paths to prevent tracing overhead from interfering with compiler optimizations.

  • Extracted match runner creation logic to avoid repeated construction of matchers and environment variable checks
  • Separated tracing and non-tracing execution paths to enable better compiler optimization
  • Moved environment variable evaluations and trace filter checks outside the loop

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link

codecov bot commented Sep 24, 2025

Codecov Report

❌ Patch coverage is 34.92063% with 41 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/include/migraphx/matcher.hpp 34.92% 41 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4316      +/-   ##
===========================================
- Coverage    92.23%   92.14%   -0.09%     
===========================================
  Files          557      557              
  Lines        25924    25945      +21     
===========================================
- Hits         23909    23905       -4     
- Misses        2015     2040      +25     
Files with missing lines Coverage Δ
src/include/migraphx/matcher.hpp 87.20% <34.92%> (-7.08%) ⬇️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@migraphx-bot
Copy link
Collaborator

Test Batch Rate new
6000a7
Rate old
38fdc6
Diff Compare
torchvision-resnet50 64 3,156.85 3,173.83 -0.54%
torchvision-resnet50_fp16 64 6,594.09 6,613.50 -0.29%
torchvision-densenet121 32 2,437.24 2,445.12 -0.32%
torchvision-densenet121_fp16 32 4,115.10 4,132.06 -0.41%
torchvision-inceptionv3 32 1,665.76 1,673.47 -0.46%
torchvision-inceptionv3_fp16 32 2,584.21 2,596.39 -0.47%
cadene-inceptionv4 16 795.80 797.27 -0.19%
cadene-resnext64x4 16 802.54 806.10 -0.44%
slim-mobilenet 64 8,212.72 8,232.04 -0.23%
slim-nasnetalarge 64 221.77 222.86 -0.49%
slim-resnet50v2 64 3,296.95 3,305.41 -0.26%
bert-mrpc-onnx 8 1,135.68 1,144.06 -0.73%
bert-mrpc-tf 1 487.56 486.42 0.24%
pytorch-examples-wlang-gru 1 318.36 309.90 2.73%
pytorch-examples-wlang-lstm 1 432.72 387.21 11.75% 🔆
torchvision-resnet50_1 1 808.46 745.35 8.47% 🔆
cadene-dpn92_1 1 427.85 428.94 -0.25%
cadene-resnext101_1 1 368.14 369.63 -0.40%
onnx-taau-downsample 1 397.97 399.22 -0.31%
dlrm-criteoterabyte 1 31.92 32.03 -0.35%
dlrm-criteoterabyte_fp16 1 51.02 51.11 -0.18%
agentmodel 1 9,854.45 9,659.41 2.02%
unet_fp16 2 59.05 59.19 -0.24%
resnet50v1_fp16 1 991.47 991.39 0.01%
resnet50v1_int8 1 990.03 971.32 1.93%
bert_base_cased_fp16 64 1,098.63 1,104.24 -0.51%
bert_large_uncased_fp16 32 343.76 345.64 -0.54%
bert_large_fp16 1 197.69 198.02 -0.17%
distilgpt2_fp16 16 2,076.51 2,085.17 -0.42%
yolov5s 1 588.64 588.82 -0.03%
tinyllama 1 43.82 43.95 -0.31%
vicuna-fastchat 1 45.12 45.27 -0.34%
whisper-tiny-encoder 1 410.05 410.98 -0.23%
whisper-tiny-decoder 1 414.55 415.37 -0.20%
llama2_7b 1 19.11 19.15 -0.21%
qwen1.5-7b 1 23.45 23.53 -0.35%
phi3-3.8b 1 26.63 26.70 -0.26%
mask-rcnn 1 12.17 12.24 -0.57%
llama3-8b 1 21.64 21.74 -0.48%
whisper-large-encoder 1 10.17 10.22 -0.47%
whisper-large-decoder 1 99.49 99.83 -0.35%
mistral-7b 1 23.66 23.74 -0.32%
FLUX.1-schnell 1 714.67 721.05 -0.88%
nan nan nan nan nan%

This build is not recommended to merge 🔴

@migraphx-bot
Copy link
Collaborator


     ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

❌bert-mrpc-tf: ERROR - check error output2025-09-24 19:07:17.811682: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 359, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 306, in main
graph = load_tf_graph(model_name)
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 300, in load_tf_graph
graph_def.ParseFromString(f.read())
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 116, in read
self._preread_check()
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
self._read_buf = _pywrap_file_io.BufferedInputStream(
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme '[local]' not implemented (file: '/new-saved-models/tf-misc/bert_mrpc1.pb')


     ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

     ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

     ✅ agentmodel: PASSED: MIGraphX meets tolerance

     ✅ unet: PASSED: MIGraphX meets tolerance

     ✅ resnet50v1: PASSED: MIGraphX meets tolerance

     ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ bert_large: PASSED: MIGraphX meets tolerance

     ✅ yolov5s: PASSED: MIGraphX meets tolerance

     ✅ tinyllama: PASSED: MIGraphX meets tolerance

     ✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

     ✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

     ✅ llama2_7b: PASSED: MIGraphX meets tolerance

     ✅ qwen1.5-7b: PASSED: MIGraphX meets tolerance

     ✅ phi3-3.8b: PASSED: MIGraphX meets tolerance

🔴mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ llama3-8b: PASSED: MIGraphX meets tolerance

     ✅ whisper-large-decoder: PASSED: MIGraphX meets tolerance

     ✅ mistral-7b: PASSED: MIGraphX meets tolerance

     ✅ FLUX.1-schnell: PASSED: MIGraphX meets tolerance

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants