
fix - Change flops calculation logic#30

Merged
Chamberlain0w0 merged 1 commit into master from fix/flops_calculation
Mar 10, 2026

Conversation

@baominghelly
Collaborator

Description

Change the FLOPS calculation logic: add an average latency calculation, so FLOPS is derived from the per-iteration latency rather than the total time.
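A minimal sketch of the revised calculation (function names here are illustrative, not the actual infinimetrics API): the total device time is averaged over the measured iterations, and that average latency feeds the FLOPS computation for a batched matmul.

```python
# Illustrative sketch only; the real implementation lives in infinimetrics.

def average_latency_ms(total_device_time_ms: float, iterations: int) -> float:
    """Average per-iteration device latency in milliseconds."""
    return total_device_time_ms / iterations

def matmul_tflops(batch: int, m: int, k: int, n: int, latency_ms: float) -> float:
    """Batched matmul performs 2*b*m*k*n FLOPs (one multiply + one add per term)."""
    flops = 2 * batch * m * k * n
    return flops / (latency_ms / 1e3) / 1e12

# Numbers from the test log below: 100 measured iterations,
# InfiniCore device total 3482.264 ms -> ~34.82 ms average.
lat = average_latency_ms(3482.264, 100)
print(round(matmul_tflops(16, 4096, 4096, 4096, lat), 4))  # -> 63.1492
```

With the log's shapes (16 x 4096 x 4096 half-precision matmul) this reproduces the `operator.flops` value of 63.1492 TFLOPS reported in the test output.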

Test evidence

The operator test succeeds:

python main.py format_input_matmul.json 
2026-03-10 17:01:09,503 - infinimetrics.utils.input_loader - INFO - Loaded 1 input(s) from format_input_matmul.json
2026-03-10 17:01:09,503 - infinimetrics.dispatcher - INFO - Processing 1 valid inputs (skipped 0 invalid)
2026-03-10 17:01:12,219 - infinimetrics.dispatcher - INFO - Validation complete: 1 valid, 0 skipped
2026-03-10 17:01:12,219 - infinimetrics.dispatcher - INFO - [1/1] Executing operator.InfiniCore.Matmul.Performance
2026-03-10 17:01:12,219 - infinimetrics.executor - INFO - Executor: Running operator.InfiniCore.Matmul.Performance
2026-03-10 17:01:12,219 - infinimetrics.operators.infinicore_adapter - INFO - InfiniCoreAdapter: Processing operator.InfiniCore.Matmul.Performance
🚀 Mode: Dynamic Execution
InfiniCore Operator Test Runner
Directory: Dynamic (1 cases)
Tests found: 1

✅  test_Matmul_0: PASSED (code: 0)

============================================================
Testing Matmul on NVIDIA
============================================================
TestCase(Auto-Gen: Matmul - inputs=[in_0: tensor(16, 4096, 4096), float16; in_1: tensor(16, 4096, 4096), float16], kwargs={out=output: tensor(16, 4096, 4096), float16})
    PyTorch    time - Host: 32.191522 ms, Device: 35.200121 ms
    InfiniCore time - Host: 32.398386 ms, Device: 34.822636 ms
✓ Passed

============================================================
TEST SUMMARY
Total tests: 1
Passed: 1
Success rate: 100.0%

All tests passed!
------------------------------------------------------------
BENCHMARK SUMMARY
PyTorch Host Total Time: 3219.152 ms
PyTorch Device Total Time: 3520.012 ms
InfiniCore Host Total Time: 3239.839 ms
InfiniCore Device Total Time: 3482.264 ms
Host Speedup (PyTorch/InfiniCore): 0.99x
Device Speedup (PyTorch/InfiniCore): 1.01x
============================================================
💾 Saving to: test_report_20260310_170129_863.json
   ✅ Saved.
----------------------------------------

================================================================================
CUMULATIVE TEST SUMMARY
================================================================================
Total tests run: 1
Passed: 1
Failed: 0
----------------------------------------
BENCHMARK SUMMARY (1 cases):
----------------------------------------

✅ PASSED OPERATORS (1):
  test_Matmul_0

Success rate: 100.0%

🎉 All tests passed!
2026-03-10 17:01:30,510 - infinimetrics.executor - INFO - Executor: operator.InfiniCore.Matmul.Performance completed with code=0
2026-03-10 17:01:30,512 - infinimetrics.dispatcher - INFO - Summary saved to summary_output/dispatcher_summary_20260310_170130.json

============================================================
Test Summary
============================================================
Total tests:   1
Successful:    1
Failed:        0
Success rate:  100.0%
============================================================

Test output

{
  "run_id": "test.matmul.perf.2048",
  "time": "2026-03-10 17:01:29",
  "testcase": "operator.InfiniCore.Matmul.Performance",
  "environment": {
    "cluster_scale": 1,
    "topology": "0x1 ring mesh",
    "cluster": [
      {
        "machine": {
          "cpu_model": "13th Gen Intel(R) Core(TM) i7-13700F",
          "memory_gb": 7,
          "accelerators": [
            {
              "model": "NVIDIA GeForce RTX 4070",
              "count": 0,
              "memory_gb_per_card": 11,
              "driver": "591.55",
              "cuda": "12.1",
              "type": "nvidia"
            }
          ]
        },
        "framework": [
          {
            "name": "unknown",
            "version": "unknown"
          }
        ]
      }
    ]
  },
  "result_code": 0,
  "config": {
    "operator": "matmul",
    "device": "nvidia",
    "torch_op": "torch.matmul",
    "infinicore_op": "infinicore.matmul",
    "inputs": [
      {
        "name": "in_0",
        "shape": [
          16,
          4096,
          4096
        ],
        "dtype": "float16"
      },
      {
        "name": "in_1",
        "shape": [
          16,
          4096,
          4096
        ],
        "dtype": "float16"
      }
    ],
    "attributes": [],
    "outputs": [
      {
        "name": "output",
        "shape": [
          16,
          4096,
          4096
        ],
        "dtype": "float16"
      }
    ],
    "warmup_iterations": 10,
    "measured_iterations": 100,
    "tolerance": {
      "atol": 0.001,
      "rtol": 0.001
    },
    "output_dir": "./output",
    "_testcase": "operator.InfiniCore.Matmul.Performance",
    "_run_id": "test.matmul.perf.2048",
    "_time": null
  },
  "metrics": [
    {
      "name": "operator.latency",
      "value": 34.822636,
      "type": "scalar",
      "raw_data_url": "",
      "unit": "ms"
    },
    {
      "name": "operator.tensor_accuracy",
      "value": "PASS",
      "unit": ""
    },
    {
      "name": "operator.flops",
      "value": 63.1492,
      "type": "scalar",
      "raw_data_url": "",
      "unit": "TFLOPS"
    },
    {
      "name": "operator.bandwidth",
      "value": 46.2519,
      "type": "scalar",
      "raw_data_url": "",
      "unit": "GB/s"
    }
  ]
}
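As a sanity check (not part of the tool itself), the reported `operator.flops` and `operator.bandwidth` metrics can be recomputed from the config above: 2*b*m*k*n FLOPs for the batched matmul, and three float16 tensors (two inputs plus the output) moved per iteration, both divided by the averaged device latency.

```python
# Recompute the reported metrics from the JSON config above.
b, m, k, n = 16, 4096, 4096, 4096
bytes_per_elem = 2             # float16
latency_s = 34.822636 / 1e3    # operator.latency (ms -> s)

flops = 2 * b * m * k * n                 # multiply + add per output element
moved = 3 * b * m * n * bytes_per_elem    # in_0 + in_1 + output

print(round(flops / latency_s / 1e12, 4))  # -> 63.1492 TFLOPS
print(round(moved / latency_s / 1e9, 4))   # -> 46.2519 GB/s (decimal GB)
```

Both values match the `metrics` entries in the report, confirming the new average-latency-based derivation.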

@baominghelly baominghelly self-assigned this Mar 10, 2026
@Chamberlain0w0 Chamberlain0w0 merged commit af857b1 into master Mar 10, 2026
1 check passed
