This directory contains the gas benchmarking infrastructure for Remitwise smart contracts. The system tracks CPU and memory costs for critical operations to detect performance regressions early in development.
Gas benchmarking helps ensure that contract operations remain efficient and predictable. Each benchmark measures:
- CPU Instructions: Computational cost of operations
- Memory Usage: Storage and temporary memory allocation costs
```
benchmarks/
├── README.md         # This documentation
├── baseline.json     # Baseline measurements for all operations
├── thresholds.json   # Regression detection thresholds
└── history/          # Historical benchmark data
```
Contains baseline CPU and memory costs for each benchmarked operation. These values are updated when legitimate performance improvements are made.
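The exact schema is project-specific; a plausible shape, keyed by contract, method, and scenario to mirror the per-benchmark JSON the test suites emit (all values here are illustrative, not real measurements):

```json
{
  "remittance_split": {
    "create_remittance_schedule": {
      "single_recurring_schedule": { "cpu": 12345, "mem": 6789 }
    }
  }
}
```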
Defines regression detection thresholds as percentage increases from baseline:
- `default`: 10% increase triggers a warning for most operations
- `contract_specific`: Custom thresholds per contract
- `method_specific`: Custom thresholds per method
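The three keys above come from this README; the values and nesting below are an illustrative sketch of what such a file might look like, not the actual configuration:

```json
{
  "default": 10,
  "contract_specific": {
    "reporting": 15
  },
  "method_specific": {
    "bill_payments.bulk_cleanup_bills": 20
  }
}
```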
```sh
# Run remittance_split schedule operation benchmarks
RUST_TEST_THREADS=1 cargo test -p remittance_split --test gas_bench -- --nocapture

# Run bill_payments benchmarks
RUST_TEST_THREADS=1 cargo test -p bill_payments --test gas_bench -- --nocapture

# Run reporting aggregation benchmarks
RUST_TEST_THREADS=1 cargo test -p reporting --test gas_bench -- --nocapture

# Run all gas benchmarks across contracts
./scripts/run_all_benchmarks.sh
```

Each benchmark outputs JSON with the following structure:
```json
{
  "contract": "remittance_split",
  "method": "create_remittance_schedule",
  "scenario": "single_recurring_schedule",
  "cpu": 12345,
  "mem": 6789
}
```

For CI parsing, gas suites may also emit lines prefixed with:
- `GAS_BENCH_RESULT:`: machine-readable benchmark result with baseline/threshold metadata
- `cpu regression ...` / `mem regression ...`: assertion failures when thresholds are exceeded
This keeps --nocapture logs easy to scrape in CI while preserving normal Rust test output.
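A minimal sketch of how a CI step might scrape those prefixed lines from captured test output. Only the `GAS_BENCH_RESULT: ` prefix comes from this README; the harness and sample log below are illustrative:

```rust
/// Extract machine-readable benchmark lines from `--nocapture` test output.
/// Returns the JSON payload of every line carrying the CI prefix.
fn extract_bench_results(log: &str) -> Vec<&str> {
    log.lines()
        .filter_map(|line| line.trim().strip_prefix("GAS_BENCH_RESULT: "))
        .collect()
}

fn main() {
    // Hypothetical log mixing normal Rust test output with a bench line.
    let log = "\
running 1 test
GAS_BENCH_RESULT: {\"contract\":\"reporting\",\"method\":\"get_split\",\"cpu\":12345,\"mem\":6789}
test bench_get_split ... ok";
    let results = extract_bench_results(log);
    assert_eq!(results.len(), 1);
    assert!(results[0].starts_with('{'));
    println!("{}", results[0]);
}
```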
The remittance split contract includes comprehensive benchmarks for schedule lifecycle operations:
- `create_remittance_schedule/single_recurring_schedule`: Basic schedule creation
- `create_remittance_schedule/11th_schedule_with_existing`: Scaling with existing schedules
- `modify_remittance_schedule/single_schedule_modification`: Update an existing schedule
- `cancel_remittance_schedule/single_schedule_cancellation`: Cancel an active schedule
- `get_remittance_schedules/empty_schedules`: Query with no schedules
- `get_remittance_schedules/5_schedules_with_isolation`: Query with data isolation
- `get_remittance_schedules/50_schedules_worst_case`: Worst-case query performance
- `get_remittance_schedule/single_schedule_lookup`: Single schedule retrieval
All benchmarks include security validations:
- Authorization: Tests verify proper authentication and authorization
- Data Isolation: Ensures users can only access their own data
- Input Validation: Runs methods with valid parameters to confirm validation logic is exercised
- Edge Cases: Covers boundary conditions and error scenarios
`bill_payments/tests/gas_bench.rs` includes dedicated regression coverage for:
- `archive_paid_bills/120_paid_1_unpaid_preserved`
- `restore_bill/single_archived_owner_restore`
- `bulk_cleanup_bills/mixed_age_20_of_30_deleted`
- `batch_pay_bills/mixed_batch_50_partial_success`
Security assumptions validated in these benches:
- Archive and cleanup are maintenance operations over paid/archived data only
- Restore is owner-only
- Batch pay preserves owner isolation and deterministic partial success
- Oversized batches are rejected (`BatchTooLarge`)
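The oversized-batch guard can be sketched in plain Rust. Only the `BatchTooLarge` error name comes from this README; the limit, types, and function name are assumptions for illustration:

```rust
/// Hypothetical maximum batch size; the real limit lives in the contract.
const MAX_BATCH_SIZE: usize = 50;

#[derive(Debug, PartialEq)]
enum BillError {
    BatchTooLarge,
}

/// Reject a batch-pay request whose bill list exceeds the configured limit.
fn check_batch(bill_ids: &[u32]) -> Result<(), BillError> {
    if bill_ids.len() > MAX_BATCH_SIZE {
        return Err(BillError::BatchTooLarge);
    }
    Ok(())
}

fn main() {
    assert!(check_batch(&[1, 2, 3]).is_ok());
    let oversized: Vec<u32> = (0..60).collect();
    assert_eq!(check_batch(&oversized), Err(BillError::BatchTooLarge));
}
```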
The system automatically detects regressions by comparing current measurements against baselines:
- Green: Within threshold (no action needed)
- Yellow: Exceeds threshold but < 25% increase (review recommended)
- Red: > 25% increase (investigation required)
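The traffic-light check above can be sketched as follows. The per-operation warning threshold and the 25% Red cutoff come from this README; the function shape is an assumption:

```rust
#[derive(Debug, PartialEq)]
enum Status {
    Green,
    Yellow,
    Red,
}

/// Classify a measurement against its baseline.
/// `warn_pct` is the per-operation threshold from thresholds.json;
/// 25% is the hard "investigation required" cutoff from this README.
fn classify(baseline_cpu: u64, current_cpu: u64, warn_pct: f64) -> Status {
    let increase_pct = if current_cpu > baseline_cpu {
        (current_cpu - baseline_cpu) as f64 * 100.0 / baseline_cpu as f64
    } else {
        0.0 // improvements and no-change never regress
    };
    if increase_pct > 25.0 {
        Status::Red
    } else if increase_pct > warn_pct {
        Status::Yellow
    } else {
        Status::Green
    }
}

fn main() {
    assert_eq!(classify(10_000, 10_500, 10.0), Status::Green); // +5%
    assert_eq!(classify(10_000, 11_500, 10.0), Status::Yellow); // +15%
    assert_eq!(classify(10_000, 13_000, 10.0), Status::Red); // +30%
}
```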
The reporting contract benchmarks cover the three heavy aggregation paths identified in issue #317, each run at three data sizes (small/medium/large) to expose O(n) complexity growth.
| Scenario | Description |
|---|---|
| `no_addresses_baseline` | Addresses not configured – O(1) storage miss, returns `Missing` |
| `with_split_4_categories` | Two cross-contract calls + four-category breakdown loop |
| Scenario | Items | Windows |
|---|---|---|
| `5_periods` | 5 | 4 |
| `25_periods` | 25 | 24 |
| `50_periods` | 50 | 49 |
Pure in-contract computation; no cross-contract calls. Scales linearly with history length.
| Scenario | Goals | Bills | Policies |
|---|---|---|---|
| `small_5_items` | 5 | 5 | 5 |
| `medium_25_items` | 25 | 25 | 25 |
| `large_50_items` | 50 | 50 | 50 |
Issues nine cross-contract calls per invocation: `get_all_goals` ×2, `get_unpaid_bills` ×1, `get_active_policies` ×2, `get_split` ×1, `calculate_split` ×1, `get_all_bills_for_owner` ×1, `get_total_monthly_premium` ×1.
| Scenario | Stored reports |
|---|---|
| `5_stored_reports` | 5 |
| `25_stored_reports` | 25 |
| `50_stored_reports` | 50 |
Dual O(n) map iteration: first over `REPORTS` to find candidates, then over `to_remove` to delete them.
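The two-pass pattern can be illustrated with a std `BTreeMap` standing in for the contract's storage map (the `REPORTS`/`to_remove` names come from this README; the key/value types and cutoff logic are assumptions):

```rust
use std::collections::BTreeMap;

/// Pass 1: O(n) scan to collect keys of reports older than `cutoff`,
/// avoiding mutation while iterating. Pass 2: O(n) deletion of those keys.
/// Returns how many reports were removed.
fn cleanup_reports(reports: &mut BTreeMap<u64, u64>, cutoff: u64) -> usize {
    let to_remove: Vec<u64> = reports
        .iter()
        .filter(|(_, &created_at)| created_at < cutoff)
        .map(|(&id, _)| id)
        .collect();
    for id in &to_remove {
        reports.remove(id);
    }
    to_remove.len()
}

fn main() {
    // report id -> creation timestamp (illustrative values)
    let mut reports = BTreeMap::from([(1, 100), (2, 900), (3, 50)]);
    let removed = cleanup_reports(&mut reports, 200);
    assert_eq!(removed, 2); // reports 1 and 3 are older than the cutoff
    assert_eq!(reports.len(), 1);
}
```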
`after_25_archived` – O(1) single instance-storage key read. Used to confirm the stats endpoint stays flat regardless of archive depth.
When adding new contract methods:
- Create a benchmark test in `contracts/{contract}/tests/gas_bench.rs`
- Add a baseline entry in `baseline.json`
- Set thresholds in `thresholds.json` if non-standard
- Document security assumptions in test comments
```rust
/// Benchmark: {Operation description}
/// Security: {Security validations performed}
#[test]
fn bench_{operation_name}() {
    let env = bench_env();
    let contract_id = env.register_contract(None, YourContract);
    let client = YourContractClient::new(&env, &contract_id);

    // Setup test data
    let owner = <Address as AddressTrait>::generate(&env);

    let (cpu, mem, result) = measure(&env, || {
        client.your_method(&owner, &param1, &param2)
    });

    // Validate result
    assert!(result.is_ok());

    println!(
        r#"{{"contract":"your_contract","method":"your_method","scenario":"test_scenario","cpu":{},"mem":{}}}"#,
        cpu, mem
    );
}
```

- Consistent Environment: Use `bench_env()` for reproducible conditions
- Realistic Data: Test with representative data sizes and patterns
- Worst-Case Scenarios: Include stress tests with maximum realistic loads
- Security Validation: Always verify security assumptions in benchmarks
- Clear Naming: Use descriptive scenario names that indicate test conditions
- Benchmark results are tracked in CI/CD pipelines
- Significant regressions trigger build failures
- Historical data enables trend analysis
- Performance improvements can be validated before deployment
- Ensure `RUST_TEST_THREADS=1` for consistent execution
- Check for external factors affecting the test environment
- Verify test data setup is deterministic
- Review recent code changes for performance impacts
- Check if test scenarios still match actual usage patterns
- Validate that baseline measurements are still accurate
Some operations may have inherently higher variance:
- Iteration-heavy operations (higher CPU threshold)
- Dynamic memory allocation (higher memory threshold)
- Complex calculations (higher CPU threshold)
Update `thresholds.json` with appropriate values based on operation characteristics.