CLI tool for aggregating JUnit report times so you can split tests on CI 🔧


junit-reducer


JUnit Reducer is a command-line tool that aggregates the JUnit XML test reports from your CI runs and reduces them to a single, lighter set of reports. Your CI runners download that set later to steer a test splitting algorithm (e.g., split_tests). The most typical use case is to regularly update a 'running average' of your recent test reports, which your test runners can download in less time and without the risk of a race condition.

Diagram explaining how junit-reducer turns multiple sets of JUnit reports into a single set of JUnit reports.

Quickstart

Typically, you'll be using junit-reducer within a scheduled cron task to take the last X days of JUnit XML reports, reduce them and upload the results. It's recommended to accumulate the JUnit XML reports from individual CI runs in a cloud storage service like AWS S3 or Google Cloud Storage, as opposed to the caching APIs available from the CI providers (GitHub Actions, CircleCI etc.) themselves, which are designed as overwrite key-value stores.

GitHub Actions

Tip

If you're using GitHub Actions, check out the Action on GitHub Marketplace, if you prefer that sort of thing.

name: junit-test-report-averaging
run-name: Create Average JUnit Test Reports
on:
  schedule:
    # Run every morning at 08:00 UTC (GitHub Actions schedules use UTC)
    - cron: '0 8 * * *'
jobs:
  reduce-reports:
    runs-on: ubuntu-latest
    steps:
      # Configure with the Cloud storage provider of your choice.
      - name: Setup AWS CLI
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.YOUR_AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.YOUR_AWS_SECRET_ACCESS_KEY }}
          aws-region: eu-west-2

      # Download all test reports from all CI runs.
      # It is recommended to set up a lifecycle rule, to remove objects older
      # than a certain age from this bucket/path. This will help to keep the test reports
      # current and keep this job from taking too long.
      - name: Download test timings
        run: |
          aws s3 cp s3://your-junit-report-bucket/ci-runs-reports/ reports/ \
            --recursive

      # Download and extract the binary for your target environment (assumed to be
      # Linux x86_64). See the full list of releases here:
      # https://github.com/willgeorgetaylor/junit-reducer/releases
      - name: Reduce reports
        run: |
          curl -L "https://github.com/willgeorgetaylor/junit-reducer/releases/latest/download/junit-reducer_Linux_x86_64.tar.gz" | tar -xzf -
          chmod +x junit-reducer
          ./junit-reducer \
            --include="./reports/**/*.xml" \
            --output-path="./average-reports/"

      # Upload the reduced set of test reports to a dedicated bucket/path.
      # In your actual CI process, the CI runners will copy the contents of
      # this path locally, to be ingested by the test splitter.
      - name: Upload single set of averaged reports
        run: |
          aws s3 sync ./average-reports s3://your-junit-report-bucket/average-reports/ \
            --size-only \
            --cache-control max-age=86400

Why?

As your test suite grows, you may want to split tests between multiple test runners that execute concurrently. It's relatively simple to divide your test suites by file, using lines of code (LOC) as a proxy for test duration. However, LOC is only an approximation, and it results in uneven runner times (and therefore a slower overall run) as your codebase and test suites change.

The preferable approach for splitting test suites accurately is to use recently reported test times, and the most popular format for exchanging test data (including timings) between tools is the JUnit XML reports format. While JUnit itself is a Java project, the schema that defines JUnit reports is equally applicable to any language and reports can be generated by most testing frameworks for JavaScript, Ruby, Python etc.
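To make the idea concrete, here is a minimal sketch of time-based splitting, assuming a simple "longest test first, least-loaded runner next" heuristic. It is not the algorithm used by split_tests or any other particular splitter; the function name `split_longest_first` and the timings are hypothetical.

```python
import heapq

def split_longest_first(timings, runners):
    """Assign each test file to the currently least-loaded runner, longest first."""
    heap = [(0.0, index) for index in range(runners)]  # (total seconds, runner index)
    heapq.heapify(heap)
    buckets = [[] for _ in range(runners)]
    # Sort by descending time, then by name, so the result is deterministic.
    for name, seconds in sorted(timings.items(), key=lambda item: (-item[1], item[0])):
        load, index = heapq.heappop(heap)
        buckets[index].append(name)
        heapq.heappush(heap, (load + seconds, index))
    return buckets

# Hypothetical per-file timings (seconds), as they might be read from JUnit reports.
timings = {"a_test": 30.0, "b_test": 20.0, "c_test": 20.0, "d_test": 10.0}
print(split_longest_first(timings, 2))
# [['a_test', 'd_test'], ['b_test', 'c_test']]
```

Both runners end up with 40 seconds of work, which is the balance that a LOC-based split can only approximate.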

In busier projects, CI will be uploading reports frequently, so even if you take a small time window (for example, the last 24 hours), you could end up with 20MB+ of test reports. These reports need to be downloaded to every runner in your concurrency set, only to then perform the same splitting operation to yield the exact same time estimates. This means unnecessary and expensive work is being performed by each concurrent runner, potentially delaying the total test time by minutes and increasing CI costs.

Faster CI ✅

You can solve this speed issue with junit-reducer by creating a set of reports that looks like the set produced by a single CI run. Importantly, the values for time taken by test suite (as well as other counts, like errors and tests) are reduced from the wider set of reports, typically by finding the mean of all of the aggregate time values. Other reducer operations, like min / max / mode / median / sum, are available to handle non-standard distributions.
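The reduction itself can be pictured with a short sketch. This is an illustration of the idea only, not junit-reducer's implementation: the report contents, `RUN_A`, `RUN_B` and `reduce_suite_times` are made up, and suites are grouped by name alone for brevity.

```python
import statistics
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two hypothetical reports from different CI runs (made-up data).
RUN_A = '<testsuites><testsuite name="auth" time="10.0"/><testsuite name="api" time="4.0"/></testsuites>'
RUN_B = '<testsuites><testsuite name="auth" time="14.0"/><testsuite name="api" time="6.0"/></testsuites>'

def reduce_suite_times(reports, op=statistics.mean):
    """Group suites by name across reports and reduce their times with `op`."""
    times = defaultdict(list)
    for xml_text in reports:
        root = ET.fromstring(xml_text)
        for suite in root.iter("testsuite"):
            times[suite.get("name")].append(float(suite.get("time", "0")))
    return {name: op(values) for name, values in times.items()}

averaged = reduce_suite_times([RUN_A, RUN_B])
print(averaged)  # {'auth': 12.0, 'api': 5.0}
print(reduce_suite_times([RUN_A, RUN_B], op=max))  # {'auth': 14.0, 'api': 6.0}
```

Swapping `op` for `max`, `min`, `statistics.median` and so on mirrors the choice of reducer operation described above.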

Coverage integrity ✅

In very busy projects, larger downloads and test runners that start at different times also open up a more serious race condition. CI runs from other commits keep uploading their reports to the same remote source that you're downloading from. If any of your concurrent runners download reports with different values, their input data is misaligned and the splitting operation is corrupted. Because the download and splitting operation is performed in a distributed manner (on every runner concurrently), this misalignment can result in some tests in your run being skipped.

This risk is mitigated by computing the averaged reports in one place, and updating that set as part of a scheduled job. This is exactly the approach outlined in the quickstart section.
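The following sketch shows how misaligned inputs can cause a skipped test. It assumes a simple greedy splitter (`split_by_time`, a hypothetical stand-in for whatever splitter you use) and two made-up timing snapshots that differ in a single value.

```python
def split_by_time(timings, runners):
    """Greedy split: longest test first, onto the least-loaded runner."""
    buckets = [[] for _ in range(runners)]
    loads = [0.0] * runners
    for name, seconds in sorted(timings.items(), key=lambda item: (-item[1], item[0])):
        target = loads.index(min(loads))  # lowest index wins on ties
        buckets[target].append(name)
        loads[target] += seconds
    return buckets

ALL_TESTS = {"a_test", "b_test", "c_test", "d_test"}

# Hypothetical snapshots: runner 1 downloads after another CI run changed d_test.
seen_by_runner_0 = {"a_test": 30.0, "b_test": 20.0, "c_test": 20.0, "d_test": 10.0}
seen_by_runner_1 = {"a_test": 30.0, "b_test": 20.0, "c_test": 20.0, "d_test": 35.0}

# Each runner computes the full split locally and executes only its own bucket.
ran_on_0 = split_by_time(seen_by_runner_0, 2)[0]
ran_on_1 = split_by_time(seen_by_runner_1, 2)[1]
skipped = ALL_TESTS - set(ran_on_0) - set(ran_on_1)

print(ran_on_0)  # ['a_test', 'd_test']
print(ran_on_1)  # ['a_test', 'b_test']
print(skipped)   # {'c_test'}
```

Here `a_test` runs twice and `c_test` never runs. When every runner reads the same pre-computed set of averaged reports, both runners compute the same split and the buckets cover every test exactly once.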

Usage

Download and extract the latest build for your target environment, from the releases page.

For a complete list of arguments:

$ ./junit-reducer --help
Flags:
  -h, --help                          help for junit-reducer
      --include string                Glob pattern to find JUnit XML reports to reduce (default "./**/*.xml")
      --output-path string            Output path for the reduced JUnit XML reports (default "./output/")
      --exclude string                Glob pattern to omit from included JUnit XML reports
      --op-cases-time string          Reducer operation for test case time values. Options: "max", "mean", "median", "min", "mode" or "sum" (default "mean")
      --op-suites-time string         Reducer operation for test suite time values. Options: "max", "mean", "median", "min", "mode" or "sum" (default "mean")
      --op-suites-assertions string   Reducer operation for test suite assertion counts. Options: "max", "mean", "median", "min", "mode" or "sum" (default "mean")
      --op-suites-errors string       Reducer operation for test suite error counts. Options: "max", "mean", "median", "min", "mode" or "sum" (default "mean")
      --op-suites-failed string       Reducer operation for test suite failure counts. Options: "max", "mean", "median", "min", "mode" or "sum" (default "mean")
      --op-suites-skipped string      Reducer operation for test suite skipped counts. Options: "max", "mean", "median", "min", "mode" or "sum" (default "mean")
      --op-suites-tests string        Reducer operation for test suite test counts. Options: "max", "mean", "median", "min", "mode" or "sum" (default "mean")
      --reduce-cases-by string        Key to group and reduce test cases by. Options: "classname", "file" or "name" (default "name")
      --reduce-suites-by string       Key to group and reduce test suites by. Options: "filepath", "name" or "name+filepath" (default "name+filepath")
      --rounding-mode string          Rounding mode for counts that should be integers in the final result. Options: "ceil", "floor" or "round" (default "round")

Examples

Basic usage

# --include:     input glob for JUnit reports
# --output-path: output path for averaged reports
junit-reducer \
  --include="test-reports/**/*.xml" \
  --output-path="avg-reports/"

Reduce by name

Group test suites and cases by a specific attribute, to deduplicate the reports in the most appropriate way.

# Group test suites by name and test cases by classname
junit-reducer \
  --include="test-reports/**/*.xml" \
  --output-path="avg-reports/" \
  --reduce-suites-by="name" \
  --reduce-cases-by="classname"

Reduce with other operations

# Across suites of the same group: keep the min of skips, failures and errors,
# the max of tests and assertions, and the mean of time.
# Across cases of the same group: keep the mean of time.
junit-reducer \
  --include="test-reports/**/*.xml" \
  --output-path="avg-reports/" \
  --op-suites-skipped="min" \
  --op-suites-failed="min" \
  --op-suites-errors="min" \
  --op-suites-tests="max" \
  --op-suites-assertions="max" \
  --op-suites-time="mean" \
  --op-cases-time="mean"

Rounding average counts

You can also specify how to treat counts after they have been reduced.

# --rounding-mode: rounding method for reduced counts
junit-reducer \
  --include="test-reports/**/*.xml" \
  --output-path="avg-reports/" \
  --rounding-mode="floor"
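Rounding matters because a mean of integer counts is rarely an integer. The sketch below illustrates the three modes on made-up data; `reduce_count` and `ROUNDERS` are hypothetical names, and this is not the tool's code. How junit-reducer treats exact halves is not stated here, so the example avoids them (Python's built-in `round` rounds halves to the nearest even number, which may differ).

```python
import math
import statistics

# Map each rounding mode name to a Python function (an assumption for illustration).
ROUNDERS = {"ceil": math.ceil, "floor": math.floor, "round": round}

def reduce_count(counts, rounding_mode="round"):
    """Average integer counts, then round the result back to an integer."""
    return int(ROUNDERS[rounding_mode](statistics.mean(counts)))

# Hypothetical skipped counts for one suite across three CI runs (mean is 3.33...).
skipped_counts = [3, 3, 4]
print({mode: reduce_count(skipped_counts, mode) for mode in ROUNDERS})
# {'ceil': 4, 'floor': 3, 'round': 3}
```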