[FEA]: Refactor matrix.yaml to simplify update process. #1758

jrhemstad · 2024-05-17T21:56:50Z

Is this a duplicate?

I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct

Area

Infrastructure

Is your feature request related to a problem? Please describe.

As part of #1757, I found that adding new compilers to matrix.yaml now involves a few more moving parts, which makes the process more complicated.

Describe the solution you'd like

One way I think we could simplify this is to move the per-compiler supported std versions to the compiler anchors.

So something like:

llvm14: &llvm14 { name: 'llvm', version: '14', exe: 'clang++', std: [11, 14, 17, 20] }
llvm15: &llvm15 { name: 'llvm', version: '15', exe: 'clang++', std: [11, 14, 17, 20] }
llvm16: &llvm16 { name: 'llvm', version: '16', exe: 'clang++', std: [11, 14, 17, 20] }
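
A rough sketch of how a matrix script might consume such anchors. The function name `expand_stds` and the shape of the job dict are assumptions made for illustration, not the actual CI code. Once the YAML parser has expanded the anchors into plain objects, a job with `std: 'all'` could be expanded per compiler like this:

```python
# Hypothetical compiler objects, as they would look after YAML anchor expansion.
llvm14 = {"name": "llvm", "version": "14", "exe": "clang++", "std": [11, 14, 17, 20]}
llvm15 = {"name": "llvm", "version": "15", "exe": "clang++", "std": [11, 14, 17, 20]}


def expand_stds(job):
    """Return one (compiler label, std) pair per compiler/standard combination.

    `std: 'all'` uses every standard the compiler lists; an explicit list is
    filtered down to the standards the compiler supports.
    """
    pairs = []
    for cxx in job["cxx"]:
        if job["std"] == "all":
            stds = cxx["std"]
        else:
            stds = [std for std in job["std"] if std in cxx["std"]]
        label = f"{cxx['name']}{cxx['version']}"
        pairs.extend((label, std) for std in stds)
    return pairs


job = {"jobs": ["build"], "std": "all", "cxx": [llvm14, llvm15]}
for label, std in expand_stds(job):
    print(label, std)
```

With this layout, adding a compiler only means adding one anchor line; no separate per-compiler std table needs to be kept in sync.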

Describe alternatives you've considered

No response

Additional context

No response


jrhemstad · 2024-05-17T21:57:13Z

@alliepiper what do you think?

alliepiper · 2024-05-20T22:38:06Z

I suggest a more expansive approach that addresses a few other concerns I've had:

Device compiler specs are currently inconsistent (NVCC: string 'nvcc', ClangCUDA: object at anchor *llvm16).
Compiler 'name' is overloaded. Maybe this is subjective, but I find "Clang", "MSVC", and "Intel" more recognizable than "llvm", "cl", or "oneapi", even though the latter are what the image names use. Splitting the 'familiar' name from the devcontainer tag would be nice from a UX perspective.
Expanding the host-compiler objects when printing the original workflow lines for debugging / diagnostics gets annoying: a list of host compilers quickly becomes unreadable once the YAML parser turns the anchors into objects. Adding std info to the existing compiler objects would make this worse with the current implementation.

I think we can address these while making the change you requested by using strings instead of anchors in the job specification. We can then use these strings to look up what we need in the metadata objects, which can grow in complexity as much as needed without cluttering up diagnostics.

I'm imagining something like this:

cuda_toolkits:
  - 11.1: { std: [11, 14, 17,   ] }
  - 11.8: { std: [11, 14, 17,   ] }
  - 12.0: { std: [11, 14, 17, 20] }
  - 12.4: { std: [11, 14, 17, 20], aka: 'curr' }

# Version info / std reqs implicit - use CTK for nvcc, host compiler for clang:
device_compilers: 
  nvcc:
    exe: 'nvcc'
  clang:
    exe: 'clang++'

host_compilers:
  gcc:
    name: 'GCC'
    container_tag: 'gcc'
    exe: 'g++'
    versions:
      - 6:  { std: [11, 14,       ] }
      - 7:  { std: [11, 14, 17,   ] }
      - 8:  { std: [11, 14, 17,   ] }
      - 9:  { std: [11, 14, 17,   ] }
      - 10: { std: [11, 14, 17, 20] }
      - 11: { std: [11, 14, 17, 20] }
      - 12: { std: [11, 14, 17, 20] }
  clang:
    name: 'Clang'
    container_tag: 'llvm'
    exe: 'clang++'
    versions:
      - 9:  { std: [11, 14, 17,   ] }
      - 10: { std: [11, 14, 17,   ] }
      - 11: { std: [11, 14, 17, 20] }
      - 12: { std: [11, 14, 17, 20] }
      - 13: { std: [11, 14, 17, 20] }
      - 14: { std: [11, 14, 17, 20] }
      - 15: { std: [11, 14, 17, 20] }
      - 16: { std: [11, 14, 17, 20] }
  msvc:
    name: 'MSVC'
    container_tag: 'cl'
    exe: cl
    versions:
      - 14.16: { std: [    14,       ], aka: '2017' }
      - 14.29: { std: [    14, 17,   ], aka: '2019' }
      - 14.36: { std: [    14, 17, 20]              }
      - 14.39: { std: [    14, 17, 20], aka: '2022' }
  intel:
    name: 'Intel'
    container_tag: 'oneapi'
    exe: icpc
    versions:
      - 2023.2.0: { std: [11, 14, 17,   ] }
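
To make the lookup idea concrete, here is a minimal sketch of resolving a requested version against the `versions` list above. It assumes the YAML has already been loaded into plain Python objects, that the list is ordered oldest to newest, and that version keys are compared as strings; the name `find_version` is hypothetical.

```python
# Hypothetical in-memory form of the `msvc` entry from `host_compilers`.
MSVC = {
    "name": "MSVC",
    "container_tag": "cl",
    "exe": "cl",
    "versions": [
        {"14.16": {"std": [14], "aka": "2017"}},
        {"14.29": {"std": [14, 17], "aka": "2019"}},
        {"14.36": {"std": [14, 17, 20]}},
        {"14.39": {"std": [14, 17, 20], "aka": "2022"}},
    ],
}


def find_version(compiler, requested=None):
    """Return (version, info) for `requested`.

    `requested` may be a version key or an `aka` alias. When it is omitted,
    the last entry is returned (assumption: the list is oldest to newest).
    """
    # `versions` is a list of single-key maps, so flatten it first.
    entries = [
        (str(version), info)
        for item in compiler["versions"]
        for version, info in item.items()
    ]
    if requested is None:
        return entries[-1]
    for version, info in entries:
        if requested in (version, info.get("aka")):
            return version, info
    raise KeyError(f"{compiler['name']} has no version {requested!r}")


version, info = find_version(MSVC, "2019")
print(version, info["std"])
```

Because the job lines would only carry short strings, the metadata entries can gain more fields later without changing how jobs are printed in diagnostics.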

This is how various lines from the workflow spec would change:

'<compiler>[version]' strings instead of anchors for host compilers.
Omit the version to use the latest.
Consistent, clearly defined strings for device compilers.
`aka` on MSVC versions allows either year tags or explicit versions to be requested.
CTK entries are more compact after deanchorification.

# Current then new:

- {jobs: ['build'], std: 'all', cxx: [*gcc7, *gcc8, *gcc9, *gcc10, *gcc11]}
- {jobs: ['build'], std: 'all', cxx: ['gcc7', 'gcc8', 'gcc9', 'gcc10', 'gcc11']}

- {jobs: ['build'], std: 'all', cxx: [*llvm9, *llvm10, *llvm11, *llvm12, *llvm13, *llvm14, *llvm15]}
- {jobs: ['build'], std: 'all', cxx: ['clang9', 'clang10', 'clang11', 'clang12', 'clang13', 'clang14', 'clang15']}

- {jobs: ['test'],  std: 'all', cxx: [*gcc12, *llvm16]}
- {jobs: ['test'],  std: 'all', cxx: ['gcc12', 'clang16']}

- {jobs: ['build'], std: 'all', cxx: [*oneapi]}
- {jobs: ['build'], std: 'all', cxx: ['intel']}

- {jobs: ['build'], std: 'all', cxx: [*msvc2019, *msvc2022_1436]}
- {jobs: ['build'], std: 'all', cxx: ['msvc2019', 'msvc14.36']} # Either version format works via `aka`

# I'd prefer old versions be explicit to ensure compat. These rarely change, so it's less useful than 'latest/curr'?
- {jobs: ['infra'], project: 'cccl', ctk: *ctk_11_1, cxx: [*gcc-oldest, *llvm-oldest]}
- {jobs: ['infra'], project: 'cccl', ctk: '11.1', cxx: ['gcc6', 'clang9']}

- {jobs: ['infra'], project: 'cccl', ctk: *ctk_curr, cxx: [*gcc-newest, *llvm-newest]}
- {jobs: ['infra'], project: 'cccl', ctk: 'curr', cxx: ['gcc', 'clang']}

# clang cuda:
- {jobs: ['build'], std: [17, 20], cudacxx: *llvm-newest, cxx: *llvm-newest}
- {jobs: ['build'], std: [17, 20], cudacxx: 'clang', cxx: 'clang'}
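
The `'<compiler>[version]'` strings could be split with something like the sketch below. The regular expression and the name `parse_compiler` are assumptions for illustration; the sketch only separates the family from the optional version and leaves `aka` / latest-version resolution to a later metadata lookup.

```python
import re

# Lowercase family name, optionally followed by a dotted numeric version.
_SPEC = re.compile(r"([a-z]+)(\d[\d.]*)?")


def parse_compiler(spec):
    """Split a spec such as 'gcc12' or 'msvc14.36' into (family, version).

    The version is None when omitted, which the proposal treats as
    "use the latest".
    """
    match = _SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"unrecognized compiler spec: {spec!r}")
    family, version = match.groups()
    return family, version


job = {"jobs": ["build"], "std": "all", "cxx": ["msvc2019", "msvc14.36", "clang"]}
for spec in job["cxx"]:
    print(spec, "->", parse_compiler(spec))
```

Strings that do not follow the pattern, such as the old `gcc-oldest` anchor name, are rejected rather than guessed at.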

alliepiper · 2024-06-06T18:38:49Z

Extended to include gpus, tags, projects, jobs, etc:

ctks:
  - 11.1: { stds: [11, 14, 17,   ] }
  - 11.8: { stds: [11, 14, 17,   ] }
  - 12.0: { stds: [11, 14, 17, 20] }
  - 12.4: { stds: [11, 14, 17, 20], aka: 'curr' }

# Version info / std reqs implicit - use CTK for nvcc, host compiler for clang:
device_compilers:
  nvcc:
    exe: 'nvcc'
  clang:
    exe: 'clang++'

host_compilers:
  gcc:
    name: 'GCC'
    container_tag: 'gcc'
    exe: 'g++'
    versions:
      - 6:  { stds: [11, 14,       ] }
      - 7:  { stds: [11, 14, 17,   ] }
      - 8:  { stds: [11, 14, 17,   ] }
      - 9:  { stds: [11, 14, 17,   ] }
      - 10: { stds: [11, 14, 17, 20] }
      - 11: { stds: [11, 14, 17, 20] }
      - 12: { stds: [11, 14, 17, 20] }
      - 13: { stds: [11, 14, 17, 20] }
  clang:
    name: 'Clang'
    container_tag: 'llvm'
    exe: 'clang++'
    versions:
      - 9:  { stds: [11, 14, 17,   ] }
      - 10: { stds: [11, 14, 17,   ] }
      - 11: { stds: [11, 14, 17, 20] }
      - 12: { stds: [11, 14, 17, 20] }
      - 13: { stds: [11, 14, 17, 20] }
      - 14: { stds: [11, 14, 17, 20] }
      - 15: { stds: [11, 14, 17, 20] }
      - 16: { stds: [11, 14, 17, 20] }
      - 17: { stds: [11, 14, 17, 20] }
  msvc:
    name: 'MSVC'
    container_tag: 'cl'
    exe: cl
    versions:
      - 14.16: { stds: [    14,       ], aka: '2017' }
      - 14.29: { stds: [    14, 17,   ], aka: '2019' }
      - 14.36: { stds: [    14, 17, 20]              }
      - 14.39: { stds: [    14, 17, 20], aka: '2022' }
  intel:
    name: 'Intel'
    container_tag: 'oneapi'
    exe: icpc
    versions:
      - 2023.2.0: { stds: [11, 14, 17,   ] }

# Jobs support the following properties:
#
# - gpu: Whether the job requires a GPU runner. Default is false.
# - name: The human-readable name of the job. Default is the capitalized job key.
# - needs:
#   - A list of jobs that must be completed before this job can run. Default is an empty list.
#   - These are automatically added if needed:
#     - E.g. "jobs: ['test']" in the workflow def will also create the required 'build' jobs.
# - invoke:
#   - Map the job type to the script invocation spec:
#     - prefix: The script invocation prefix. Default is the job name.
#     - args: Additional arguments to pass to the script. Default is no args.
#   - The script is invoked as either:
#     linux:   `ci/<spec[prefix]>_<project>.sh <spec[args]>`
#     windows: `ci/windows/<spec[prefix]>_<project>.ps1 <spec[args]>`
jobs:
  # General:
  build: { gpu: false }
  test:  { gpu: true, needs: ['build'] }

  # CCCL:
  infra: { gpu: true } # example project launches a kernel

  # libcudacxx:
  nvrtc: { gpu: true, name: 'NVRTC' }
  verify_codegen: { gpu: false, name: 'VerifyCodegen' }

  # CUB:
  test_nolid: { name: 'TestGPU',      gpu: true, needs: ['build'], invoke: { prefix: 'test', args: '-no-lid'} }
  test_lid0:  { name: 'HostLaunch',   gpu: true, needs: ['build'], invoke: { prefix: 'test', args: '-lid0'} }
  test_lid1:  { name: 'DeviceLaunch', gpu: true, needs: ['build'], invoke: { prefix: 'test', args: '-lid1'} }
  test_lid2:  { name: 'GraphCapture', gpu: true, needs: ['build'], invoke: { prefix: 'test', args: '-lid2'} }

  # Thrust:
  test_cpu: { name: 'TestCPU', gpu: false, needs: ['build'], invoke: { prefix: 'test', args: '-cpu-only'} }
  test_gpu: { name: 'TestGPU', gpu: true,  needs: ['build'], invoke: { prefix: 'test', args: '-gpu-only'} }

# Projects have the following properties:
#
# - stds: A list of C++ standards to test. Required.
# - name: The human-readable name of the project. Default is the project key.
# - job_map: Map general jobs to arrays of project-specific jobs.
#            Useful for things like splitting cpu/gpu testing for a project.
#            E.g. "job_map: { test: ['test_cpu', 'test_gpu'] }" replaces
#            the "test" job with distinct "test_cpu" and "test_gpu" jobs.
projects:
  cccl:
    name: 'CCCL'
    stds: [11, 14, 17, 20]
  libcudacxx:
    name: 'libcu++'
    stds: [11, 14, 17, 20]
  cub:
    name: 'CUB'
    stds: [11, 14, 17, 20]
    job_map: { test: ['test_nolid', 'test_lid0', 'test_lid1', 'test_lid2'] }
  thrust:
    name: 'Thrust'
    stds: [11, 14, 17, 20]
    job_map: { test: ['test_cpu', 'test_gpu'] }
  cudax:
    stds: [17, 20]

gpus:
  v100:     { sm: 70 }                # 32 GB,  40 runners
  t4:       { sm: 75, testing: true } # 16 GB,   8 runners
  rtx2080:  { sm: 75, testing: true } #  8 GB,   8 runners
  rtxa6000: { sm: 86, testing: true } # 48 GB,  12 runners
  l4:       { sm: 89, testing: true } # 24 GB,  48 runners
  rtx4090:  { sm: 89, testing: true } # 24 GB,  10 runners
  h100:     { sm: 90 }                # 80 GB,  16 runners

# Tags are used to define a `matrix job` in the workflow section.
#
# Tags have the following options:
#  - required: Whether the tag is required. Default is false.
#  - default: The default value for the tag. Default is null.
tags:
  # An array of jobs (e.g. 'build', 'test', 'nvrtc', 'infra', 'verify_codegen', ...)
  # See the `jobs` map.
  jobs: { required: true }
  # CUDA ToolKit version
  # See the `ctks` map.
  ctk: { default: 'curr' }
  # CPU architecture
  cpu: { default: 'amd64' }
  # GPU model
  gpu: { default: 'v100' }
  # Host compiler {name, version, exe}
  # See the `host_compilers` map.
  cxx: { default: 'gcc' }
  # Device compiler.
  # See the `device_compilers` map.
  cudacxx: { default: 'nvcc' }
  # Project name (e.g. libcudacxx, cub, thrust, cccl)
  # See the `projects` map.
  project: { default: ['libcudacxx', 'cub', 'thrust'] }
  # C++ standard
  # If set to 'all', all stds supported by the ctk/compilers/project are used.
  # If set, will be passed to script with `-std <std>`.
  std: { required: false }
  # GPU architecture
  # - If set, passed to script with `-arch <sm>`.
  # - Format is the same as `CMAKE_CUDA_ARCHITECTURES`:
  #   - PTX only: 70-virtual
  #   - SASS only: 70-real
  #   - Both: 70
  # - Can pass multiple architectures via "60;70-real;80-virtual"
  # - Defaults to use the settings in the CMakePresets.json file.
  # - Will be exploded if an array, e.g. `sm: ['60;70;80;90', '90a']` creates two jobs.
  # - Set to 'gpu' to only target the GPU in the `gpu` tag.
  sm: { required: false }
  # Additional CMake options to pass to the build.
  # If set, passed to script with `-cmake_options "<cmake_options>"`.
  cmake_options: { required: false }
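
A minimal sketch of how the `tags` map might be applied to a single matrix job, assuming the YAML has been loaded into plain dicts. The helper names `apply_tags` and `explode_tag` are hypothetical and are not taken from the actual CI scripts.

```python
# Hypothetical in-memory form of the `tags` map above.
TAGS = {
    "jobs": {"required": True},
    "ctk": {"default": "curr"},
    "cpu": {"default": "amd64"},
    "gpu": {"default": "v100"},
    "cxx": {"default": "gcc"},
    "cudacxx": {"default": "nvcc"},
    "project": {"default": ["libcudacxx", "cub", "thrust"]},
    "std": {"required": False},
    "sm": {"required": False},
    "cmake_options": {"required": False},
}


def apply_tags(matrix_job, tags=TAGS):
    """Fill in tag defaults and reject jobs that lack a required tag."""
    result = dict(matrix_job)
    for tag, options in tags.items():
        if tag in result:
            continue
        if options.get("required", False):
            raise ValueError(f"matrix job is missing required tag '{tag}'")
        if options.get("default") is not None:
            result[tag] = options["default"]
    return result


def explode_tag(matrix_job, tag):
    """Create one job per value when `tag` holds a list; otherwise keep the job."""
    value = matrix_job.get(tag)
    if not isinstance(value, list):
        return [matrix_job]
    return [{**matrix_job, tag: item} for item in value]


job = apply_tags({"jobs": ["build"], "sm": ["60;70;80;90", "90a"]})
for exploded in explode_tag(job, "sm"):
    print(exploded["ctk"], exploded["cxx"], exploded["sm"])
```

Optional tags without a default (`std`, `sm`, `cmake_options`) are simply left unset here, so downstream code can tell "not requested" apart from an explicit value.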

jrhemstad added the feature request New feature or request. label May 17, 2024

alliepiper self-assigned this May 21, 2024

alliepiper mentioned this issue May 28, 2024

Add mechanism to split project tests into parallel jobs. #1696

Merged

alliepiper changed the title ~~[FEA]: Move support standard versions in matrix.yaml to per-compiler YAML anchors~~ [FEA]: Refactor matrix.yaml to simplify update process. Jun 6, 2024

alliepiper mentioned this issue Jun 11, 2024

Refactor CI matrix. #1844

Merged

alliepiper closed this as completed in #1844 Jun 15, 2024
