[FEA]: Refactor matrix.yaml to simply update process. #1758

Closed
1 task done
jrhemstad opened this issue May 17, 2024 · 3 comments · Fixed by #1844
Labels: feature request (New feature or request.)


@jrhemstad
Collaborator

Is this a duplicate?

Area

Infrastructure

Is your feature request related to a problem? Please describe.

While working on #1757, I found that adding a new compiler to matrix.yaml now involves several more moving parts, which makes the process more complicated.

Describe the solution you'd like

One way I think we could simplify this is to move the per-compiler supported std versions to the compiler anchors.

So something like:

llvm14: &llvm14 { name: 'llvm', version: '14', exe: 'clang++', std: [11, 14, 17, 20] }
llvm15: &llvm15 { name: 'llvm', version: '15', exe: 'clang++', std: [11, 14, 17, 20] }
llvm16: &llvm16 { name: 'llvm', version: '16', exe: 'clang++', std: [11, 14, 17, 20] }

Describe alternatives you've considered

No response

Additional context

No response

@jrhemstad jrhemstad added the feature request New feature or request. label May 17, 2024
@jrhemstad
Collaborator Author

@alliepiper what do you think?

@alliepiper
Collaborator

alliepiper commented May 20, 2024

I suggest a more expansive approach that addresses a few other concerns I've had:

  1. Device compiler specs are currently inconsistent (NVCC: string 'nvcc', ClangCUDA: object at anchor *llvm16)
  2. Compiler 'name' is overloaded. Maybe this is subjective, but I find Clang, MSVC, and Intel more recognizable than "llvm", "cl", or "oneapi"; the latter are used for image names. Splitting the 'familiar' name from the devcontainer tag would be nice from a UX perspective.
  3. Expansion of the host-compiler object when printing original workflow lines for debugging / diagnostics gets annoying: a list of host compilers quickly becomes unreadable after the YAML parser turns anchors into objects. Adding std info to the existing compiler objects would worsen this with the current implementation.

I think we can address these while making the change you requested by using strings instead of anchors in the job specification. We can then use these strings to look up what we need in the metadata objects, which can grow in complexity as much as needed without cluttering up diagnostics.

I'm imagining something like this:

cuda_toolkits:
  - 11.1: { std: [11, 14, 17,   ] }
  - 11.8: { std: [11, 14, 17,   ] }
  - 12.0: { std: [11, 14, 17, 20] }
  - 12.4: { std: [11, 14, 17, 20], aka: 'curr' }

# Version info / std reqs implicit - use CTK for nvcc, host compiler for clang:
device_compilers: 
  nvcc:
    exe: 'nvcc'
  clang:
    exe: 'clang++'

host_compilers:
  gcc:
    name: 'GCC'
    container_tag: 'gcc'
    exe: 'g++'
    versions:
      - 6:  { std: [11, 14,       ] }
      - 7:  { std: [11, 14, 17,   ] }
      - 8:  { std: [11, 14, 17,   ] }
      - 9:  { std: [11, 14, 17,   ] }
      - 10: { std: [11, 14, 17, 20] }
      - 11: { std: [11, 14, 17, 20] }
      - 12: { std: [11, 14, 17, 20] }
  clang:
    name: 'Clang'
    container_tag: 'llvm'
    exe: 'clang++'
    versions:
      - 9:  { std: [11, 14, 17,   ] }
      - 10: { std: [11, 14, 17,   ] }
      - 11: { std: [11, 14, 17, 20] }
      - 12: { std: [11, 14, 17, 20] }
      - 13: { std: [11, 14, 17, 20] }
      - 14: { std: [11, 14, 17, 20] }
      - 15: { std: [11, 14, 17, 20] }
      - 16: { std: [11, 14, 17, 20] }
  msvc:
    name: 'MSVC'
    container_tag: 'cl'
    exe: cl
    versions:
      - 14.16: { std: [    14,       ], aka: '2017' }
      - 14.29: { std: [    14, 17,   ], aka: '2019' }
      - 14.36: { std: [    14, 17, 20]              }
      - 14.39: { std: [    14, 17, 20], aka: '2022' }
  intel:
    name: 'Intel'
    container_tag: 'oneapi'
    exe: icpc
    versions:
      - 2023.2.0: { std: [11, 14, 17,   ] }

This is how various lines from the workflow spec would change:

  • '<compiler>[version]' strings instead of anchors for host compilers.
  • Omit the version to use the latest.
  • Consistent, clearly defined strings for device compilers.
  • aka for MSVC version allows either year tags or explicit versions to be requested.
  • CTK entries are more compact after deanchorification.
# Current then new:

- {jobs: ['build'], std: 'all', cxx: [*gcc7, *gcc8, *gcc9, *gcc10, *gcc11]}
- {jobs: ['build'], std: 'all', cxx: ['gcc7', 'gcc8', 'gcc9', 'gcc10', 'gcc11']}

- {jobs: ['build'], std: 'all', cxx: [*llvm9, *llvm10, *llvm11, *llvm12, *llvm13, *llvm14, *llvm15]}
- {jobs: ['build'], std: 'all', cxx: ['clang9', 'clang10', 'clang11', 'clang12', 'clang13', 'clang14', 'clang15']}

- {jobs: ['test'],  std: 'all', cxx: [*gcc12, *llvm16]}
- {jobs: ['test'],  std: 'all', cxx: ['gcc12', 'clang16']}

- {jobs: ['build'], std: 'all', cxx: [*oneapi]}
- {jobs: ['build'], std: 'all', cxx: ['intel']}

- {jobs: ['build'], std: 'all', cxx: [*msvc2019, *msvc2022_1436]}
- {jobs: ['build'], std: 'all', cxx: ['msvc2019', 'msvc14.36']} # Either version format works via `aka`

# I'd prefer old versions be explicit to ensure compat. These rarely change, so it's less useful than 'latest/curr'?
- {jobs: ['infra'], project: 'cccl', ctk: *ctk_11_1, cxx: [*gcc-oldest, *llvm-oldest]}
- {jobs: ['infra'], project: 'cccl', ctk: '11.1', cxx: ['gcc6', 'clang9']}

- {jobs: ['infra'], project: 'cccl', ctk: *ctk_curr, cxx: [*gcc-newest, *llvm-newest]}
- {jobs: ['infra'], project: 'cccl', ctk: 'curr', cxx: ['gcc', 'clang']}

# clang cuda:
- {jobs: ['build'], std: [17, 20], cudacxx: *llvm-newest, cxx: *llvm-newest}
- {jobs: ['build'], std: [17, 20], cudacxx: 'clang', cxx: 'clang'}
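
The `'<compiler>[version]'` lookup described above could be sketched as follows. This is only an illustration under stated assumptions: the `HOST_COMPILERS` dict is a hand-written mirror of the proposed YAML with versions listed oldest to newest, and `resolve_compiler` is a hypothetical helper name, not an existing function in the CI scripts.

```python
import re

# Hypothetical mirror of `host_compilers`; versions are ordered oldest to newest.
HOST_COMPILERS = {
    "gcc": {"versions": ["6", "7", "8", "9", "10", "11", "12"], "aka": {}},
    "clang": {
        "versions": ["9", "10", "11", "12", "13", "14", "15", "16"],
        "aka": {},
    },
    "msvc": {
        "versions": ["14.16", "14.29", "14.36", "14.39"],
        "aka": {"2017": "14.16", "2019": "14.29", "2022": "14.39"},
    },
    "intel": {"versions": ["2023.2.0"], "aka": {}},
}


def resolve_compiler(spec):
    """Split a '<compiler>[version]' string into (family, version).

    An omitted version selects the newest entry; an `aka` alias such as
    '2019' is translated to the explicit version it stands for.
    """
    match = re.fullmatch(r"([a-z]+)([0-9.]*)", spec)
    if match is None or match.group(1) not in HOST_COMPILERS:
        raise ValueError(f"unknown compiler spec: {spec!r}")
    family, version = match.groups()
    info = HOST_COMPILERS[family]
    if not version:
        return family, info["versions"][-1]
    version = info["aka"].get(version, version)
    if version not in info["versions"]:
        raise ValueError(f"unknown {family} version in spec: {spec!r}")
    return family, version


for spec in ["gcc12", "msvc2019", "msvc14.36", "clang"]:
    print(spec, "->", resolve_compiler(spec))
```

Because the job lines carry only short strings, diagnostics can print them verbatim, and the richer metadata is fetched only when it is needed.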

@alliepiper alliepiper self-assigned this May 21, 2024
@alliepiper alliepiper changed the title from "[FEA]: Move support standard versions in matrix.yaml to per-compiler YAML anchors" to "[FEA]: Refactor matrix.yaml to simply update process." Jun 6, 2024
@alliepiper
Collaborator

Extended to include gpus, tags, projects, jobs, etc:

ctks:
  - 11.1: { stds: [11, 14, 17,   ] }
  - 11.8: { stds: [11, 14, 17,   ] }
  - 12.0: { stds: [11, 14, 17, 20] }
  - 12.4: { stds: [11, 14, 17, 20], aka: 'curr' }

# Version info / std reqs implicit - use CTK for nvcc, host compiler for clang:
device_compilers:
  nvcc:
    exe: 'nvcc'
  clang:
    exe: 'clang++'

host_compilers:
  gcc:
    name: 'GCC'
    container_tag: 'gcc'
    exe: 'g++'
    versions:
      - 6:  { stds: [11, 14,       ] }
      - 7:  { stds: [11, 14, 17,   ] }
      - 8:  { stds: [11, 14, 17,   ] }
      - 9:  { stds: [11, 14, 17,   ] }
      - 10: { stds: [11, 14, 17, 20] }
      - 11: { stds: [11, 14, 17, 20] }
      - 12: { stds: [11, 14, 17, 20] }
      - 13: { stds: [11, 14, 17, 20] }
  clang:
    name: 'Clang'
    container_tag: 'llvm'
    exe: 'clang++'
    versions:
      - 9:  { stds: [11, 14, 17,   ] }
      - 10: { stds: [11, 14, 17,   ] }
      - 11: { stds: [11, 14, 17, 20] }
      - 12: { stds: [11, 14, 17, 20] }
      - 13: { stds: [11, 14, 17, 20] }
      - 14: { stds: [11, 14, 17, 20] }
      - 15: { stds: [11, 14, 17, 20] }
      - 16: { stds: [11, 14, 17, 20] }
      - 17: { stds: [11, 14, 17, 20] }
  msvc:
    name: 'MSVC'
    container_tag: 'cl'
    exe: cl
    versions:
      - 14.16: { stds: [    14,       ], aka: '2017' }
      - 14.29: { stds: [    14, 17,   ], aka: '2019' }
      - 14.36: { stds: [    14, 17, 20]              }
      - 14.39: { stds: [    14, 17, 20], aka: '2022' }
  intel:
    name: 'Intel'
    container_tag: 'oneapi'
    exe: icpc
    versions:
      - 2023.2.0: { stds: [11, 14, 17,   ] }

# Jobs support the following properties:
#
# - gpu: Whether the job requires a GPU runner. Default is false.
# - name: The human-readable name of the job. Default is the capitalized job key.
# - needs:
#   - A list of jobs that must be completed before this job can run. Default is an empty list.
#   - These are automatically added if needed:
#     - Eg. "jobs: ['test']" in the workflow def will also create the required 'build' jobs.
# - invoke:
#   - Map the job type to the script invocation spec:
#     - prefix: The script invocation prefix. Default is the job name.
#     - args: Additional arguments to pass to the script. Default is no args.
#   - The script is invoked as either:
#     linux:   `ci/<spec[prefix]>_<project>.sh <spec[args]>`
#     windows: `ci/windows/<spec[prefix]>_<project>.ps1 <spec[args]>`
jobs:
  # General:
  build: { gpu: false }
  test:  { gpu: true, needs: ['build'] }

  # CCCL:
  infra: { gpu: true } # example project launches a kernel

  # libcudacxx:
  nvrtc: { gpu: true, name: 'NVRTC' }
  verify_codegen: { gpu: false, name: 'VerifyCodegen' }

  # CUB:
  test_nolid: { name: 'TestGPU',      gpu: true, needs: ['build'], invoke: { prefix: 'test', args: '-no-lid'} }
  test_lid0:  { name: 'HostLaunch',   gpu: true, needs: ['build'], invoke: { prefix: 'test', args: '-lid0'} }
  test_lid1:  { name: 'DeviceLaunch', gpu: true, needs: ['build'], invoke: { prefix: 'test', args: '-lid1'} }
  test_lid2:  { name: 'GraphCapture', gpu: true, needs: ['build'], invoke: { prefix: 'test', args: '-lid2'} }

  # Thrust:
  test_cpu: { name: 'TestCPU', gpu: false, needs: ['build'], invoke: { prefix: 'test', args: '-cpu-only'} }
  test_gpu: { name: 'TestGPU', gpu: true,  needs: ['build'], invoke: { prefix: 'test', args: '-gpu-only'} }

# Projects have the following properties:
#
# - stds: A list of C++ standards to test. Required.
# - name: The human-readable name of the project. Default is the project key.
# - job_map: Map general jobs to arrays of project-specific jobs.
#            Useful for things like splitting cpu/gpu testing for a project.
#            E.g. "job_map: { test: ['test_cpu', 'test_gpu'] }" replaces
#            the "test" job with distinct "test_cpu" and "test_gpu" jobs.
projects:
  cccl:
    name: 'CCCL'
    stds: [11, 14, 17, 20]
  libcudacxx:
    name: 'libcu++'
    stds: [11, 14, 17, 20]
  cub:
    name: 'CUB'
    stds: [11, 14, 17, 20]
    job_map: { test: ['test_nolid', 'test_lid0', 'test_lid1', 'test_lid2'] }
  thrust:
    name: 'Thrust'
    stds: [11, 14, 17, 20]
    job_map: { test: ['test_cpu', 'test_gpu'] }
  cudax:
    stds: [17, 20]

gpus:
  v100:     { sm: 70 }                # 32 GB,  40 runners
  t4:       { sm: 75, testing: true } # 16 GB,   8 runners
  rtx2080:  { sm: 75, testing: true } #  8 GB,   8 runners
  rtxa6000: { sm: 86, testing: true } # 48 GB,  12 runners
  l4:       { sm: 89, testing: true } # 24 GB,  48 runners
  rtx4090:  { sm: 89, testing: true } # 24 GB,  10 runners
  h100:     { sm: 90 }                # 80 GB,  16 runners

# Tags are used to define a `matrix job` in the workflow section.
#
# Tags have the following options:
#  - required: Whether the tag is required. Default is false.
#  - default: The default value for the tag. Default is null.
tags:
  # An array of jobs (e.g. 'build', 'test', 'nvrtc', 'infra', 'verify_codegen', ...)
  # See the `jobs` map.
  jobs: { required: true }
  # CUDA Toolkit version
  # See the `ctks` map.
  ctk: { default: 'curr' }
  # CPU architecture
  cpu: { default: 'amd64' }
  # GPU model
  gpu: { default: 'v100' }
  # Host compiler {name, version, exe}
  # See the `host_compilers` map.
  cxx: { default: 'gcc' }
  # Device compiler.
  # See the `device_compilers` map.
  cudacxx: { default: 'nvcc' }
  # Project name (e.g. libcudacxx, cub, thrust, cccl)
  # See the `projects` map.
  project: { default: ['libcudacxx', 'cub', 'thrust'] }
  # C++ standard
  # If set to 'all', all stds supported by the ctk/compilers/project are used.
  # If set, will be passed to script with `-std <std>`.
  std: { required: false }
  # GPU architecture
  # - If set, passed to script with `-arch <sm>`.
  # - Format is the same as `CMAKE_CUDA_ARCHITECTURES`:
  #   - PTX only: 70-virtual
  #   - SASS only: 70-real
  #   - Both: 70
  # - Can pass multiple architectures via "60;70-real;80-virtual"
  # - Defaults to use the settings in the CMakePresets.json file.
  # - Will be exploded if an array, e.g. `sm: ['60;70;80;90', '90a']` creates two jobs.
  # - Set to 'gpu' to only target the GPU in the `gpu` tag.
  sm: { required: false }
  # Additional CMake options to pass to the build.
  # If set, passed to script with `-cmake_options "<cmake_options>"`.
  cmake_options: { required: false }
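
To make the `job_map`, `needs`, and `invoke` semantics of the `jobs` and `projects` maps more concrete, here is a minimal Python sketch. The `JOBS` and `PROJECTS` dicts mirror a few entries from the YAML above; `expand_jobs` and `script_command` are hypothetical helper names. It also assumes the `.sh` scripts under `ci/` are the Linux ones and the `.ps1` scripts under `ci/windows/` are the Windows ones.

```python
# Hypothetical mirror of a few entries from the `jobs` and `projects` maps.
JOBS = {
    "build": {"gpu": False},
    "test": {"gpu": True, "needs": ["build"]},
    "test_nolid": {"needs": ["build"], "invoke": {"prefix": "test", "args": "-no-lid"}},
    "test_lid0": {"needs": ["build"], "invoke": {"prefix": "test", "args": "-lid0"}},
    "test_lid1": {"needs": ["build"], "invoke": {"prefix": "test", "args": "-lid1"}},
    "test_lid2": {"needs": ["build"], "invoke": {"prefix": "test", "args": "-lid2"}},
}
PROJECTS = {
    "cccl": {},
    "cub": {"job_map": {"test": ["test_nolid", "test_lid0", "test_lid1", "test_lid2"]}},
}


def expand_jobs(requested, project):
    """Apply the project's job_map, then add any missing `needs` jobs first."""
    job_map = PROJECTS[project].get("job_map", {})
    mapped = []
    for job in requested:
        mapped.extend(job_map.get(job, [job]))
    result = []
    for job in mapped:
        for dep in JOBS[job].get("needs", []):
            if dep not in result:
                result.append(dep)
        if job not in result:
            result.append(job)
    return result


def script_command(job, project, windows=False):
    """Build the script invocation; the prefix defaults to the job key."""
    invoke = JOBS[job].get("invoke", {})
    prefix = invoke.get("prefix", job)
    args = invoke.get("args", "")
    if windows:
        script = f"ci/windows/{prefix}_{project}.ps1"
    else:
        script = f"ci/{prefix}_{project}.sh"
    return f"{script} {args}".strip()


print(expand_jobs(["test"], "cub"))
print(script_command("test_lid0", "cub"))
```

With this shape, a workflow line only has to say `jobs: ['test']`; the project decides how that splits, and the required `build` job is pulled in automatically.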

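The `required` and `default` options of the `tags` map above could be applied to a matrix job along these lines. This is a sketch under assumptions: `TAGS` is a hand-written mirror of the YAML, `apply_tag_defaults` is a hypothetical helper name, and rejecting unknown tags is an added assumption that the proposal does not state.

```python
# Hypothetical mirror of the `tags` map.
TAGS = {
    "jobs": {"required": True},
    "ctk": {"default": "curr"},
    "cpu": {"default": "amd64"},
    "gpu": {"default": "v100"},
    "cxx": {"default": "gcc"},
    "cudacxx": {"default": "nvcc"},
    "project": {"default": ["libcudacxx", "cub", "thrust"]},
    "std": {"required": False},
    "sm": {"required": False},
    "cmake_options": {"required": False},
}


def apply_tag_defaults(matrix_job):
    """Return a copy of the matrix job with defaults filled in.

    Raises ValueError for a missing required tag or (by assumption)
    for a tag that is not declared in TAGS.
    """
    unknown = set(matrix_job) - set(TAGS)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    resolved = dict(matrix_job)
    for tag, options in TAGS.items():
        if tag in resolved:
            continue
        if options.get("required", False):
            raise ValueError(f"missing required tag: {tag}")
        if options.get("default") is not None:
            resolved[tag] = options["default"]
    return resolved


print(apply_tag_defaults({"jobs": ["build"], "std": "all"}))
```

Optional tags without a default, such as `sm` and `cmake_options`, are simply left out of the resolved job, so the scripts fall back to their own defaults.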