
Conversation

Contributor

@aoyulong aoyulong commented Jan 31, 2026

PR Category

CICD

PR Types

Improvements

PR Description

This pull request refactors the CI/CD workflows to support a unified and more flexible Python environment setup, making it easier to switch between package managers (conda, uv, pip) and improving maintainability. It also temporarily disables some functional test jobs for inference, serve, and RL tasks. The changes standardize environment configuration, update job inputs, and refactor environment activation and dependency installation logic.

Unified Python Environment and Package Management:

  • Added a new package manager configuration section to .github/configs/cuda.yml, allowing selection between conda, uv, and pip, and specifying environment names and paths for different tasks.
  • Updated workflow inputs in .github/workflows/all_tests_common.yml and related files to use generic pkg_mgr, env_name, and env_path parameters instead of conda-specific ones.

Refactored Environment Activation and Dependency Installation:

  • Replaced conda-specific logic in the workflow scripts with logic that activates and uses conda, uv, or pip environments, sources shared utility scripts for environment activation, and adjusts dependency installation per task and environment (a sketch of the dispatch logic follows below).
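
For illustration, a minimal sketch of what such a unified activation dispatch might look like. The helper names activate_conda and activate_uv_env mirror those visible in this PR's diff; PKG_MGR, ENV_NAME, and ENV_PATH correspond to the new workflow inputs. The pip branch and the error handling here are illustrative assumptions, not the PR's exact implementation:

```bash
#!/usr/bin/env bash
# Sketch of a unified activation dispatcher keyed on the package manager.
# activate_conda / activate_uv_env mirror helpers referenced in this PR;
# the pip branch and error handling are assumptions for illustration.

activate_environment() {
  local pkg_mgr="$1" env_name="$2" env_path="$3"

  case "$pkg_mgr" in
    conda)
      # conda environments are addressed by name (and optionally a prefix path)
      activate_conda "$env_name" "$env_path" || { echo "❌ Conda activation failed"; return 1; }
      ;;
    uv)
      # uv virtual environments are addressed by their on-disk path
      activate_uv_env "$env_path" || { echo "❌ UV activation failed"; return 1; }
      ;;
    pip)
      # plain pip: use whatever Python interpreter is already on PATH
      echo "ℹ️ Using pip with the current Python interpreter"
      ;;
    *)
      echo "❌ Unknown package manager: $pkg_mgr"
      return 1
      ;;
  esac
}

# Example call, mirroring the workflow inputs:
# activate_environment "$PKG_MGR" "$ENV_NAME" "$ENV_PATH"
```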

Workflow and Job Adjustments:

  • Temporarily disabled functional test jobs for inference, serve, and RL tasks in .github/workflows/all_tests_common.yml and removed their checks from the test completion logic.

These changes make the workflows more extensible and easier to maintain, and prepare the codebase for a future transition to the uv package manager.


-r ../common.txt

# PyTorch
Contributor

I'm curious why we are reinventing the wheel when there are well-established PEP standards for packaging Python projects?

Contributor Author

@aoyulong aoyulong Feb 2, 2026


@tengqm This is not for packaging a Python project but for creating Docker images and building from source, specifically for Flagscale developers and CI/CD developers. The next step will involve adding Python packaging.

Contributor


By "creating Docker image", we mean we install FlagScale project with 1) its Python dependencies; 2) its native dependencies such as non-Python libs/tools.
The first step is still to ensure FlagScale itself as a Python project can be installed using the up-to-date community best practices. The community already has many package managers and the pyproject.toml approach has obviously won the competition. Why are we inventing another installer written in Shell script? The setup.py approach has already been abandoned by people, here we are reworking a similar tool in Bash. What's the point?

case "$PKG_MGR" in
conda)
if [ -n "$ENV_NAME" ] && [ -n "$ENV_PATH" ]; then
activate_conda "$ENV_NAME" "$ENV_PATH" || { echo "❌ Conda activation failed"; exit 1; }
Collaborator

I've noticed that several layers of shell functions have been written separately to activate a conda environment, including create_conda_env, conda_env_exists, accept_conda_tos, log_info and others. In my opinion, a single conda activate command is sufficient here; all the rest is redundant code that serves no practical purpose other than introducing additional cognitive load and complexity.
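
For reference, a minimal sketch of the single-command alternative described above (assuming conda is already installed on the runner and the environment named by ENV_NAME exists):

```bash
# Make `conda activate` available in a non-interactive CI shell, then activate.
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate "$ENV_NAME" || { echo "❌ Conda activation failed"; exit 1; }
```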

    ;;
  uv)
    if [ -n "$ENV_PATH" ] && [ -d "$ENV_PATH" ]; then
      activate_uv_env "$ENV_PATH" || { echo "❌ UV activation failed"; exit 1; }
Collaborator

Same issue as at line 150.

      --retry-count 3
    ;;
  inference|rl)
    echo "✅ All dependencies pre-installed for $TASK"
Collaborator

Is this a reserved entry point for the subsequent installation of the inference environment, or have the inference dependencies actually been pre-installed somewhere already?

ARG UV_EXTRA_INDEX_URL=${PIP_EXTRA_INDEX_URL}

# PyTorch wheel index (derived from CUDA version)
ARG PYTORCH_INDEX=https://download.pytorch.org/whl/cu128
Collaborator

Should align with the actual CUDA_VERSION when it is not 12.8
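
For illustration, one way to keep the two in sync is to derive the wheel index tag from CUDA_VERSION instead of hard-coding cu128; a sketch in shell (the variable names and defaults here are assumptions):

```bash
# Derive the PyTorch wheel index from the CUDA version, e.g. 12.8 -> cu128.
CUDA_VERSION="${CUDA_VERSION:-12.8}"
PYTORCH_INDEX="https://download.pytorch.org/whl/cu${CUDA_VERSION//./}"
echo "Using PyTorch index: ${PYTORCH_INDEX}"
```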

ARG PYTORCH_INDEX=https://download.pytorch.org/whl/cu128

# =============================================================================
# BASE STAGE - System dependencies
Collaborator

Should it be System Stage?
