
Conversation

Contributor

@aoyulong aoyulong commented Jan 31, 2026

PR Category

CICD

PR Types

Improvements

PR Description

This pull request refactors the CI/CD workflows to support a unified and more flexible Python environment setup, making it easier to switch between package managers (conda, uv, pip) and improving maintainability. It also temporarily disables some functional test jobs for inference, serve, and RL tasks. The changes standardize environment configuration, update job inputs, and refactor environment activation and dependency installation logic.

Unified Python Environment and Package Management:

  • Added a new package manager configuration section to .github/configs/cuda.yml, allowing selection between conda, uv, and pip, and specifying environment names and paths for different tasks.
  • Updated workflow inputs in .github/workflows/all_tests_common.yml and related files to use generic pkg_mgr, env_name, and env_path parameters instead of conda-specific ones.

Refactored Environment Activation and Dependency Installation:

  • Replaced conda-specific logic in the workflow scripts with logic that activates and uses conda, uv, or pip environments, sources shared utility scripts for environment activation, and adjusts dependency installation per task and environment (a sketch of the dispatch logic follows below).
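
For illustration, a minimal sketch of what such a unified activation dispatch might look like. The helper names activate_conda and activate_uv_env mirror those visible in this PR's diff; PKG_MGR, ENV_NAME, and ENV_PATH correspond to the new workflow inputs. The pip branch and the error handling here are illustrative assumptions, not the PR's exact implementation:

```bash
#!/usr/bin/env bash
# Sketch of a unified activation dispatcher keyed on the package manager.
# activate_conda / activate_uv_env mirror helpers referenced in this PR;
# the pip branch and error handling are assumptions for illustration.

activate_environment() {
  local pkg_mgr="$1" env_name="$2" env_path="$3"

  case "$pkg_mgr" in
    conda)
      # conda environments are addressed by name (and optionally a prefix path)
      activate_conda "$env_name" "$env_path" || { echo "❌ Conda activation failed"; return 1; }
      ;;
    uv)
      # uv virtual environments are addressed by their on-disk path
      activate_uv_env "$env_path" || { echo "❌ UV activation failed"; return 1; }
      ;;
    pip)
      # plain pip: use whatever Python interpreter is already on PATH
      echo "ℹ️ Using pip with the current Python interpreter"
      ;;
    *)
      echo "❌ Unknown package manager: $pkg_mgr"
      return 1
      ;;
  esac
}

# Example call, mirroring the workflow inputs:
# activate_environment "$PKG_MGR" "$ENV_NAME" "$ENV_PATH"
```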

Workflow and Job Adjustments:

  • Temporarily disabled functional test jobs for inference, serve, and RL tasks in .github/workflows/all_tests_common.yml and removed their checks from the test completion logic.

These changes make the workflows more extensible and easier to maintain, and prepare the codebase for a future transition to the uv package manager.


-r ../common.txt

# PyTorch
Contributor

I'm curious why we are reinventing the wheel when there are well-established PEP standards for packaging Python projects?

Contributor Author

@aoyulong aoyulong Feb 2, 2026


@tengqm This is not for packaging a Python project but for creating Docker images and building from source, specifically for Flagscale developers and CI/CD developers. The next step will involve adding Python packaging.

Contributor


By "creating Docker image", we mean we install FlagScale project with 1) its Python dependencies; 2) its native dependencies such as non-Python libs/tools.
The first step is still to ensure FlagScale itself as a Python project can be installed using the up-to-date community best practices. The community already has many package managers and the pyproject.toml approach has obviously won the competition. Why are we inventing another installer written in Shell script? The setup.py approach has already been abandoned by people, here we are reworking a similar tool in Bash. What's the point?

case "$PKG_MGR" in
conda)
if [ -n "$ENV_NAME" ] && [ -n "$ENV_PATH" ]; then
activate_conda "$ENV_NAME" "$ENV_PATH" || { echo "❌ Conda activation failed"; exit 1; }
Collaborator

I've noticed that several layers of shell functions have been written separately to activate a conda environment, including create_conda_env, conda_env_exists, accept_conda_tos, log_info and others. In my opinion, a single conda activate command is sufficient here; all the rest is redundant code that serves no practical purpose other than introducing additional cognitive load and complexity.
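
For reference, a minimal sketch of the single-command alternative described above (assuming conda is already installed on the runner and the environment named by ENV_NAME exists):

```bash
# Make `conda activate` available in a non-interactive CI shell, then activate.
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate "$ENV_NAME" || { echo "❌ Conda activation failed"; exit 1; }
```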

    ;;
  uv)
    if [ -n "$ENV_PATH" ] && [ -d "$ENV_PATH" ]; then
      activate_uv_env "$ENV_PATH" || { echo "❌ UV activation failed"; exit 1; }
Collaborator

Same issue as at line 150.

      --retry-count 3
    ;;
  inference|rl)
    echo "✅ All dependencies pre-installed for $TASK"
Collaborator

Is this a reserved entry point for the subsequent installation of the inference environment, or have the inference dependencies actually been pre-installed somewhere already?

ARG UV_EXTRA_INDEX_URL=${PIP_EXTRA_INDEX_URL}

# PyTorch wheel index (derived from CUDA version)
ARG PYTORCH_INDEX=https://download.pytorch.org/whl/cu128
Collaborator

Should align with the actual CUDA_VERSION when it is not 12.8
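
For illustration, one way to keep the two in sync is to derive the wheel index tag from CUDA_VERSION instead of hard-coding cu128; a sketch in shell (the variable names and defaults here are assumptions):

```bash
# Derive the PyTorch wheel index from the CUDA version, e.g. 12.8 -> cu128.
CUDA_VERSION="${CUDA_VERSION:-12.8}"
PYTORCH_INDEX="https://download.pytorch.org/whl/cu${CUDA_VERSION//./}"
echo "Using PyTorch index: ${PYTORCH_INDEX}"
```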

ARG PYTORCH_INDEX=https://download.pytorch.org/whl/cu128

# =============================================================================
# BASE STAGE - System dependencies
Collaborator

Should it be System Stage?
