Refactor install scripts and add dockerfiles #1100
Conversation
    -r ../common.txt
    # PyTorch
I'm curious why we are reinventing the wheel when we already have well-established PEP standards for packaging Python projects?
@tengqm This is not for packaging a Python project but for creating Docker images and building from source, specifically for Flagscale developers and CI/CD developers. The next step will involve adding Python packaging.
By "creating Docker image", we mean we install FlagScale project with 1) its Python dependencies; 2) its native dependencies such as non-Python libs/tools.
The first step is still to ensure that FlagScale itself, as a Python project, can be installed following up-to-date community best practices. The community already has many package managers, and the pyproject.toml approach has clearly won the competition. Why are we inventing another installer written in shell script? The setup.py approach has already been abandoned, yet here we are rebuilding a similar tool in Bash. What's the point?
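For reference, the pyproject.toml route this comment advocates needs only a small declarative file. The sketch below is illustrative only; the distribution name, version, and dependency layout are assumptions, not the project's actual metadata:

```toml
[build-system]
requires = ["setuptools>=64"]
build-backend = "setuptools.build_meta"

[project]
name = "flagscale"            # illustrative; the real distribution name may differ
version = "0.1.0"             # placeholder
requires-python = ">=3.10"
dependencies = [
    # runtime dependencies would be declared here instead of
    # being driven by requirements/*.txt files and shell scripts
]
```

With this in place, `pip install .` (or `uv pip install .`) handles the Python side of installation, leaving only the non-Python native dependencies for the Dockerfiles.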
    case "$PKG_MGR" in
        conda)
            if [ -n "$ENV_NAME" ] && [ -n "$ENV_PATH" ]; then
                activate_conda "$ENV_NAME" "$ENV_PATH" || { echo "❌ Conda activation failed"; exit 1; }
I've noticed that several layers of shell functions have been written separately to activate a conda environment, including create_conda_env, conda_env_exists, accept_conda_tos, log_info and others. In my opinion, a single conda activate command is sufficient here; all the rest is redundant code that serves no practical purpose other than introducing additional cognitive load and complexity.
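The simplification suggested above could be sketched as a single table-driven lookup instead of layered helpers. This is a sketch only: the function and the command strings are assumptions, not the project's actual helpers.

```shell
# One lookup replaces the create_conda_env / conda_env_exists /
# accept_conda_tos helper stack: map a package manager to its
# activation command and fail fast on anything unknown.
activation_cmd() {
    local mgr="$1" env="$2"
    case "$mgr" in
        conda)  echo "conda activate $env" ;;
        uv|pip) echo "source $env/bin/activate" ;;
        *)      echo "unsupported package manager: $mgr" >&2; return 1 ;;
    esac
}
```

The caller then runs `eval "$(activation_cmd "$PKG_MGR" "$ENV_NAME")"` or similar; any failure surfaces as one clear error instead of being spread across several wrapper functions.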
            ;;
        uv)
            if [ -n "$ENV_PATH" ] && [ -d "$ENV_PATH" ]; then
                activate_uv_env "$ENV_PATH" || { echo "❌ UV activation failed"; exit 1; }
Same issue as the 150-line
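In the uv case the dedicated helper can likewise collapse to a few lines, since a uv-managed environment is a standard virtualenv whose `bin/activate` script can be sourced directly. The function name and default path below are assumptions for illustration:

```shell
# Direct activation of a uv-created virtualenv: source its activate
# script if it exists, otherwise fail with one clear message.
activate_venv() {
    local env_path="${1:-.venv}"
    if [ -f "$env_path/bin/activate" ]; then
        # shellcheck disable=SC1091
        . "$env_path/bin/activate"
    else
        echo "❌ no virtualenv at $env_path" >&2
        return 1
    fi
}
```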
                --retry-count 3
            ;;
        inference|rl)
            echo "✅ All dependencies pre-installed for $TASK"
Is this a reserved entry point for the subsequent installation of the inference environment, or have the inference dependencies actually been pre-installed somewhere already?
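If the deps really are pre-installed, an explicit verification would make that assumption visible rather than silently trusting the image. A minimal sketch, with an illustrative package list (the real inference/RL requirements are not shown in this diff):

```shell
# Verify importability of each claimed pre-installed package instead
# of only printing a success banner; report every missing one.
check_deps() {
    local missing=0
    for pkg in "$@"; do
        python3 -c "import $pkg" 2>/dev/null || { echo "❌ missing: $pkg" >&2; missing=1; }
    done
    return "$missing"
}
```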
    ARG UV_EXTRA_INDEX_URL=${PIP_EXTRA_INDEX_URL}

    # PyTorch wheel index (derived from CUDA version)
    ARG PYTORCH_INDEX=https://download.pytorch.org/whl/cu128
This should align with the actual CUDA_VERSION when it is not 12.8.
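One way to address this comment is to derive the wheel-index tag from the CUDA version instead of hard-coding `cu128`. The derivation below is a sketch of the intended fix, not the PR's actual code:

```shell
# Map a CUDA version string to the matching PyTorch wheel index URL,
# e.g. 12.8 -> .../whl/cu128, 12.1.1 -> .../whl/cu121.
pytorch_index() {
    local cuda_version="$1"
    local tag
    # keep only major.minor, then drop the dot to form the cuXYZ tag
    tag="cu$(echo "$cuda_version" | cut -d. -f1,2 | tr -d .)"
    echo "https://download.pytorch.org/whl/${tag}"
}
```

In the Dockerfile this could become `ARG PYTORCH_INDEX=https://download.pytorch.org/whl/cu${CUDA_TAG}` with the tag computed once from `CUDA_VERSION`.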
    ARG PYTORCH_INDEX=https://download.pytorch.org/whl/cu128

    # =============================================================================
    # BASE STAGE - System dependencies
Should it be System Stage?
PR Category
CICD
PR Types
Improvements
PR Description
This pull request refactors the CI/CD workflows to support a unified and more flexible Python environment setup, making it easier to switch between package managers (conda, uv, pip) and improving maintainability. It also temporarily disables some functional test jobs for inference, serve, and RL tasks. The changes standardize environment configuration, update job inputs, and refactor environment activation and dependency installation logic.
Unified Python Environment and Package Management:
- .github/configs/cuda.yml now allows selection between conda, uv, and pip, and specifies environment names and paths for different tasks.
- .github/workflows/all_tests_common.yml and related files now use generic pkg_mgr, env_name, and env_path parameters instead of conda-specific ones.
Refactored Environment Activation and Dependency Installation:
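The package-manager selection this description refers to might look like the following sketch of .github/configs/cuda.yml. Every field name and value here is a hypothetical illustration, not the merged schema:

```yaml
# Hypothetical shape of the unified environment configuration:
# one default plus per-task overrides.
python_env:
  pkg_mgr: conda            # one of: conda, uv, pip
  env_name: flagscale-ci
  env_path: /opt/envs/flagscale-ci
tasks:
  train:
    pkg_mgr: uv
    env_path: /opt/venvs/train
```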
Workflow and Job Adjustments:
- Temporarily disabled the inference, serve, and RL functional test jobs in .github/workflows/all_tests_common.yml and removed their checks from the test completion logic.
These changes make the workflows more extensible and easier to maintain, and prepare the codebase for a future transition to the uv package manager.