Feature Description & Motivation
The update_neuron_sdk.sh lifecycle script pins the Neuron SDK to release 2.21.0, which is significantly outdated (the current release is 2.27.1+). The pinned package versions are:
aws-neuronx-dkms=2.19.64.0
aws-neuronx-oci-hook=2.6.36.0
aws-neuronx-runtime-lib=2.23.110.0
aws-neuronx-collectives=2.23.133.0
aws-neuronx-tools=2.20.204.0
This script is unnecessary because HyperPod Slurm AMIs already ship with the Neuron SDK preinstalled, and the SDK is automatically updated when users call the UpdateClusterSoftware API. See the HyperPod Slurm AMI release notes for the versions included in each AMI release.
Rather than continuously updating pinned versions in this script, it should be deprecated and removed to simplify the lifecycle scripts. Users who need a specific Neuron SDK version should rely on the preinstalled AMI version or the UpdateClusterSoftware API.
Related: #875
Files to Change
- Delete 1.architectures/5.sagemaker-hyperpod/LifecycleScripts/base-config/utils/update_neuron_sdk.sh
- Remove the enable_update_neuron_sdk config flag in 1.architectures/5.sagemaker-hyperpod/LifecycleScripts/base-config/config.py (line 20)
- Remove the conditional block that calls update_neuron_sdk.sh in 1.architectures/5.sagemaker-hyperpod/LifecycleScripts/base-config/lifecycle_script.py (lines 262-265)
- Remove the sed command that enables the flag in 1.architectures/5.sagemaker-hyperpod/automate-smhp-slurm/automate-cluster-creation.sh (line 355)
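Once the four changes above are made, a quick scan can help confirm that nothing still refers to the script or its flag. The following is a minimal sketch, not part of the repository: the helper name find_leftovers and the PATTERNS tuple are hypothetical, and it assumes that searching for the substring update_neuron_sdk is enough to catch both the script name and the enable_update_neuron_sdk flag.

```python
from pathlib import Path

# Substring shared by the script name and the enable_update_neuron_sdk flag.
# (Assumption: no unrelated identifier in the repo contains this substring.)
PATTERNS = ("update_neuron_sdk",)


def find_leftovers(root, patterns=PATTERNS):
    """Return (relative_path, line_number, text) for every remaining reference.

    A line_number of 0 means the file *name* itself matched, i.e. the
    script has not been deleted yet.
    """
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip directories and anything inside the .git metadata folder.
        if not path.is_file() or ".git" in rel.parts:
            continue
        if any(p in path.name for p in patterns):
            hits.append((rel.as_posix(), 0, "<file name>"))
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file, nothing to scan
        for number, line in enumerate(text.splitlines(), start=1):
            if any(p in line for p in patterns):
                hits.append((rel.as_posix(), number, line.strip()))
    return hits
```

Calling find_leftovers("1.architectures/5.sagemaker-hyperpod") from the repository root should return an empty list when the removal is complete. Any remaining entries point at references not listed in this issue, such as documentation.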
Category
Enhancement to existing test case
Additional Context
Reviewer requirement: Because this change touches SageMaker HyperPod lifecycle scripts, the fix PR will require SageMaker service team review. Contributors should assign the PR to hyperpod-lcs-dev for review.