### System Info
The same script works with `Neuron SDK 2.18.0` and `optimum-neuron v0.0.22`, but it fails with the latest software stack:
```text
(aws_neuron_venv_pytorch) [ec2-user@ip-172-31-29-22 text-generation]$ yum list | grep neuron
aws-neuronx-collectives.x86_64 2.21.46.0_69b77134b-1 @neuron
aws-neuronx-dkms.noarch 2.17.17.0-dkms @neuron
aws-neuronx-runtime-lib.x86_64 2.21.41.0_fb1705f5f-1 @neuron
aws-neuronx-tools.x86_64 2.18.3.0-1 @neuron
aws-neuron-dkms.noarch 2.3.26.0-dkms neuron
aws-neuron-dkms.src 2.3.26.0-dkms neuron
aws-neuron-k8-plugin.x86_64 1.9.3.0-1 neuron
aws-neuron-k8-scheduler.x86_64 1.9.3.0-1 neuron
aws-neuron-runtime.x86_64 1.6.24.0-1 neuron
aws-neuron-runtime-base.x86_64 1.6.21.0-1 neuron
aws-neuron-tools.x86_64 2.1.4.0-1 neuron
aws-neuronx-dkms.src 2.17.17.0-dkms neuron
aws-neuronx-gpsimd-customop.x86_64 0.2.3.0-1 neuron
aws-neuronx-gpsimd-customop-lib.x86_64 0.11.4.0-1 neuron
aws-neuronx-gpsimd-tools.x86_64 0.11.3.0_36dcb86d4-1 neuron
aws-neuronx-k8-plugin.x86_64 2.21.14.0-1 neuron
aws-neuronx-k8-scheduler.x86_64 2.21.14.0-1 neuron
aws-neuronx-oci-hook.x86_64 2.4.4.0-1 neuron
tensorflow-model-server-neuron.x86_64 2.8.0.2.3.0.0-0 neuron
tensorflow-model-server-neuronx.x86_64 2.10.1.2.11.4.0-0 neuron
(aws_neuron_venv_pytorch) [ec2-user@ip-172-31-29-22 text-generation]$ pip list | grep neuron
aws-neuronx-runtime-discovery 2.9
libneuronxla 2.0.2335
neuronx-cc 2.13.66.0+6dfecc895
neuronx-distributed 0.7.0
optimum-neuron 0.0.23
torch-neuronx 2.1.2.2.1.0
transformers-neuronx 0.10.0.21
```
It gives the following error (log lines from the eight ranks are interleaved):
```text
745142719040221994+6bd63055/model.neff. Exiting with a successfully compiled graph.
2024-Jul-17 22:00:32.531450 57376:58367 ERROR TDRV:v2_cc_execute [nec_dev 1, gid 1] MPMD execution is not supported. This is likely caused for some but not all ranks recompiling/reloading a graph, model: /tmp/ec2-user/neuroncc_compile_workdir/952ba576-b08c-4ac0-b0f1-12e560e5e362/model.MODULE_11203961494150985019+6bd63055.neff2024-Jul-17 22:00:32.5314452024-Jul-17 22:00:32.5314692024-Jul-17 22:00:32.5314502024-Jul-17 22:00:32.5314522024-Jul-17 22:00:32.531461 57380:58467 ERROR TDRV:v2_cc_execute 57381:57583 ERROR TDRV:v2_cc_execute
57379:57681 ERROR TDRV:v2_cc_execute 57378:58269 ERROR TDRV:v2_cc_execute [nec_dev 4, gid 4] MPMD execution is not supported. This is likely caused for some but not all ranks recompiling/reloading a graph, model: /tmp/ec2-user/neuroncc_compile_workdir/ebe15e9e-3f50-4061-8ef1-9c80d9a78071/model.MODULE_12660838522657173708+6bd63055.neff[nec_dev 5, gid 5] MPMD execution is not supported. This is likely caused for some but not all ranks recompiling/reloading a graph, model: /tmp/ec2-user/neuroncc_compile_workdir/a12cbe53-3042-4c9a-86d4-4bbcac795471/model.MODULE_15938951179947649509+6bd63055.neff 57382:57461 ERROR TDRV:v2_cc_execute [nec_dev 6, gid 6] MPMD execution is not supported. This is likely caused for some but not all ranks recompiling/reloading a graph, model: /tmp/ec2-user/neuroncc_compile_workdir/eac078e9-79e3-49af-8695-c65797ca89c0/model.MODULE_3281029225498615900+6bd63055.neff[nec_dev 3, gid 3] MPMD execution is not supported. This is likely caused for some but not all ranks recompiling/reloading a graph, model: /tmp/ec2-user/neuroncc_compile_workdir/43f12c55-64e6-45ad-a002-a0676ed72df9/model.MODULE_5948348338269179475+6bd63055.neff
[nec_dev 7, gid 7] MPMD execution is not supported. This is likely caused for some but not all ranks recompiling/reloading a graph, model: /tmp/ec2-user/neuroncc_compile_workdir/59a5b5cd-fff2-4315-a603-8a152f5186ca/model.MODULE_12429740934125521760+6bd63055.neff2024-Jul-17 22:00:32.531563
57376:58367 ERROR ENC:enc_dump_neff_info [nec_dev 1, gid 1] NEFF: /tmp/ec2-user/neuroncc_compile_workdir/952ba576-b08c-4ac0-b0f1-12e560e5e362/model.MODULE_11203961494150985019+6bd63055.neff2024-Jul-17 22:00:32.531607
2024-Jul-17 22:00:32.5316292024-Jul-17 22:00:32.5316332024-Jul-17 22:00:32.531631 57379:57681 ERROR ENC:enc_dump_neff_info 57378:58269 ERROR ENC:enc_dump_neff_info
57381:57583 ERROR ENC:enc_dump_neff_info 57380:58467 ERROR ENC:enc_dump_neff_info [nec_dev 4, gid 4] NEFF: /tmp/ec2-user/neuroncc_compile_workdir/ebe15e9e-3f50-4061-8ef1-9c80d9a78071/model.MODULE_12660838522657173708+6bd63055.neff[nec_dev 3, gid 3] NEFF: /tmp/ec2-user/neuroncc_compile_workdir/43f12c55-64e6-45ad-a002-a0676ed72df9/model.MODULE_5948348338269179475+6bd63055.neff[nec_dev 6, gid 6] NEFF: /tmp/ec2-user/neuroncc_compile_workdir/eac078e9-79e3-49af-8695-c65797ca89c0/model.MODULE_3281029225498615900+6bd63055.neff[nec_dev 5, gid 5] NEFF: /tmp/ec2-user/neuroncc_compile_workdir/a12cbe53-3042-4c9a-86d4-4bbcac795471/model.MODULE_15938951179947649509+6bd63055.neff2024-Jul-17 22:00:32.531670 57382:57461 ERROR ENC:enc_dump_neff_info
2024-Jul-17 22:00:32.531701 57376:58367 ERROR ENC:enc_dump_neff_info
```
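For reference, the NEFF paths in the log carry a different `MODULE_<hash>` graph hash on each rank, which is exactly the "some but not all ranks recompiling/reloading a graph" condition the runtime rejects as MPMD. A quick illustrative check (hashes copied from the log above):

```python
import re

# NEFF file names copied from the error log above: each rank loaded a graph
# with a different MODULE_<hash>, so the ranks are no longer executing the
# same program and the runtime aborts with "MPMD execution is not supported".
log = """
model.MODULE_11203961494150985019+6bd63055.neff
model.MODULE_12660838522657173708+6bd63055.neff
model.MODULE_15938951179947649509+6bd63055.neff
model.MODULE_3281029225498615900+6bd63055.neff
model.MODULE_5948348338269179475+6bd63055.neff
model.MODULE_12429740934125521760+6bd63055.neff
"""
hashes = set(re.findall(r"MODULE_(\d+)\+", log))
print(f"{len(hashes)} distinct graph hashes across ranks")  # 6 here; SPMD needs 1
```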
### Who can help?
_No response_
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
### Reproduction (minimal, reproducible, runnable)
The steps I used are below.

Launch an instance with Amazon Linux 2023 and install the dependencies using the following script:
```bash
# Configure Linux for Neuron repository updates
sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF
[neuron]
name=Neuron YUM Repository
baseurl=https://yum.repos.neuron.amazonaws.com
enabled=1
metadata_expire=0
EOF
sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB

# Update OS packages
sudo yum update -y

# Install OS headers
sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y

# Install git
sudo yum install git -y

# Install Neuron Driver
sudo yum install aws-neuronx-dkms-2.* -y

# Install Neuron Runtime
sudo yum install aws-neuronx-collectives-2.* -y
sudo yum install aws-neuronx-runtime-lib-2.* -y

# Install Neuron Tools
sudo yum install aws-neuronx-tools-2.* -y

# Create python3 venv
sudo yum install -y libxcrypt-compat
sudo yum install -y gcc-c++
python3 -m venv /home/ec2-user/aws_neuron_venv_pytorch

# Activate venv
source ~/aws_neuron_venv_pytorch/bin/activate
python -m pip install -U pip

# Install Jupyter notebook kernel
pip install ipykernel
python3 -m ipykernel install --user --name aws_neuron_venv_pytorch --display-name "Python (torch-neuronx)"
pip install jupyter notebook
pip install environment_kernels

# Set pip repository pointing to the Neuron repository
python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com

# Install wget, awscli
python -m pip install wget
python -m pip install awscli

# Install Neuron Compiler and Framework
python -m pip install neuronx-cc==2.* torch-neuronx torchvision

# Install optimum-neuron
pip3 install --upgrade-strategy eager optimum[neuronx]

# Download scripts
git clone https://github.com/huggingface/optimum-neuron.git
cd optimum-neuron/notebooks/text-generation/

# Log in with your Hugging Face token to download gated models
huggingface-cli login --token YOUR_TOKEN
```
Create a Python file `download_data.py` to download and process the dataset, under the directory `optimum-neuron/notebooks/text-generation/`:
```python
import sys
from random import randrange

from datasets import load_dataset
from transformers import AutoTokenizer

# Load dataset from the hub
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

print(f"dataset size: {len(dataset)}")
print(dataset[randrange(len(dataset))])

def format_dolly(sample):
    instruction = f"### Instruction\n{sample['instruction']}"
    context = f"### Context\n{sample['context']}" if len(sample["context"]) > 0 else None
    response = f"### Answer\n{sample['response']}"
    # join all the parts together
    prompt = "\n\n".join([i for i in [instruction, context, response] if i is not None])
    return prompt

print(format_dolly(dataset[randrange(len(dataset))]))

# Hugging Face model id (gated)
model_id = "meta-llama/Meta-Llama-3-8B"
# model_id = "meta-llama/Llama-2-7b-hf"  # gated
tokenizer = AutoTokenizer.from_pretrained(model_id)

# add utils method to path for loading dataset
sys.path.append("./scripts/utils")  # make sure you change this to the correct path
from pack_dataset import pack_dataset

# template dataset to add prompt to each sample
def template_dataset(sample):
    sample["text"] = f"{format_dolly(sample)}{tokenizer.eos_token}"
    return sample

# apply prompt template per sample
dataset = dataset.map(template_dataset, remove_columns=list(dataset.features))
# print random sample
print(dataset[randrange(len(dataset))]["text"])

# tokenize dataset
dataset = dataset.map(
    lambda sample: tokenizer(sample["text"]), batched=True, remove_columns=list(dataset.features)
)

# chunk dataset
lm_dataset = pack_dataset(dataset, chunk_length=2048)  # We use 2048 as the maximum length for packing

# save train_dataset to disk
dataset_path = "tokenized_dolly"
lm_dataset.save_to_disk(dataset_path)
```
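For context, `pack_dataset` concatenates the tokenized samples and splits the resulting token stream into fixed-length chunks, so every training example has the same length. A minimal sketch of the idea (this is an illustration, not the repo's actual implementation):

```python
# Illustrative packing: flatten all tokenized samples into one stream,
# then cut it into equal chunks of `chunk_length`, dropping the remainder.
def pack(token_lists, chunk_length):
    stream = [tok for sample in token_lists for tok in sample]
    usable = (len(stream) // chunk_length) * chunk_length  # drop the tail
    return [stream[i:i + chunk_length] for i in range(0, usable, chunk_length)]

chunks = pack([[1, 2, 3], [4, 5], [6, 7, 8, 9]], chunk_length=4)
print(chunks)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```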
Run the above script:

```bash
python download_data.py
```
Compile the fine-tuning script on an inf2.8xlarge instance with the `compile_llama3.sh` script:
```bash
MALLOC_ARENA_MAX=64 neuron_parallel_compile torchrun --nproc_per_node=8 scripts/run_clm.py \
  --model_id "meta-llama/Meta-Llama-3-8B" \
  --dataset_path "tokenized_dolly" \
  --bf16 True \
  --learning_rate 5e-5 \
  --output_dir dolly_llama \
  --overwrite_output_dir True \
  --per_device_train_batch_size 1 \
  --gradient_checkpointing True \
  --tensor_parallel_size 8 \
  --max_steps 10 \
  --logging_steps 10 \
  --gradient_accumulation_steps 16
```
Run the fine-tuning on the inf2.8xlarge instance with the `run_llama3.sh` script:
```bash
MALLOC_ARENA_MAX=64 torchrun --nproc_per_node=8 scripts/run_clm.py \
  --model_id "meta-llama/Meta-Llama-3-8B" \
  --dataset_path "tokenized_dolly" \
  --bf16 True \
  --learning_rate 5e-5 \
  --output_dir dolly_llama \
  --overwrite_output_dir True \
  --skip_cache_push True \
  --per_device_train_batch_size 1 \
  --gradient_checkpointing True \
  --tensor_parallel_size 8 \
  --num_train_epochs 3 \
  --logging_steps 10 \
  --gradient_accumulation_steps 16
```
### Expected behavior
The run command should complete the fine-tuning and report performance numbers, as it does with `Neuron SDK 2.18.0` and `optimum-neuron v0.0.22`.