### System Info
The same script works with `Neuron SDK 2.18.0` and `optimum-neuron v0.0.22`, but it fails with the latest software stack:
```text
(aws_neuron_venv_pytorch) [ec2-user@ip-172-31-29-22 text-generation]$ yum list | grep neuron
aws-neuronx-collectives.x86_64 2.21.46.0_69b77134b-1 @neuron
aws-neuronx-dkms.noarch 2.17.17.0-dkms @neuron
aws-neuronx-runtime-lib.x86_64 2.21.41.0_fb1705f5f-1 @neuron
aws-neuronx-tools.x86_64 2.18.3.0-1 @neuron
aws-neuron-dkms.noarch 2.3.26.0-dkms neuron
aws-neuron-dkms.src 2.3.26.0-dkms neuron
aws-neuron-k8-plugin.x86_64 1.9.3.0-1 neuron
aws-neuron-k8-scheduler.x86_64 1.9.3.0-1 neuron
aws-neuron-runtime.x86_64 1.6.24.0-1 neuron
aws-neuron-runtime-base.x86_64 1.6.21.0-1 neuron
aws-neuron-tools.x86_64 2.1.4.0-1 neuron
aws-neuronx-dkms.src 2.17.17.0-dkms neuron
aws-neuronx-gpsimd-customop.x86_64 0.2.3.0-1 neuron
aws-neuronx-gpsimd-customop-lib.x86_64 0.11.4.0-1 neuron
aws-neuronx-gpsimd-tools.x86_64 0.11.3.0_36dcb86d4-1 neuron
aws-neuronx-k8-plugin.x86_64 2.21.14.0-1 neuron
aws-neuronx-k8-scheduler.x86_64 2.21.14.0-1 neuron
aws-neuronx-oci-hook.x86_64 2.4.4.0-1 neuron
tensorflow-model-server-neuron.x86_64 2.8.0.2.3.0.0-0 neuron
tensorflow-model-server-neuronx.x86_64 2.10.1.2.11.4.0-0 neuron
(aws_neuron_venv_pytorch) [ec2-user@ip-172-31-29-22 text-generation]$ pip list | grep neuron
aws-neuronx-runtime-discovery 2.9
libneuronxla 2.0.2335
neuronx-cc 2.13.66.0+6dfecc895
neuronx-distributed 0.7.0
optimum-neuron 0.0.23
torch-neuronx 2.1.2.2.1.0
transformers-neuronx 0.10.0.21
```
It gives the following error (log lines from the eight ranks are interleaved):
```text
745142719040221994+6bd63055/model.neff. Exiting with a successfully compiled graph.
2024-Jul-17 22:00:32.531450 57376:58367 ERROR TDRV:v2_cc_execute [nec_dev 1, gid 1] MPMD execution is not supported. This is likely caused for some but not all ranks recompiling/reloading a graph, model: /tmp/ec2-user/neuroncc_compile_workdir/952ba576-b08c-4ac0-b0f1-12e560e5e362/model.MODULE_11203961494150985019+6bd63055.neff2024-Jul-17 22:00:32.5314452024-Jul-17 22:00:32.5314692024-Jul-17 22:00:32.5314502024-Jul-17 22:00:32.5314522024-Jul-17 22:00:32.531461 57380:58467 ERROR TDRV:v2_cc_execute 57381:57583 ERROR TDRV:v2_cc_execute
57379:57681 ERROR TDRV:v2_cc_execute 57378:58269 ERROR TDRV:v2_cc_execute [nec_dev 4, gid 4] MPMD execution is not supported. This is likely caused for some but not all ranks recompiling/reloading a graph, model: /tmp/ec2-user/neuroncc_compile_workdir/ebe15e9e-3f50-4061-8ef1-9c80d9a78071/model.MODULE_12660838522657173708+6bd63055.neff[nec_dev 5, gid 5] MPMD execution is not supported. This is likely caused for some but not all ranks recompiling/reloading a graph, model: /tmp/ec2-user/neuroncc_compile_workdir/a12cbe53-3042-4c9a-86d4-4bbcac795471/model.MODULE_15938951179947649509+6bd63055.neff 57382:57461 ERROR TDRV:v2_cc_execute [nec_dev 6, gid 6] MPMD execution is not supported. This is likely caused for some but not all ranks recompiling/reloading a graph, model: /tmp/ec2-user/neuroncc_compile_workdir/eac078e9-79e3-49af-8695-c65797ca89c0/model.MODULE_3281029225498615900+6bd63055.neff[nec_dev 3, gid 3] MPMD execution is not supported. This is likely caused for some but not all ranks recompiling/reloading a graph, model: /tmp/ec2-user/neuroncc_compile_workdir/43f12c55-64e6-45ad-a002-a0676ed72df9/model.MODULE_5948348338269179475+6bd63055.neff
[nec_dev 7, gid 7] MPMD execution is not supported. This is likely caused for some but not all ranks recompiling/reloading a graph, model: /tmp/ec2-user/neuroncc_compile_workdir/59a5b5cd-fff2-4315-a603-8a152f5186ca/model.MODULE_12429740934125521760+6bd63055.neff2024-Jul-17 22:00:32.531563
57376:58367 ERROR ENC:enc_dump_neff_info [nec_dev 1, gid 1] NEFF: /tmp/ec2-user/neuroncc_compile_workdir/952ba576-b08c-4ac0-b0f1-12e560e5e362/model.MODULE_11203961494150985019+6bd63055.neff2024-Jul-17 22:00:32.531607
2024-Jul-17 22:00:32.5316292024-Jul-17 22:00:32.5316332024-Jul-17 22:00:32.531631 57379:57681 ERROR ENC:enc_dump_neff_info 57378:58269 ERROR ENC:enc_dump_neff_info
57381:57583 ERROR ENC:enc_dump_neff_info 57380:58467 ERROR ENC:enc_dump_neff_info [nec_dev 4, gid 4] NEFF: /tmp/ec2-user/neuroncc_compile_workdir/ebe15e9e-3f50-4061-8ef1-9c80d9a78071/model.MODULE_12660838522657173708+6bd63055.neff[nec_dev 3, gid 3] NEFF: /tmp/ec2-user/neuroncc_compile_workdir/43f12c55-64e6-45ad-a002-a0676ed72df9/model.MODULE_5948348338269179475+6bd63055.neff[nec_dev 6, gid 6] NEFF: /tmp/ec2-user/neuroncc_compile_workdir/eac078e9-79e3-49af-8695-c65797ca89c0/model.MODULE_3281029225498615900+6bd63055.neff[nec_dev 5, gid 5] NEFF: /tmp/ec2-user/neuroncc_compile_workdir/a12cbe53-3042-4c9a-86d4-4bbcac795471/model.MODULE_15938951179947649509+6bd63055.neff2024-Jul-17 22:00:32.531670 57382:57461 ERROR ENC:enc_dump_neff_info
2024-Jul-17 22:00:32.531701 57376:58367 ERROR ENC:enc_dump_neff_info
```
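For reference, the NEFF paths in the log carry a different `MODULE_<hash>` graph hash on each rank, which is exactly the "some but not all ranks recompiling/reloading a graph" condition the runtime rejects as MPMD. A quick illustrative check (hashes copied from the log above):

```python
import re

# NEFF file names copied from the error log above: each rank loaded a graph
# with a different MODULE_<hash>, so the ranks are no longer executing the
# same program and the runtime aborts with "MPMD execution is not supported".
log = """
model.MODULE_11203961494150985019+6bd63055.neff
model.MODULE_12660838522657173708+6bd63055.neff
model.MODULE_15938951179947649509+6bd63055.neff
model.MODULE_3281029225498615900+6bd63055.neff
model.MODULE_5948348338269179475+6bd63055.neff
model.MODULE_12429740934125521760+6bd63055.neff
"""
hashes = set(re.findall(r"MODULE_(\d+)\+", log))
print(f"{len(hashes)} distinct graph hashes across ranks")  # 6 here; SPMD needs 1
```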
### Who can help?
_No response_
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
### Reproduction (minimal, reproducible, runnable)
The steps I used are below.

Launch an instance with Amazon Linux 2023 and install the dependencies using the following script:
```bash
# Configure Linux for Neuron repository updates
sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF
[neuron]
name=Neuron YUM Repository
baseurl=https://yum.repos.neuron.amazonaws.com
enabled=1
metadata_expire=0
EOF
sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB

# Update OS packages
sudo yum update -y

# Install OS headers
sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y

# Install git
sudo yum install git -y

# Install Neuron Driver
sudo yum install aws-neuronx-dkms-2.* -y

# Install Neuron Runtime
sudo yum install aws-neuronx-collectives-2.* -y
sudo yum install aws-neuronx-runtime-lib-2.* -y

# Install Neuron Tools
sudo yum install aws-neuronx-tools-2.* -y

# Create python3 venv
sudo yum install -y libxcrypt-compat
sudo yum install -y gcc-c++
python3 -m venv /home/ec2-user/aws_neuron_venv_pytorch

# Activate venv
source ~/aws_neuron_venv_pytorch/bin/activate
python -m pip install -U pip

# Install Jupyter notebook kernel
pip install ipykernel
python3 -m ipykernel install --user --name aws_neuron_venv_pytorch --display-name "Python (torch-neuronx)"
pip install jupyter notebook
pip install environment_kernels

# Set pip repository pointing to the Neuron repository
python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com

# Install wget, awscli
python -m pip install wget
python -m pip install awscli

# Install Neuron Compiler and Framework
python -m pip install neuronx-cc==2.* torch-neuronx torchvision

# Install optimum-neuron
pip3 install --upgrade-strategy eager optimum[neuronx]

# Download scripts
git clone https://github.com/huggingface/optimum-neuron.git
cd optimum-neuron/notebooks/text-generation/

# Log in with your Hugging Face token to download gated models
huggingface-cli login --token YOUR_TOKEN
```
Create a Python file `download_data.py` to download and process the dataset, under the directory `optimum-neuron/notebooks/text-generation/`:
```python
import sys
from random import randrange

from datasets import load_dataset
from transformers import AutoTokenizer

# Load dataset from the hub
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

print(f"dataset size: {len(dataset)}")
print(dataset[randrange(len(dataset))])

def format_dolly(sample):
    instruction = f"### Instruction\n{sample['instruction']}"
    context = f"### Context\n{sample['context']}" if len(sample["context"]) > 0 else None
    response = f"### Answer\n{sample['response']}"
    # join all the parts together
    prompt = "\n\n".join([i for i in [instruction, context, response] if i is not None])
    return prompt

print(format_dolly(dataset[randrange(len(dataset))]))

# Hugging Face model id (gated)
model_id = "meta-llama/Meta-Llama-3-8B"
# model_id = "meta-llama/Llama-2-7b-hf"  # gated
tokenizer = AutoTokenizer.from_pretrained(model_id)

# add utils method to path for loading dataset
sys.path.append("./scripts/utils")  # make sure you change this to the correct path
from pack_dataset import pack_dataset

# template dataset to add prompt to each sample
def template_dataset(sample):
    sample["text"] = f"{format_dolly(sample)}{tokenizer.eos_token}"
    return sample

# apply prompt template per sample
dataset = dataset.map(template_dataset, remove_columns=list(dataset.features))
# print random sample
print(dataset[randrange(len(dataset))]["text"])

# tokenize dataset
dataset = dataset.map(
    lambda sample: tokenizer(sample["text"]), batched=True, remove_columns=list(dataset.features)
)

# chunk dataset
lm_dataset = pack_dataset(dataset, chunk_length=2048)  # We use 2048 as the maximum length for packing

# save train_dataset to disk
dataset_path = "tokenized_dolly"
lm_dataset.save_to_disk(dataset_path)
```
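For context, `pack_dataset` concatenates the tokenized samples and splits the resulting token stream into fixed-length chunks, so every training example has the same length. A minimal sketch of the idea (this is an illustration, not the repo's actual implementation):

```python
# Illustrative packing: flatten all tokenized samples into one stream,
# then cut it into equal chunks of `chunk_length`, dropping the remainder.
def pack(token_lists, chunk_length):
    stream = [tok for sample in token_lists for tok in sample]
    usable = (len(stream) // chunk_length) * chunk_length  # drop the tail
    return [stream[i:i + chunk_length] for i in range(0, usable, chunk_length)]

chunks = pack([[1, 2, 3], [4, 5], [6, 7, 8, 9]], chunk_length=4)
print(chunks)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```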
Run the above script:

```bash
python download_data.py
```
Compile the fine-tuning script on an inf2.8xlarge instance with the `compile_llama3.sh` script:
```bash
MALLOC_ARENA_MAX=64 neuron_parallel_compile torchrun --nproc_per_node=8 scripts/run_clm.py \
  --model_id "meta-llama/Meta-Llama-3-8B" \
  --dataset_path "tokenized_dolly" \
  --bf16 True \
  --learning_rate 5e-5 \
  --output_dir dolly_llama \
  --overwrite_output_dir True \
  --per_device_train_batch_size 1 \
  --gradient_checkpointing True \
  --tensor_parallel_size 8 \
  --max_steps 10 \
  --logging_steps 10 \
  --gradient_accumulation_steps 16
```
Run the fine-tuning on the inf2.8xlarge instance with the `run_llama3.sh` script:
```bash
MALLOC_ARENA_MAX=64 torchrun --nproc_per_node=8 scripts/run_clm.py \
  --model_id "meta-llama/Meta-Llama-3-8B" \
  --dataset_path "tokenized_dolly" \
  --bf16 True \
  --learning_rate 5e-5 \
  --output_dir dolly_llama \
  --overwrite_output_dir True \
  --skip_cache_push True \
  --per_device_train_batch_size 1 \
  --gradient_checkpointing True \
  --tensor_parallel_size 8 \
  --num_train_epochs 3 \
  --logging_steps 10 \
  --gradient_accumulation_steps 16
```
### Expected behavior
The run command should complete the fine-tuning and report performance numbers, as it does with `Neuron SDK 2.18.0` and `optimum-neuron v0.0.22`.