Could not find a matching NEFF for your HLO in this directory #730

Status: Closed
SteliosGian opened this issue Oct 30, 2024 · 3 comments
Labels: bug (Something isn't working), Stale

Comments

SteliosGian commented Oct 30, 2024

System Info

I'm compiling a fine-tuned Llama 3.1 70B model with the system info below on an inf2.48xlarge instance, and serving it with neuronX TGI 0.0.25 on AWS SageMaker. I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/model/compiled/e28e32d0143dad6277a9.neff'
FileNotFoundError: Could not find a matching NEFF for your HLO in this directory. Ensure that the model you are trying to load is the same type and has the same parameters as the one you saved or call "save" on this model to reserialize it.
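
To narrow this down, it can help to check what was actually serialized with the model and which Neuron packages the serving image ships. A minimal sketch, run inside the serving container; the paths come from the error message above:

# List the NEFF binaries that were serialized alongside the model
ls -la /opt/ml/model/compiled/

# List the Neuron packages (compiler, runtime bindings) in the serving image
pip list 2>/dev/null | grep -i neuron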

Platform:

- Platform: Linux-6.8.0-1015-aws-x86_64-with-glibc2.35
- Python version: 3.10.12


Python packages:

- `optimum-neuron` version: 0.0.25
- `neuron-sdk` version: 2.20.0
- `optimum` version: 1.22.0
- `transformers` version: 4.43.2
- `huggingface_hub` version: 0.26.2
- `torch` version: 2.1.2+cu121
- `aws-neuronx-runtime-discovery` version: 2.9
- `libneuronxla` version: 2.0.4115.0
- `neuronx-cc` version: 2.15.128.0+56dc5a86
- `neuronx-distributed` version: 0.8.0
- `neuronx-hwm` version: NA
- `torch-neuronx` version: 2.1.2.2.3.0
- `torch-xla` version: 2.1.4
- `transformers-neuronx` version: 0.12.313


Neuron Driver:


WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

aws-neuronx-collectives/now 2.22.26.0-17a033bc8 amd64 [installed,local]
aws-neuronx-dkms/now 2.18.12.0 amd64 [installed,local]
aws-neuronx-runtime-lib/now 2.22.14.0-6e27b8d5b amd64 [installed,local]
aws-neuronx-tools/now 2.19.0.0 amd64 [installed,local]

This is my compilation command:

optimum-cli export neuron -m orig-llama/ --batch_size 4 --task text-generation --sequence_length 4096 --num_cores 24 --auto_cast_type bf16 ./neuron-llama-throughput
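
As a sanity check after export, before uploading to S3, the serialized NEFFs and the recorded compilation parameters can be inspected. This is a sketch: the compiled/ subfolder is implied by the error path above, while the "neuron" key in config.json is an assumption about where optimum-neuron records the export parameters.

# List the serialized NEFF binaries produced by the export
ls ./neuron-llama-throughput/compiled/

# Print the compilation parameters recorded in the exported config
# (assumes they live under a "neuron" key in config.json)
python -c "import json; print(json.load(open('./neuron-llama-throughput/config.json')).get('neuron'))"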

Here is my TGI env:

sagemaker_model_env = {
    "SM_MODEL_DIR" = "/opt/ml/model"
    "HF_MODEL_ID" = "/opt/ml/model"
    "HF_NUM_CORES" = "24"
    "HF_BATCH_SIZE" = "4"
    "HF_SEQUENCE_LENGTH" = "4096"
    "HF_AUTO_CAST_TYPE" = "bf16"
    "MAX_BATCH_SIZE" = "4"
    "MAX_INPUT_TOKENS" = "3072"
    "MAX_TOTAL_TOKENS" = "4096"
    "MESSAGES_API_ENABLED" = "false"
    "MAX_BATCH_PREFILL_TOKENS" = "16384"
    "MAX_BATCH_TOTAL_TOKENS" = "20000"
    "ROPE_SCALING" = "dynamic"
    "ROPE_FACTOR" = "8.0"
  }
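
For debugging outside SageMaker, a roughly equivalent local serving run with the neuronx-tgi container might look like the sketch below. The image name and tag are assumptions, and one --device flag is needed per Neuron device:

# An inf2.48xlarge exposes /dev/neuron0 .. /dev/neuron11 (12 devices, 24 cores)
docker run -p 8080:80 \
  -v $(pwd)/neuron-llama-throughput:/opt/ml/model \
  --device=/dev/neuron0 --device=/dev/neuron1 --device=/dev/neuron2 --device=/dev/neuron3 \
  --device=/dev/neuron4 --device=/dev/neuron5 --device=/dev/neuron6 --device=/dev/neuron7 \
  --device=/dev/neuron8 --device=/dev/neuron9 --device=/dev/neuron10 --device=/dev/neuron11 \
  -e HF_NUM_CORES=24 -e HF_BATCH_SIZE=4 \
  -e HF_SEQUENCE_LENGTH=4096 -e HF_AUTO_CAST_TYPE=bf16 \
  -e MAX_BATCH_SIZE=4 -e MAX_INPUT_TOKENS=3072 -e MAX_TOTAL_TOKENS=4096 \
  ghcr.io/huggingface/neuronx-tgi:0.0.25 \
  --model-id /opt/ml/model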

Who can help?

@dacorvo @JingyaHuang

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

Compile Llama 3.1 70B on an inf2.48xlarge instance using the command above, then run it on neuronX TGI version 0.0.25 with the environment shown above.

(Platform, Python packages, and Neuron driver details are the same as in the System Info section above.)

Expected behavior

The neuronX TGI server should start instead of failing with the FileNotFoundError above.

SteliosGian added the bug label Oct 30, 2024
jimburtoft (Contributor) commented:
I want to rule out an SDK mismatch between the compilation environment and the hosting environment.

Are you deploying on SageMaker? What image are you using?

If you are not deploying on SageMaker, try compiling using the TGI image itself.

See the example here:
https://github.com/huggingface/optimum-neuron/tree/main/benchmark/text-generation-inference/performance#compiling-the-model
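
Roughly, compiling inside the TGI image (per the linked example) could look like the sketch below; the image tag and the --entrypoint override are assumptions, so defer to the linked README for the exact invocation:

# Compile with the same image that will serve the model, so compiler and
# runtime versions cannot drift apart; expose all Neuron devices as before
docker run --entrypoint optimum-cli \
  -v $(pwd):/data \
  --device=/dev/neuron0 --device=/dev/neuron1 --device=/dev/neuron2 --device=/dev/neuron3 \
  --device=/dev/neuron4 --device=/dev/neuron5 --device=/dev/neuron6 --device=/dev/neuron7 \
  --device=/dev/neuron8 --device=/dev/neuron9 --device=/dev/neuron10 --device=/dev/neuron11 \
  ghcr.io/huggingface/neuronx-tgi:0.0.25 \
  export neuron -m /data/orig-llama --batch_size 4 --task text-generation \
  --sequence_length 4096 --num_cores 24 --auto_cast_type bf16 /data/neuron-llama-throughput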

github-actions bot commented Dec 6, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label Dec 6, 2024
dacorvo (Collaborator) commented Dec 9, 2024

Should be fixed by #743

dacorvo closed this as completed Dec 9, 2024