Could not find a matching NEFF for your HLO in this directory #730

Status: Closed
SteliosGian opened this issue Oct 30, 2024 · 3 comments
Labels: bug (Something isn't working), Stale

Comments

SteliosGian commented Oct 30, 2024

System Info

I'm compiling a fine-tuned Llama 3.1 70B model with the system info below on an inf2.48xlarge instance, and serving it with neuronX TGI 0.0.25 on AWS SageMaker. I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/model/compiled/e28e32d0143dad6277a9.neff'
FileNotFoundError: Could not find a matching NEFF for your HLO in this directory. Ensure that the model you are trying to load is the same type and has the same parameters as the one you saved or call "save" on this model to reserialize it.
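
To narrow this down, it can help to check what was actually serialized with the model and which Neuron packages the serving image ships. A minimal sketch, run inside the serving container; the paths come from the error message above:

# List the NEFF binaries that were serialized alongside the model
ls -la /opt/ml/model/compiled/

# List the Neuron packages (compiler, runtime bindings) in the serving image
pip list 2>/dev/null | grep -i neuron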

Platform:

- Platform: Linux-6.8.0-1015-aws-x86_64-with-glibc2.35
- Python version: 3.10.12


Python packages:

- `optimum-neuron` version: 0.0.25
- `neuron-sdk` version: 2.20.0
- `optimum` version: 1.22.0
- `transformers` version: 4.43.2
- `huggingface_hub` version: 0.26.2
- `torch` version: 2.1.2+cu121
- `aws-neuronx-runtime-discovery` version: 2.9
- `libneuronxla` version: 2.0.4115.0
- `neuronx-cc` version: 2.15.128.0+56dc5a86
- `neuronx-distributed` version: 0.8.0
- `neuronx-hwm` version: NA
- `torch-neuronx` version: 2.1.2.2.3.0
- `torch-xla` version: 2.1.4
- `transformers-neuronx` version: 0.12.313


Neuron Driver:


WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

aws-neuronx-collectives/now 2.22.26.0-17a033bc8 amd64 [installed,local]
aws-neuronx-dkms/now 2.18.12.0 amd64 [installed,local]
aws-neuronx-runtime-lib/now 2.22.14.0-6e27b8d5b amd64 [installed,local]
aws-neuronx-tools/now 2.19.0.0 amd64 [installed,local]

This is my compilation command:

optimum-cli export neuron -m orig-llama/ --batch_size 4 --task text-generation --sequence_length 4096 --num_cores 24 --auto_cast_type bf16 ./neuron-llama-throughput
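
As a sanity check after export, before uploading to S3, the serialized NEFFs and the recorded compilation parameters can be inspected. This is a sketch: the compiled/ subfolder is implied by the error path above, while the "neuron" key in config.json is an assumption about where optimum-neuron records the export parameters.

# List the serialized NEFF binaries produced by the export
ls ./neuron-llama-throughput/compiled/

# Print the compilation parameters recorded in the exported config
# (assumes they live under a "neuron" key in config.json)
python -c "import json; print(json.load(open('./neuron-llama-throughput/config.json')).get('neuron'))"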

Here is my TGI env:

sagemaker_model_env = {
    "SM_MODEL_DIR" = "/opt/ml/model"
    "HF_MODEL_ID" = "/opt/ml/model"
    "HF_NUM_CORES" = "24"
    "HF_BATCH_SIZE" = "4"
    "HF_SEQUENCE_LENGTH" = "4096"
    "HF_AUTO_CAST_TYPE" = "bf16"
    "MAX_BATCH_SIZE" = "4"
    "MAX_INPUT_TOKENS" = "3072"
    "MAX_TOTAL_TOKENS" = "4096"
    "MESSAGES_API_ENABLED" = "false"
    "MAX_BATCH_PREFILL_TOKENS" = "16384"
    "MAX_BATCH_TOTAL_TOKENS" = "20000"
    "ROPE_SCALING" = "dynamic"
    "ROPE_FACTOR" = "8.0"
  }
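
For debugging outside SageMaker, a roughly equivalent local serving run with the neuronx-tgi container might look like the sketch below. The image name and tag are assumptions, and one --device flag is needed per Neuron device:

# An inf2.48xlarge exposes /dev/neuron0 .. /dev/neuron11 (12 devices, 24 cores)
docker run -p 8080:80 \
  -v $(pwd)/neuron-llama-throughput:/opt/ml/model \
  --device=/dev/neuron0 --device=/dev/neuron1 --device=/dev/neuron2 --device=/dev/neuron3 \
  --device=/dev/neuron4 --device=/dev/neuron5 --device=/dev/neuron6 --device=/dev/neuron7 \
  --device=/dev/neuron8 --device=/dev/neuron9 --device=/dev/neuron10 --device=/dev/neuron11 \
  -e HF_NUM_CORES=24 -e HF_BATCH_SIZE=4 \
  -e HF_SEQUENCE_LENGTH=4096 -e HF_AUTO_CAST_TYPE=bf16 \
  -e MAX_BATCH_SIZE=4 -e MAX_INPUT_TOKENS=3072 -e MAX_TOTAL_TOKENS=4096 \
  ghcr.io/huggingface/neuronx-tgi:0.0.25 \
  --model-id /opt/ml/model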

Who can help?

@dacorvo @JingyaHuang

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

Compile Llama 3.1 70B on an inf2.48xlarge instance using the command above, then run it on neuronX TGI version 0.0.25 with the environment shown above.

(Platform, Python packages, and Neuron driver details are the same as in the System Info section above.)

Expected behavior

The neuronX TGI server should start instead of failing with the FileNotFoundError above.

SteliosGian added the bug label Oct 30, 2024
jimburtoft (Contributor) commented:
I want to rule out an SDK mismatch between the compilation environment and the hosting environment.

Are you deploying on SageMaker? What image are you using?

If you are not deploying on SageMaker, try compiling using the TGI image itself.

See the example here:
https://github.com/huggingface/optimum-neuron/tree/main/benchmark/text-generation-inference/performance#compiling-the-model
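
Roughly, compiling inside the TGI image (per the linked example) could look like the sketch below; the image tag and the --entrypoint override are assumptions, so defer to the linked README for the exact invocation:

# Compile with the same image that will serve the model, so compiler and
# runtime versions cannot drift apart; expose all Neuron devices as before
docker run --entrypoint optimum-cli \
  -v $(pwd):/data \
  --device=/dev/neuron0 --device=/dev/neuron1 --device=/dev/neuron2 --device=/dev/neuron3 \
  --device=/dev/neuron4 --device=/dev/neuron5 --device=/dev/neuron6 --device=/dev/neuron7 \
  --device=/dev/neuron8 --device=/dev/neuron9 --device=/dev/neuron10 --device=/dev/neuron11 \
  ghcr.io/huggingface/neuronx-tgi:0.0.25 \
  export neuron -m /data/orig-llama --batch_size 4 --task text-generation \
  --sequence_length 4096 --num_cores 24 --auto_cast_type bf16 /data/neuron-llama-throughput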

github-actions bot commented Dec 6, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label Dec 6, 2024
dacorvo (Collaborator) commented Dec 9, 2024

Should be fixed by #743

dacorvo closed this as completed Dec 9, 2024