This document describes the steps to run the GPT-NeoX model on FasterTransformer. GPT-NeoX is a model developed by EleutherAI, available publicly on their GitHub repository. For the time being, only the 20B parameter version has been tested.
More details are listed in gptj_guide.md.
Optimizations in GPT-NeoX are similar to those in GPT, described in gpt_guide.md.
`is_context_qk_buf_float_` (whether to use float accumulation for the GPT-NeoX context QK GEMM) is set to `false` by default. If you encounter accuracy issues related to the GPT-NeoX context attention blocks, please try enabling it in `GptNeoX.h`.
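To see why the accumulation precision matters, here is a toy NumPy sketch (an illustration only, not FasterTransformer code) that computes a single attention score from FP16 inputs twice: once accumulating in FP32, which is roughly what enabling `is_context_qk_buf_float_` buys you, and once rounding every partial sum back to FP16:

```python
import numpy as np

# Toy illustration (not FasterTransformer code): compute one attention
# score q . k from FP16 inputs, once with FP32 accumulation and once with
# every partial sum rounded back to FP16.
rng = np.random.default_rng(0)
size_per_head = 96
q = rng.standard_normal(size_per_head).astype(np.float16)
k = rng.standard_normal(size_per_head).astype(np.float16)

# FP32 accumulation (roughly what is_context_qk_buf_float_ = true gives).
ref = np.float32(0)
for a, b in zip(q, k):
    ref += np.float32(a) * np.float32(b)

# FP16 accumulation: every partial sum is rounded to half precision.
acc = np.float16(0)
for a, b in zip(q, k):
    acc = np.float16(acc + np.float16(a) * np.float16(b))

print(f"fp32 accumulation: {ref:.6f}")
print(f"fp16 accumulation: {float(acc):.6f}")
print(f"abs difference:    {abs(float(ref) - float(acc)):.6f}")
```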
- Checkpoint converter
  - EleutherAI
- Data type
  - FP32
  - FP16
- Feature
  - Multi-GPU multi-node inference
  - Dynamic random seed
  - Stop tokens
  - Bad words list
  - Beam search and sampling are both supported
See the common requirements in gptj_guide.md.
First download a PyTorch checkpoint, as provided by EleutherAI:

```bash
wget --cut-dirs=5 -nH -r --no-parent --reject "index.html*" https://mystic.the-eye.eu/public/AI/models/GPT-NeoX-20B/slim_weights/ -P 20B_checkpoints
```
Then use the script provided by FasterTransformer to convert the checkpoint to raw weights understood by FT:

```bash
python ../examples/pytorch/gptneox/utils/eleutherai_gpt_neox_convert.py 20B_checkpoints ../models/gptneox -t 2
```
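The `-t 2` argument presumably sets the tensor-parallelism degree of the converted weights (two GPUs, matching the run below). For intuition, the sketch below shows the basic idea of splitting a weight matrix into per-rank raw binary shards; it is illustrative only, and the file naming, dtype, and split axis are assumptions rather than the converter's actual layout:

```python
import numpy as np

# Illustrative sketch only -- not the actual converter. File naming, dtype,
# and the split axis here are assumptions; FT's real raw-weight layout may
# differ. The idea: each tensor-parallel rank gets its own slice on disk.
def split_for_tensor_parallel(weight: np.ndarray, tp: int, out_prefix: str) -> None:
    # Split column-wise into `tp` shards and dump one raw binary per rank.
    for rank, shard in enumerate(np.split(weight, tp, axis=-1)):
        shard.astype(np.float16).tofile(f"{out_prefix}.{rank}.bin")

hidden = 64 * 96                             # head_number * size_per_head for GPT-NeoX 20B
w_ffn = np.random.randn(hidden, 4 * hidden)  # hypothetical FFN weight
split_for_tensor_parallel(w_ffn, tp=2, out_prefix="mlp.dense_h_to_4h.weight")
```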
You may download the tokenizer config (`20B_tokenizer.json`) from the same slim_weights location; the exact `wget` command is shown in the decoding step below.
To tokenize/detokenize files, use the script found in `examples/pytorch/gptneox/utils/hftokenizer.py`. You may need to pass the path to the tokenizer config with the `--tokenizer` flag.
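For a sense of what such a wrapper does, here is a minimal sketch using the Hugging Face `tokenizers` package and the config downloaded above (this is not `hftokenizer.py` itself; the prompt and variable names are illustrative):

```python
from tokenizers import Tokenizer

# Minimal sketch, not hftokenizer.py itself: load the GPT-NeoX tokenizer
# config and round-trip a prompt through encode/decode.
tokenizer = Tokenizer.from_file("20B_tokenizer.json")

prompt = "GPT-NeoX is a 20B parameter model."
ids = tokenizer.encode(prompt).ids   # token IDs fed to the model
print(ids)
print(tokenizer.decode(ids))         # back to text
```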
- Generate the `gemm_config.in` file.

  ```bash
  # data_type = 0 (FP32) or 1 (FP16) or 2 (BF16)
  ./bin/gpt_gemm <batch_size> <beam_width> <max_input_len> <head_number> <size_per_head> <inter_size> <vocab_size> <data_type> <tensor_para_size>

  # E.g.:
  ./bin/gpt_gemm 8 1 32 64 96 24576 50432 1 2
  ```
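For reference, the example arguments line up with the GPT-NeoX 20B hyper-parameters; the quick arithmetic below (plain Python, nothing FasterTransformer-specific) shows how they relate:

```python
# Where the example gpt_gemm arguments come from (GPT-NeoX 20B):
batch_size, beam_width, max_input_len = 8, 1, 32
head_number, size_per_head = 64, 96
hidden_size = head_number * size_per_head   # 6144
inter_size = 4 * hidden_size                # 24576, the FFN inner dimension
vocab_size = 50432
data_type = 1                               # 1 = FP16
tensor_para_size = 2                        # matches the converter's -t 2
print(hidden_size, inter_size)              # 6144 24576
```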
- Run GPT-NeoX on C++

  Users can see the details of the arguments in `examples/cpp/gptneox/gptneox_config.ini`. It controls the model path, model size, tensor parallelism size, and some hyper-parameters.

  ```bash
  mpirun -n 2 --allow-run-as-root ./bin/gptneox_example
  ```
E.g., by setting the `data_type` in `gptneox_config.ini` to `fp16`, users can run the model under FP16.
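If you want to flip that switch programmatically, here is a minimal sketch using Python's `configparser` (the section name is an assumption borrowed from other FasterTransformer example configs; check `gptneox_config.ini` for the real layout):

```python
import configparser

# Minimal sketch: switch the C++ example to FP16 by editing its config.
# "ft_instance_hyperparameter" is an assumed section name based on other
# FasterTransformer example configs; verify it against gptneox_config.ini.
path = "../examples/cpp/gptneox/gptneox_config.ini"
config = configparser.ConfigParser()
config.read(path)

config["ft_instance_hyperparameter"]["data_type"] = "fp16"

with open(path, "w") as f:
    config.write(f)
```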
You can then decode the `out` file with the tokenizer:

```bash
wget https://mystic.the-eye.eu/public/AI/models/GPT-NeoX-20B/slim_weights/20B_tokenizer.json
../examples/pytorch/gptneox/utils/hftokenizer.py out --tokenizer 20B_tokenizer.json
```
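Alternatively, you can decode in a few lines of Python. This is a minimal sketch assuming `out` holds whitespace-separated token IDs, one sequence per line; verify that against the file your run actually produced:

```python
from tokenizers import Tokenizer

# Minimal sketch: decode the C++ example's output. Assumes each line of
# "out" is a whitespace-separated list of token IDs; verify this against
# the file your run actually produced.
tokenizer = Tokenizer.from_file("20B_tokenizer.json")

with open("out") as f:
    for line in f:
        ids = [int(tok) for tok in line.split()]
        if ids:
            print(tokenizer.decode(ids))
```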