Implementation of BitNet-1.58 instruct tuning. All data and models are versioned and stored on Oxen.ai at ox/BitNet. This work builds off the pre-trained models released in the 1bitLLM/bitnet_b1_58-large project on Hugging Face.
Code Name: Bessie the BitNet 🐂
This work was originally done for the arXiv Dives community, and more info on BitNets can be found in our blog post.
We also have some internal use cases at Oxen.ai for a fast, local LLM, and BitNet 1.58 seems like an interesting direction. We will open source our models, data, and code as we go.
There is a simple script to prompt the model given a system message. You can point it at a base LLM or a fine-tuned LLM.
python scripts/prompt.py -m 1bitLLM/bitnet_b1_58-large
oxen download ox/BitNet models/bitnet_b1_58-large-instruct-100k
python scripts/prompt.py -m models/bitnet_b1_58-large-instruct-100k
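For reference, here is a minimal sketch of what a prompt script like this does, assuming the model loads through the standard transformers API and a simple system + user template (the actual scripts/prompt.py may load custom BitNet modeling code and format the prompt differently):

# Minimal prompting sketch (illustrative only; scripts/prompt.py is the source of truth).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "models/bitnet_b1_58-large-instruct-100k"  # or "1bitLLM/bitnet_b1_58-large"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# The system prompt lives in bitnet/prompts/assistant_prompt.py; the User/Assistant
# delimiters below are an assumption about the template.
system_prompt = "You are Bessie, created by Oxen.ai. ..."
text = f"{system_prompt}\n\nUser: What is Oxen.ai?\nAssistant:"

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))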
The training was done on an A10 with 24GB of VRAM. We cap the max sequence length at 768 because otherwise it runs out of VRAM on some batches. It would be nice to kick off a training run on a larger GPU with a longer context length.
python tools/train.py -m 1bitLLM/bitnet_b1_58-large -d train.jsonl -o results/bitnet_b1_58-large-instruct
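A rough sketch of what the SFT setup inside tools/train.py could look like, assuming a straightforward causal-LM fine-tune over the concatenated prompt + response text with the Hugging Face Trainer (the batch size, epoch count, and prompt/response join are assumptions; the real script may differ):

# SFT sketch (illustrative only; tools/train.py is the source of truth).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "1bitLLM/bitnet_b1_58-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # base tokenizer may not define a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("json", data_files={"train": "train.jsonl"})["train"]

def tokenize(example):
    # Truncate to 768 tokens so a 24GB A10 does not run out of VRAM on long batches.
    text = example["prompt"] + "\n" + example["response"]
    return tokenizer(text, truncation=True, max_length=768)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="results/bitnet_b1_58-large-instruct",
        per_device_train_batch_size=4,  # assumption
        num_train_epochs=1,             # assumption
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()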
The base models were pre-trained on the RedPajama dataset for 100B tokens. The hyperparameters, as well as the two-stage learning rate schedule and weight decay, are implemented as suggested in their follow-up paper.
NOTE: This repo does not perform the pre-training, just uses these models as a jumping off point for instruct tuning.
The instruct tuning was done on a mix of data:
- SQuADv2 with context and questions
- mosaicml/instruct-v3
You can see the mix of data here:
https://www.oxen.ai/ox/BitNet/file/main/train.jsonl
oxen download ox/BitNet train.jsonl
oxen download ox/BitNet dev.jsonl
The dataset should be jsonl with prompt and response fields for the SFT step.
head -n 1 train.jsonl | jq
{
"prompt": "What is Oxen.ai?",
"response": "Oxen.ai is a Open-source tools to track, iterate, collaborate on, and discover multi-modal data in any format.",
"source": "manual"
}
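For illustration, here is a rough sketch of how rows like the one above could be built from the two sources (the prompt formatting for SQuAD and the subset sizes are assumptions; the actual mix lives in train.jsonl on Oxen.ai):

# Data mix sketch (illustrative only; the real mix is versioned at ox/BitNet/train.jsonl).
import json
from datasets import load_dataset

rows = []

# SQuADv2: fold the context into the prompt; unanswerable questions map to "Not in context."
squad = load_dataset("squad_v2", split="train")
for ex in squad.select(range(1000)):  # subset size is an assumption
    answer = ex["answers"]["text"][0] if ex["answers"]["text"] else "Not in context."
    rows.append({
        "prompt": f"Context: {ex['context']}\n\nQuestion: {ex['question']}",  # format is an assumption
        "response": answer,
        "source": "squad_v2",
    })

# mosaicml/instruct-v3 already ships prompt/response/source columns.
instruct = load_dataset("mosaicml/instruct-v3", split="train")
for ex in instruct.select(range(1000)):
    rows.append({"prompt": ex["prompt"], "response": ex["response"], "source": ex["source"]})

with open("train.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")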
The system prompt is currently hard coded into bitnet/prompts/assistant_prompt.py.
You are Bessie, created by Oxen.ai. You are happy to help with writing, analysis, question answering, math, coding, and all sorts of other tasks. You give concise responses to simple questions or statements, but provide thorough responses to more complex and open-ended questions. Answer the user's query as best as you can, and say "I don't know" if you don't know the answer.
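A sketch of how that hard-coded prompt might be wired up (illustrative; the constant name and delimiters are assumptions, and bitnet/prompts/assistant_prompt.py is the source of truth):

# Hypothetical shape of bitnet/prompts/assistant_prompt.py
ASSISTANT_PROMPT = "You are Bessie, created by Oxen.ai. ..."  # full text above

def build_prompt(user_message: str) -> str:
    # The User/Assistant delimiters here are an assumption about the template.
    return f"{ASSISTANT_PROMPT}\n\nUser: {user_message}\nAssistant:"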
For evaluation purposes, we also use the SQuAD dataset. The idea is that the model should be able to answer generic questions as well as extract answers from the context when it is provided.
If the answer is not in the context, we want the model to say "Not in context."
python tools/eval.py -m results/bitnet_b1_58-large-instruct/final_checkpoint/ -d dev.jsonl -o eval.jsonl -n 100
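For reference, a minimal sketch of how the generated eval.jsonl could be scored (illustrative; tools/eval.py is the source of truth, and the column names are assumptions):

# Eval scoring sketch (illustrative only; tools/eval.py is the source of truth).
import json
import pandas as pd

# Load the eval output written by tools/eval.py.
records = [json.loads(line) for line in open("eval.jsonl")]
df = pd.DataFrame(records)

def is_correct(pred: str, ref: str) -> bool:
    # Simple exact-or-substring match between prediction and reference answer.
    pred, ref = pred.strip().lower(), ref.strip().lower()
    return pred == ref or ref in pred

df["correct"] = [is_correct(p, r) for p, r in zip(df["prediction"], df["response"])]
print(df.head())
print(f"Accuracy: {df['correct'].mean():.2%}")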
The eval script outputs a dataframe like this:
TODO: