diff --git a/README.md b/README.md
index 3aa2156..96ca287 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,13 @@
 # tokenizers
 C++ implementations for various tokenizers (sentencepiece, tiktoken etc). Useful for other PyTorch repos such as torchchat, ExecuTorch to build LLM runners using ExecuTorch stack or AOT Inductor stack.
 
+## Installation (from source)
+```
+git clone git@github.com:meta-pytorch/tokenizers.git
+cd ~/tokenizers
+git submodule update --init --recursive
+pip install -e .
+```
 
 ## SentencePiece tokenizer
 Depend on https://github.com/google/sentencepiece from Google.