This repository contains a chatbot demonstration built using the Llama 2
model and the LangChain
framework, implemented within a Jupyter Notebook. This demonstration shows how to set up a Llama 2 chatbot in about 100 lines of code.
This chatbot utilizes the meta-llama/Llama-2-7b-chat-hf
model for conversational purposes. By accessing and running cells within chatbot.ipynb
on Google Colab, users can initialize and interact with the chatbot in real-time. This simple demonstration is designed to provide an effective and concise example of leveraging the power of the Llama 2 model for chatbot applications.
-
Clone this repository:
git clone https://github.com/francisdaigle/llama-2-lang-chain-chatbot.git
-
Navigate to Google Colab.
-
Click on
File > Upload notebook
. -
Choose the
chatbot.ipynb
file from the cloned repository on your local machine.
Before you can fully utilize the chatbot, you'll need a Hugging Face access token. This token allows you to access certain models from the Hugging Face model hub. Here's how to obtain it:
- Create an account or sign in to Hugging Face.
- Navigate to your profile settings.
- Under the Access Tokens section, you'll find your token. If you don't see a token, you can generate a new one.
- Copy the token and replace the placeholder
HF_ACCESS_TOKEN
in the.env_template
. - Rename
.env_template
to.env
.
Note: If you're looking to keep things simple, you can add your token directly to the notebook by replacing os.getenv('HF_ACCESS_TOKEN')
with your HF access token. However, always remember to keep your access tokens confidential. Never share your notebook with the token visible, as this poses a security risk.
-
Before running the cells, ensure the Colab runtime is set to use a GPU (click on
Runtime > Change runtime type
and select a GPU). -
Run the cells in sequence to install necessary dependencies, initialize, and interact with the chatbot (click on
Runtime > Run all
).
- Uses the
Llama 2
model for advanced conversational capabilities. - Incorporates 4-bit quantization to speed up inference and reduce GPU RAM requirements.
- Step-by-step implementation and interaction guide within the Google Colab Notebook.
- Concise demonstration with less than 100 lines of code.