This project lets users chat with multiple PDF documents by combining LangChain, Google Generative AI, and FAISS into an efficient question-answering (QnA) bot. Users upload PDFs, the app splits them into chunks, and when a question is asked the chatbot searches the chunks for relevant context and responds using a generative AI model. The bot’s behavior, including the response word limit and creativity (temperature), can be customized via user input.
- Upload multiple PDF documents.
- Split the documents into smaller chunks for processing.
- Use FAISS to store and retrieve vector embeddings for fast similarity search.
- Interact with Google Generative AI (Gemini) for generating human-like responses.
- Adjust response creativity using a temperature setting.
- Control the response word limit as per your requirements.
- Easy-to-use interface with Streamlit.
To get started with the project, follow the steps below:
1. Clone the repository and move into the project directory:

   ```bash
   git clone https://github.com/ashzad123/Multi-Docs-QnA-Bot.git
   cd Multi-Docs-QnA-Bot
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv <environment name>
   source <environment name>/bin/activate    # for Linux/macOS
   <environment name>\Scripts\activate       # for Windows
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Create a `.env` file in the project root:

   ```bash
   touch .env
   ```

   Inside the `.env` file, add your Google API key:

   ```
   GOOGLE_API_KEY=your_google_api_key_here
   ```

5. Run the app:

   ```bash
   streamlit run app.py
   ```

Here is an overview of the functions in the project and how they work:
`get_pdf_text(pdf_docs)`

This function extracts the text from the uploaded PDF files. It uses PyPDF2 to read each PDF and extract the text from every page.

```python
def get_pdf_text(pdf_docs):
    text = ""
    for pdf in pdf_docs:
        pdf_reader = PdfReader(pdf)
        for page in pdf_reader.pages:
            text += page.extract_text()
    return text
```
`get_text_chunks(text)`

This function splits the extracted text into smaller chunks so that large documents stay manageable. It uses the `RecursiveCharacterTextSplitter` from `langchain_text_splitters` to break the text into overlapping pieces.

```python
def get_text_chunks(text):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = text_splitter.split_text(text)
    return chunks
```
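For reference, here is a minimal sketch of how these two helpers compose outside the Streamlit flow (the file name `example.pdf` is illustrative; in the app the files come from the uploader):

```python
# Illustrative usage: extract text from a local PDF, then chunk it
with open("example.pdf", "rb") as f:
    raw_text = get_pdf_text([f])        # concatenated text of every page
chunks = get_text_chunks(raw_text)      # ~1000-character chunks with 100-character overlap
print(f"{len(chunks)} chunks ready for embedding")
```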
`get_vector_store(text_chunks)`

This function creates a FAISS vector store from the text chunks using Google Generative AI embeddings and saves it locally. It enables fast similarity search when the user asks a question.

```python
def get_vector_store(text_chunks):
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    vector_store = FAISS.from_texts(text_chunks, embedding=embeddings)
    vector_store.save_local("faiss_index")
```
`get_conversational_chain(temperature, word_limit)`

This function defines the conversational chain using LangChain’s `PromptTemplate` and the Google Generative AI model. It accepts the user-defined temperature for creativity and the word limit for response length.

```python
def get_conversational_chain(temperature, word_limit):
    prompt_template = f"""
    Answer the question as detailed as possible from the provided context.
    If the answer is not in the provided context, say: "Answer is not available in the context."
    Do not provide an incorrect answer. Limit your response to a maximum of {word_limit} words.

    Context:
    {{context}}

    Question:
    {{question}}

    Answer:
    """
    model = ChatGoogleGenerativeAI(model="gemini-pro", temperature=temperature)
    prompt = PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )
    chain = load_qa_chain(model, chain_type="stuff", prompt=prompt)
    return chain
```
`user_input(user_question, temperature, word_limit)`

This function handles the user’s question and generates a response. It loads the FAISS index, retrieves the most relevant documents, and runs the conversational chain with the user’s question, temperature, and word-limit settings.

```python
def user_input(user_question, temperature, word_limit):
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    new_db = FAISS.load_local(
        "faiss_index", embeddings, allow_dangerous_deserialization=True
    )
    docs = new_db.similarity_search(user_question)
    chain = get_conversational_chain(temperature, word_limit)
    response = chain(
        {"input_documents": docs, "question": user_question}, return_only_outputs=True
    )
    st.write("Reply:", response["output_text"])
```

- Temperature Input: Controls the creativity of the AI model. A lower value makes the model more deterministic, while a higher value makes it more creative.
st.session_state["temperature"] = st.number_input(
"Choose the temperature for the model (affects creativity)", 0.0, 1.0, 0.3 )- Word Limit Input: Allows the user to control how long the response should be based on the given word limit between 100 to 5000.
chunk_size = st.number_input("Set chunk size (number of characters):", min_value=100, max_value=5000, value=1000, step=100)- Users can upload their PDF files via the Streamlit file uploader, and the app will process them into chunks for the QnA system.
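A minimal sketch of what such a word-limit input could look like (the session-state key, label, and default value are illustrative and not taken verbatim from `app.py`):

```python
# Hypothetical word-limit control mirroring the temperature and chunk-size inputs
st.session_state["word_limit"] = st.number_input(
    "Set the maximum response length (words):",
    min_value=100, max_value=5000, value=500, step=50,
)
```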
- Users can upload their PDF files via the Streamlit file uploader, and the app will process them into chunks for the QnA system.

```python
pdf_docs = st.file_uploader(
    "Upload your PDF files and Click on the Submit & Process Button",
    accept_multiple_files=True,
)
```

Once the PDFs are uploaded and processed, users can ask questions using a text input box, and the model will respond with an answer based on the PDF contents.
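A minimal sketch of how the question box might be wired to `user_input` (the exact label and session-state keys in `app.py` may differ):

```python
# Hypothetical wiring: read a question and answer it with the stored settings
user_question = st.text_input("Ask a question about the uploaded PDFs")
if user_question:
    user_input(
        user_question,
        st.session_state["temperature"],
        st.session_state["word_limit"],
    )
```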
- Integrated `sentence-transformers` as an alternative embedding provider.
- This allows users to switch from Google Generative AI embeddings to Sentence Transformers for generating embeddings locally.
- Ensure `sentence-transformers` is installed:

  ```bash
  pip install sentence-transformers
  ```

- The `get_vector_store` function now supports Sentence Transformers as a fallback embedding provider. This ensures the app can work without relying solely on external APIs.
- To use Sentence Transformers, modify the `get_vector_store` function in `app.py`. The version below uses LangChain’s `HuggingFaceEmbeddings` wrapper, which runs `sentence-transformers` locally, so that `FAISS.from_texts` receives a compatible embeddings object:

  ```python
  from langchain_community.embeddings import HuggingFaceEmbeddings

  def get_vector_store(text_chunks):
      # all-MiniLM-L6-v2 runs locally via sentence-transformers
      embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
      vector_store = FAISS.from_texts(text_chunks, embedding=embeddings)
      vector_store.save_local("faiss_index")
  ```
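If the index is built with these local embeddings, the query side has to match: `user_input` should load the saved index with the same model instead of `GoogleGenerativeAIEmbeddings`. A minimal sketch of that change, assuming the same `all-MiniLM-L6-v2` model:

```python
# In user_input: load the FAISS index with the same local embedding model
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
new_db = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)
```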