Replies: 3 comments 1 reply
-
I have this problem right now, after I updated LangChain. Earlier it was fine.
-
I ran your code, and now it seems there is a bug in the HuggingFaceHub lib:

```text
---> 49 result = chain({"question": query, "chat_history": history})
... (24 frames)
AttributeError: 'InferenceClient' object has no attribute 'post'
```

Can someone assist in solving this?
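A minimal way to narrow this down is to check the installed package versions. The assumption here is that the error comes from a version mismatch: newer huggingface_hub releases removed `InferenceClient.post`, while older LangChain wrappers still call it.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: the AttributeError is a version mismatch -- a newer
# huggingface_hub removed InferenceClient.post, but the installed LangChain
# wrapper still calls it. Print the versions to confirm before up/downgrading.
for pkg in ("langchain", "langchain-community", "huggingface_hub"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```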
-
Hi @Arpx17,
The issue where the response includes the entire prompt is the default behavior for many Hugging Face Inference Endpoints (which return the prompt concatenated with the generated text). To fix this, you need to explicitly tell the API not to return the full text by adding `return_full_text: False` to `model_kwargs`:

```python
from langchain_community.llms import HuggingFaceHub  # import path for langchain 0.1.x

llm = HuggingFaceHub(
    repo_id="google/gemma-7b-it",
    model_kwargs={
        "max_new_tokens": 256,
        "return_full_text": False,  # <--- Add this line
        "top_k": 10,
        "top_p": 0.95,
        "temperature": 0.3,
        # ... other params
    },
    # ...
)
```
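As a quick sanity check (a sketch assuming the `llm` object above; the prompt string is illustrative), the completion should now come back without the prompt echoed:

```python
# With return_full_text=False the endpoint should return only the newly
# generated tokens, without the prompt concatenated in front of them.
response = llm.invoke("What is retrieval-augmented generation?")
print(response)  # expected: the answer text only
```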
-
I was trying to build a RAG LLM model using open-source models, but while generating the response the LLM attaches the entire prompt and the relevant documents to the output. Can anyone please tell me how I can remove the prompt and the Question section, and get only the Answer in the response?
langchain Version: 0.1.9
Code:
Output:
I have also tried mistralai/Mistral-7B-Instruct-v0.2, NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO, and mistralai/Mixtral-8x7B-Instruct-v0.1, but got the same kind of result.
Can anyone solve this issue?
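If an endpoint keeps echoing the prompt regardless of `return_full_text`, a last-resort client-side fallback is to strip it yourself; `strip_prompt` and `prompt` below are hypothetical names for illustration, where `prompt` is the exact string that was sent to the model.

```python
# Hypothetical fallback helper: drop the echoed prompt from the model output
# when the endpoint returns the full text anyway.
def strip_prompt(output: str, prompt: str) -> str:
    if output.startswith(prompt):
        return output[len(prompt):].lstrip()
    return output
```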