Hey there, @arbitropy! I'm here to assist you with any bugs, questions, or contributions while you wait for a human maintainer. Let's tackle this issue together!

The issue with generation not stopping when you call the model directly most likely comes down to how the stopping criteria are set up. In the `HuggingFaceLLM` class, here is the relevant part of the code that sets up the stopping criteria:

```python
class StopOnTokens(StoppingCriteria):
    def __call__(
        self,
        input_ids: torch.LongTensor,
        scores: torch.FloatTensor,
        **kwargs: Any,
    ) -> bool:
        # Stop as soon as the last generated token is one of the stop IDs.
        for stop_id in stopping_ids_list:
            if input_ids[0][-1] == stop_id:
                return True
        return False

self._stopping_criteria = StoppingCriteriaList([StopOnTokens()])
```

In contrast, the `chat` method formats the messages into a prompt with `messages_to_prompt` and then delegates to `complete`:

```python
@llm_chat_callback()
def chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:
    prompt = self.messages_to_prompt(messages)
    completion_response = self.complete(prompt, formatted=True, **kwargs)
    return completion_response_to_chat_response(completion_response)
```

To resolve this issue, ensure that the stopping criteria are correctly set for your model: `stopping_ids_list` is built from the `stopping_ids` argument passed to `HuggingFaceLLM`, so it needs to contain the token IDs your model actually emits at the end of a response. For more details, you can refer to the implementation of `HuggingFaceLLM` in the llama-index repository.
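Purely as an illustration (not from the original thread), here is a minimal sketch of passing `stopping_ids` when constructing `HuggingFaceLLM`; the model name and the `<|eot_id|>` token are assumptions, so substitute whatever your model actually uses to end a turn:

```python
from transformers import AutoTokenizer
from llama_index.llms.huggingface import HuggingFaceLLM

# Placeholder model; replace with the model you are actually loading.
model_name = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# End-of-turn token IDs; which extra tokens exist (e.g. "<|eot_id|>" for
# Llama-3-style chat templates) depends on the model you load.
stop_ids = [tokenizer.eos_token_id]
eot_id = tokenizer.convert_tokens_to_ids("<|eot_id|>")
if eot_id is not None and eot_id != tokenizer.unk_token_id:
    stop_ids.append(eot_id)

llm = HuggingFaceLLM(
    model_name=model_name,
    tokenizer_name=model_name,
    max_new_tokens=256,
    stopping_ids=stop_ids,  # feeds stopping_ids_list / StopOnTokens above
)

# A plain completion should now stop instead of running to max_new_tokens.
print(llm.complete("What is LlamaIndex?"))
```

Which end-of-turn tokens a model emits varies between models, which would also explain why other models used with `HuggingFaceLLM` don't show this problem.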
-
I have been using this code to create an llm instance:
Then when I use llm.generate(), the generation never stops, but if I use llm.chat() with a list of ChatMessage objects, it stops appropriately.
Because of this, when I use metadata extraction classes like SummaryExtractor or QuestionsAnsweredExtractor, the generated metadata is repetitive text until the maximum token limit is reached.
Other models with HuggingFaceLLM don't have this problem.
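For reference, a minimal sketch of the two call paths being compared, assuming `complete()` is the non-chat path referred to as `generate()` above (the LLM construction code is omitted, as in the original post):

```python
from llama_index.core.llms import ChatMessage

# `llm` is the HuggingFaceLLM instance from the construction code omitted above.
prompt = "Summarize the following text: ..."

# Non-chat path: reportedly keeps generating until the max token limit.
completion = llm.complete(prompt)

# Chat path: reportedly stops where expected.
chat_response = llm.chat([ChatMessage(role="user", content=prompt)])
```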