Description
When querying documents, I get a KeyError. Error message:
File "/Users/harshitm/Developer/Kotaemon/libs/kotaemon/kotaemon/indices/vectorindex.py", line 289, in run
text_doc = text_thumbnail_docs[thumbnail_doc.doc_id]
KeyError: 'fbe551c2-48ea-44d9-a60f-aab2fa10d89a'
User-id: e2d846534c594c4a95fd9705ca6f4c2b, can see public conversations: True
While debugging, this is what I found:
The IDs in the list `thumbnail_doc_ids` do not match the IDs of the retrieved docs in `linked_thumbnail_docs`.
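To illustrate the mismatch, here is a minimal, self-contained sketch. It is not kotaemon's actual code: the `Doc` class and the `pair_thumbnails` helper are hypothetical, and it assumes `text_thumbnail_docs` is a dict keyed by the requested thumbnail IDs. The IDs are copied from the logs in this issue. Using `.get()` instead of direct indexing lets the unmatched IDs be collected rather than raising `KeyError`.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical stand-in for a retrieved document (not kotaemon's Document class).
    doc_id: str
    text: str = ""


def pair_thumbnails(text_thumbnail_docs, linked_thumbnail_docs):
    """Pair each thumbnail doc with its text doc and collect IDs that have no match."""
    paired, missing = [], []
    for thumbnail_doc in linked_thumbnail_docs:
        # .get() returns None for an unknown ID instead of raising KeyError.
        text_doc = text_thumbnail_docs.get(thumbnail_doc.doc_id)
        if text_doc is None:
            missing.append(thumbnail_doc.doc_id)
            continue
        paired.append((text_doc, thumbnail_doc))
    return paired, missing


# Requested IDs (logged as thumbnail_doc_ids); assumed to be the dict keys.
text_thumbnail_docs = {
    "a4ab8eb1-080b-4828-9924-ad0a55ecc0d4": Doc("text-1"),
    "22c3486b-4a02-4eae-999c-cc1b2d67066f": Doc("text-2"),
    "901db26a-69ee-4ae4-8416-b28d4c397725": Doc("text-3"),
}

# IDs of the docs that actually came back (logged as linked_thumbnail_doc).
linked_thumbnail_docs = [
    Doc("33332565-498c-4da5-94fa-cec84e38a411"),
    Doc("ae6a1a3e-fba3-404e-aa72-85e54d00f1c3"),
    Doc("4ea898a7-0f50-4601-a929-c67d6ef495bb"),
]

paired, missing = pair_thumbnails(text_thumbnail_docs, linked_thumbnail_docs)
print("paired:", len(paired))
print("missing:", missing)
```

A guard like this would only hide the symptom. The open question is still why the docstore returns docs whose IDs differ from the ones requested.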
This is the document I uploaded:
Reproduction steps
1. Set the default LLM and embeddings model to Cohere
2. Go to the 'chat section'
3. Upload a file
4. In the file collection, select the uploaded file
5. Query the document
6. See the error message
Screenshots
<img width="1442" alt="Image" src="https://github.com/user-attachments/assets/899f75f3-1a42-42d9-be08-523eafb8d595" />
Logs
Session reasoning type None use mindmap True use citation highlight language en
Session LLM
Reasoning class <class 'ktem.reasoning.simple.FullQAPipeline'>
Reasoning state {'app': {'regen': False}, 'pipeline': {}}
Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x30a860100>, FSPath=PosixPath('/Users/harshitm/Developer/Kotaemon/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x30a8608e0>, get_extra_table=True, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x31877c160>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x31877cdc0>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x31877e020>), mmr=False, rerankers=[], retrieval_mode='hybrid', top_k=10, user_id='e2d846534c594c4a95fd9705ca6f4c2b'), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x104bfa9e0>, FSPath=<theflow.base.unset_ object at 0x104bfa9e0>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x104bfa9e0>, VS=<theflow.base.unset_ object at 0x104bfa9e0>, file_ids=[], user_id=<theflow.base.unset_ object at 0x104bfa9e0>), LightRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x104bfa9e0>, FSPath=<theflow.base.unset_ object at 0x104bfa9e0>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x104bfa9e0>, VS=<theflow.base.unset_ object at 0x104bfa9e0>, file_ids=[], search_type='local', user_id=<theflow.base.unset_ object at 0x104bfa9e0>)]
searching in doc_ids ['e5948fff-470c-40c5-8910-92e9cbaaedae']
retrieval_kwargs: dict_keys(['do_extend', 'scope', 'filters'])
Harshit thumbnail_count: 3
Number of requested results 100 is greater than number of elements in index 44, updating n_results = 44
Got 44 from vectorstore
Got 27 from docstore
Got raw 10 retrieved documents
Harshit thumbnail_doc_ids: {'a4ab8eb1-080b-4828-9924-ad0a55ecc0d4', '22c3486b-4a02-4eae-999c-cc1b2d67066f', '901db26a-69ee-4ae4-8416-b28d4c397725'}
Harshit linked_thumbnail_doc: 33332565-498c-4da5-94fa-cec84e38a411
Harshit linked_thumbnail_doc: ae6a1a3e-fba3-404e-aa72-85e54d00f1c3
Harshit linked_thumbnail_doc: 4ea898a7-0f50-4601-a929-c67d6ef495bb
thumbnail docs 3 non-thumbnail docs 7 raw-thumbnail docs 0
Traceback (most recent call last):
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/gradio/queueing.py", line 575, in process_events
response = await route_utils.call_process_api(
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
result = await self.call_function(
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
prediction = await utils.async_iteration(iterator)
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
return await iterator.__anext__()
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
return await anyio.to_thread.run_sync(
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2470, in run_sync_in_worker_thread
return await future
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 967, in run
result = context.run(func, *args)
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
return next(iterator)
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
response = next(iterator)
File "/Users/harshitm/Developer/Kotaemon/libs/ktem/ktem/pages/chat/__init__.py", line 1321, in chat_fn
for response in pipeline.stream(chat_input, conversation_id, chat_history):
File "/Users/harshitm/Developer/Kotaemon/libs/ktem/ktem/reasoning/simple.py", line 291, in stream
docs, infos = self.retrieve(message, history)
File "/Users/harshitm/Developer/Kotaemon/libs/ktem/ktem/reasoning/simple.py", line 132, in retrieve
retriever_docs = retriever_node(text=query)
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
raise e from None
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
output = self.fl.exec(func, args, kwargs)
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
return run(*args, **kwargs)
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
return self.run(*args, **kwargs)
File "/Users/harshitm/Developer/Kotaemon/libs/ktem/ktem/index/file/pipelines.py", line 175, in run
docs = self.vector_retrieval(text=text, top_k=self.top_k, **retrieval_kwargs)
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec
return child(*args, **kwargs, __fl_runstates__=__fl_runstates__)
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
raise e from None
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
output = self.fl.exec(func, args, kwargs)
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
return run(*args, **kwargs)
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/Users/harshitm/Developer/Kotaemon/install_dir/env/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
return self.run(*args, **kwargs)
File "/Users/harshitm/Developer/Kotaemon/libs/kotaemon/kotaemon/indices/vectorindex.py", line 289, in run
text_doc = text_thumbnail_docs[thumbnail_doc.doc_id]
KeyError: '33332565-498c-4da5-94fa-cec84e38a411'
Browsers
No response
OS
No response
Additional information
No response