[Bug]: TypeError: Query column vector must be a vector. Got list<item: double>. #1335
Comments
Did you solve that?

I did not solve that.
I got the same issue. Here is a tiny repro; this bug is really weird:

```python
import numpy as np
import json

import pyarrow as pa
import lancedb


class TextEmbeder:
    def __init__(self) -> None:
        pass

    def encode(self, x):
        # fake 3-dimensional embedding
        return np.abs(np.around(np.random.randn(3), 3)).tolist()


textembeder = TextEmbeder()
textembeder.encode("a")[:3]


def build_fake_data():
    res = []
    for index in range(1000):
        id_ = str(index)
        text = f"hello test {index}"
        vector = textembeder.encode(text)
        extra_data = json.dumps(
            {"attr1": index, "attr2": index * 2, "attr3": "B站"}, ensure_ascii=True
        )
        res.append(
            {"id": id_, "text": text, "extra_data": extra_data, "vector": vector}
        )
    return res


fake_data_list = build_fake_data()
fake_data_list[0].keys()

db_url = "data/database"
db_table_name = "smalltest"
db_connection = lancedb.connect(db_url)
db_connection.table_names()

schema = pa.schema(
    [
        pa.field("vector", pa.list_(pa.float64())),
        pa.field("id", pa.string()),
        pa.field("text", pa.string()),
        pa.field("extra_data", pa.string()),
    ]
)
db_connection.create_table(name=db_table_name, schema=schema, mode="overwrite")
table = db_connection.open_table(db_table_name)
table.add(fake_data_list)

query_vector = textembeder.encode("hh")
query_vector[:4]

docs = (
    table.search(query=query_vector, vector_column_name="vector", query_type="vector")
    .limit(4)
    .to_list()
)
print(docs)
```

env:
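A quick way to see what the repro above actually stored (a small diagnostic reusing the `table` object from that snippet) is to print the table's Arrow schema; the vector column comes back as the variable-length `list<item: double>` named in the error, not a vector type:

```python
print(table.schema)
# vector: list<item: double>   <- variable-length list, not a vector type
# id: string
# text: string
# extra_data: string
```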
I fixed this issue by declaring the vector column as a fixed-size list in the schema:

```python
schema = pa.schema(
    [
        pa.field("vector", pa.list_(pa.float32(), list_size=DIM_VALUE)),
        pa.field("id", pa.string()),
        pa.field("text", pa.string()),
        pa.field("extra_data", pa.string()),
    ]
)
```
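For what it's worth, this fix works because Lance only treats Arrow's fixed-size-list columns as vectors; a plain `pa.list_(pa.float64())` is stored as variable-length `list<item: double>` and is rejected by the nearest-neighbour query. A minimal sketch of the corrected repro (here `DIM_VALUE` is 3, matching the fake embedder above):

```python
import numpy as np
import pyarrow as pa
import lancedb

DIM_VALUE = 3  # must match the embedder's output dimension

schema = pa.schema(
    [
        pa.field("vector", pa.list_(pa.float32(), list_size=DIM_VALUE)),  # fixed-size list => vector
        pa.field("id", pa.string()),
    ]
)

db = lancedb.connect("data/database")
table = db.create_table("smalltest_fixed", schema=schema, mode="overwrite")
table.add(
    [{"id": str(i), "vector": np.random.rand(DIM_VALUE).tolist()} for i in range(100)]
)

# the search no longer raises: the column type is fixed_size_list<item: float>[3]
docs = table.search(np.random.rand(DIM_VALUE).tolist()).limit(4).to_list()
print(docs[0]["id"], docs[0]["_distance"])
```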
Congratulations! However, there is no "schema" in my code.
I have met this problem too; does anyone have any suggestions?
I had the same issue, and my problem was solved by adjusting the command. I used `python -m graphrag.index --root ./ragtest` and `python -m graphrag.query --root ./ragtest --method local "explain the relationship between Jay and May."`, which solved my problem. To sum up, in my particular case I didn't include `python -m` in my execution command, and that turned out to be the problem.
TLDR: Don't use the scripts in the notebook examples, since they are outdated. You can just call graphrag.api.global_search() directly; see the examples in graphrag/cli/query.py. This is due to backward-compatibility concerns related to lancedb: there is code that patches lancedb files, and you'll need to handle this kind of thing yourself if you decide to call lower-level APIs like read_indexer_entities().
When I used the local_search query example shown at "https://microsoft.github.io/graphrag/examples_notebooks/local_search/", I encountered the same error as @zel2023. Here is my code:

```python
import os
import asyncio
import argparse

import tiktoken
from transformers import AutoTokenizer
import pandas as pd
from dotenv import load_dotenv

load_dotenv()

from graphrag.query.indexer_adapters import (
    read_indexer_covariates,
    read_indexer_entities,
    read_indexer_relationships,
    read_indexer_reports,
    read_indexer_text_units,
)
from graphrag.query.context_builder.entity_extraction import EntityVectorStoreKey
from graphrag.query.input.loaders.dfs import store_entity_semantic_embeddings
from graphrag.query.llm.oai.chat_openai import ChatOpenAI
from graphrag.query.llm.oai.embedding import OpenAIEmbedding
from graphrag.query.llm.oai.typing import OpenaiApiType
from graphrag.vector_stores.lancedb import LanceDBVectorStore

# local
from graphrag.query.structured_search.local_search.mixed_context import LocalSearchMixedContext
from graphrag.query.question_gen.local_gen import LocalQuestionGen
from graphrag.query.structured_search.local_search.search import LocalSearch

# global
from graphrag.query.structured_search.global_search.community_context import GlobalCommunityContext
from graphrag.query.structured_search.global_search.search import GlobalSearch

INPUT_DIR = "./ragtest/output/"
LANCEDB_URI = f"{INPUT_DIR}/lancedb"
COMMUNITY_REPORT_TABLE = "create_final_community_reports"
ENTITY_TABLE = "create_final_nodes"
ENTITY_EMBEDDING_TABLE = "create_final_entities"
RELATIONSHIP_TABLE = "create_final_relationships"
COVARIATE_TABLE = "create_final_covariates"
TEXT_UNIT_TABLE = "create_final_text_units"
COMMUNITY_LEVEL = 2
HOME = os.getenv("HOME")
ENCODER_MODEL_PATH = f"{HOME}/Models/stella_en_1.5B_v5"

# read nodes table to get community and degree data
entity_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_TABLE}.parquet")
# print("entity_df.head():")
# print(entity_df.head())
entity_embedding_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_EMBEDDING_TABLE}.parquet")
# print("entity_embedding_df.head():")
# print(entity_embedding_df.head())
entities = read_indexer_entities(entity_df, entity_embedding_df, COMMUNITY_LEVEL)

# load description embeddings to an in-memory lancedb vectorstore;
# to connect to a remote db, specify url and port values.
description_embedding_store = LanceDBVectorStore(collection_name="entity.description")
description_embedding_store.connect(db_uri=LANCEDB_URI)
entity_description_embeddings = store_entity_semantic_embeddings(
    entities=entities, vectorstore=description_embedding_store
)
# print(f"Entity count: {len(entity_df)}")
# print("entity_description_embeddings.head():")
# print(entity_description_embeddings.head())

relationship_df = pd.read_parquet(f"{INPUT_DIR}/{RELATIONSHIP_TABLE}.parquet")
relationships = read_indexer_relationships(relationship_df)
# print(f"Relationship count: {len(relationship_df)}")
# print("relationship_df.head():")
# print(relationship_df.head())

report_df = pd.read_parquet(f"{INPUT_DIR}/{COMMUNITY_REPORT_TABLE}.parquet")
reports = read_indexer_reports(report_df, entity_df, COMMUNITY_LEVEL)
# pd.set_option('display.max_colwidth', None)
# pd.set_option('display.max_colwidth', 50)
print(f"Report records: {len(report_df)}")
# print("report_df.head(1):")
# print(report_df.head(1))

text_unit_df = pd.read_parquet(f"{INPUT_DIR}/{TEXT_UNIT_TABLE}.parquet")
text_units = read_indexer_text_units(text_unit_df)
# print(f"Text unit records: {len(text_unit_df)}")
# print("text_unit_df.head():")
# print(text_unit_df.head())

api_key = os.environ["GRAPHRAG_API_KEY"]
api_base = os.environ["GRAPHRAG_API_BASE"]
llm_model = os.environ["GRAPHRAG_LLM_MODEL"]
llm = ChatOpenAI(
    api_key=api_key,
    api_base=api_base,
    model=llm_model,
    api_type=OpenaiApiType.OpenAI,  # OpenaiApiType.OpenAI or OpenaiApiType.AzureOpenAI
    max_retries=20,
)
embedding_model = os.environ["GRAPHRAG_EMBEDDING_MODEL"]
embedding_api_base = os.environ["GRAPHRAG_EMBEDDING_API_BASE"]
# token_encoder = tiktoken.get_encoding("cl100k_base")
token_encoder = AutoTokenizer.from_pretrained(ENCODER_MODEL_PATH)
# print("embedding_model:", embedding_model)
# print("embedding_api_base:", embedding_api_base)


def local_search():
    text_embedder = OpenAIEmbedding(
        api_key=api_key,
        api_base=embedding_api_base,
        api_type=OpenaiApiType.OpenAI,
        model=embedding_model,
        deployment_name=embedding_model,
        max_retries=20,
    )
    context_builder = LocalSearchMixedContext(
        community_reports=reports,
        text_units=text_units,
        entities=entities,
        relationships=relationships,
        # if you did not run covariates during indexing, set this to None
        # covariates=covariates,
        entity_text_embeddings=description_embedding_store,
        embedding_vectorstore_key=EntityVectorStoreKey.ID,  # if the vectorstore uses entity title as ids, set this to EntityVectorStoreKey.TITLE
        text_embedder=text_embedder,
        token_encoder=token_encoder,
    )
    # text_unit_prop: proportion of context window dedicated to related text units
    # community_prop: proportion of context window dedicated to community reports.
    #   The remaining proportion is dedicated to entities and relationships. Sum of text_unit_prop and community_prop should be <= 1
    # conversation_history_max_turns: maximum number of turns to include in the conversation history.
    # conversation_history_user_turns_only: if True, only include user queries in the conversation history.
    # top_k_mapped_entities: number of related entities to retrieve from the entity description embedding store.
    # top_k_relationships: control the number of out-of-network relationships to pull into the context window.
    # include_entity_rank: if True, include the entity rank in the entity table in the context window. Default entity rank = node degree.
    # include_relationship_weight: if True, include the relationship weight in the context window.
    # include_community_rank: if True, include the community rank in the context window.
    # return_candidate_context: if True, return a set of dataframes containing all candidate entity/relationship/covariate records that
    #   could be relevant. Note that not all of these records will be included in the context window. The "in_context" column in these
    #   dataframes indicates whether the record is included in the context window.
    # max_tokens: maximum number of tokens to use for the context window.
    local_context_params = {
        "text_unit_prop": 0.5,
        "community_prop": 0.1,
        "conversation_history_max_turns": 5,
        "conversation_history_user_turns_only": True,
        "top_k_mapped_entities": 10,
        "top_k_relationships": 10,
        "include_entity_rank": True,
        "include_relationship_weight": True,
        "include_community_rank": False,
        "return_candidate_context": False,
        "embedding_vectorstore_key": EntityVectorStoreKey.ID,  # set this to EntityVectorStoreKey.TITLE if the vectorstore uses entity title as ids
        "max_tokens": 12_000,  # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 5000)
    }
    llm_params = {
        "max_tokens": 2_000,  # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 1000-1500)
        "temperature": 0.0,
    }
    search_engine = LocalSearch(
        llm=llm,
        context_builder=context_builder,
        token_encoder=token_encoder,
        llm_params=llm_params,
        context_builder_params=local_context_params,
        response_type="multiple paragraphs",  # free form text describing the response type and format, can be anything, e.g. prioritized list, single paragraph, multiple paragraphs, multiple-page report
    )

    async def local_async_search_engine():
        result = await search_engine.asearch("Tell me about Agent Mercer")
        # print(result)
        return result

    result = asyncio.run(local_async_search_engine())
    print(result.response)
    print(result.context_data["entities"].head())
    print(result.context_data["relationships"].head())
    print(result.context_data["reports"].head())
    print(result.context_data["sources"].head())
    if "claims" in result.context_data:
        print(result.context_data["claims"].head())

    # question generation
    question_generator = LocalQuestionGen(
        llm=llm,
        context_builder=context_builder,
        token_encoder=token_encoder,
        llm_params=llm_params,
        context_builder_params=local_context_params,
    )
    question_history = [
        "Tell me about Agent Mercer",
        "What happens in Dulce military base?",
    ]

    async def local_async_question_generation():
        result = await question_generator.agenerate(
            question_history=question_history, context_data=None, question_count=5
        )
        # print(result)
        return result

    candidate_questions = asyncio.run(local_async_question_generation())
    print(candidate_questions.response)


local_search()


# For global
def global_search():
    context_builder = GlobalCommunityContext(
        community_reports=reports,
        entities=entities,  # default to None if you don't want to use community weights for ranking
        token_encoder=token_encoder,
    )
    context_builder_params = {
        "use_community_summary": False,  # False means using full community reports. True means using community short summaries.
        "shuffle_data": True,
        "include_community_rank": True,
        "min_community_rank": 0,
        "community_rank_name": "rank",
        "include_community_weight": True,
        "community_weight_name": "occurrence weight",
        "normalize_community_weight": True,
        "max_tokens": 12_000,  # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 5000)
        "context_name": "Reports",
    }
    map_llm_params = {
        "max_tokens": 1000,
        "temperature": 0.0,
        "response_format": {"type": "json_object"},
    }
    reduce_llm_params = {
        "max_tokens": 2000,  # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 1000-1500)
        "temperature": 0.0,
    }
    search_engine = GlobalSearch(
        llm=llm,
        context_builder=context_builder,
        token_encoder=token_encoder,
        max_data_tokens=12_000,  # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 5000)
        map_llm_params=map_llm_params,
        reduce_llm_params=reduce_llm_params,
        allow_general_knowledge=False,  # setting this to True will add instruction to encourage the LLM to incorporate general knowledge in the response, which may increase hallucinations, but could be useful in some use cases.
        json_mode=True,  # set this to False if your LLM model does not support JSON mode.
        context_builder_params=context_builder_params,
        concurrent_coroutines=32,
        response_type="multiple paragraphs",  # free form text describing the response type and format, can be anything, e.g. prioritized list, single paragraph, multiple paragraphs, multiple-page report
    )

    async def global_async_search_engine():
        result = await search_engine.asearch(
            "What is the major conflict in this story and who are the protagonist and antagonist?"
        )
        # print(result)
        return result

    result = asyncio.run(global_async_search_engine())
    print(result.response)
    # inspect the data used to build the context for the LLM responses
    print(result.context_data["reports"])
    # inspect number of LLM calls and tokens
    print(f"LLM calls: {result.llm_calls}. LLM tokens: {result.prompt_tokens}")
```

Here is the error log:

```
Report records: 79
Traceback (most recent call last):
File "/home/xxx/Source/cyowcopy/existing_graphrags/query_gen.py", line 193, in <module>
local_search()
File "/home/xxx/Source/cyowcopy/existing_graphrags/query_gen.py", line 165, in local_search
result = asyncio.run(local_async_search_engine())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xxx/anaconda3/envs/CEPE/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/home/xxx/anaconda3/envs/CEPE/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xxx/anaconda3/envs/CEPE/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/home/xxx/Source/cyowcopy/existing_graphrags/query_gen.py", line 162, in local_async_search_engine
result = await search_engine.asearch("Tell me about Agent Mercer")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xxx/anaconda3/envs/CEPE/lib/python3.12/site-packages/graphrag/query/structured_search/local_search/search.py", line 67, in asearch
context_text, context_records = self.context_builder.build_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xxx/anaconda3/envs/CEPE/lib/python3.12/site-packages/graphrag/query/structured_search/local_search/mixed_context.py", line 140, in build_context
selected_entities = map_query_to_entities(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/xxx/anaconda3/envs/CEPE/lib/python3.12/site-packages/graphrag/query/context_builder/entity_extraction.py", line 57, in map_query_to_entities
search_results = text_embedding_vectorstore.similarity_search_by_text(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xxx/anaconda3/envs/CEPE/lib/python3.12/site-packages/graphrag/vector_stores/lancedb.py", line 136, in similarity_search_by_text
return self.similarity_search_by_vector(query_embedding, k)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xxx/anaconda3/envs/CEPE/lib/python3.12/site-packages/graphrag/vector_stores/lancedb.py", line 115, in similarity_search_by_vector
.to_list()
^^^^^^^^^
File "/home/xxx/anaconda3/envs/CEPE/lib/python3.12/site-packages/lancedb/query.py", line 320, in to_list
return self.to_arrow().to_pylist()
^^^^^^^^^^^^^^^
File "/home/xxx/anaconda3/envs/CEPE/lib/python3.12/site-packages/lancedb/query.py", line 648, in to_arrow
return self.to_batches().read_all()
^^^^^^^^^^^^^^^^^
File "/home/xxx/anaconda3/envs/CEPE/lib/python3.12/site-packages/lancedb/query.py", line 680, in to_batches
result_set = self._table._execute_query(query, batch_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xxx/anaconda3/envs/CEPE/lib/python3.12/site-packages/lancedb/table.py", line 1742, in _execute_query
return ds.scanner(
^^^^^^^^^^^
File "/home/xxx/anaconda3/envs/CEPE/lib/python3.12/site-packages/lance/dataset.py", line 369, in scanner
builder = builder.nearest(**nearest)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xxx/anaconda3/envs/CEPE/lib/python3.12/site-packages/lance/dataset.py", line 2449, in nearest
raise TypeError(
TypeError: Query column vector must be a vector. Got list<item: double>.
```

The command I used to generate the index is

When I use

```
graphrag query \
    --root ./ragtest \
    --method global \
    --query "What are the top themes in this story?"
```

I can get the right result:

```
graphrag query \
    --root ./ragtest \
    --method global \
    --query "What are the top themes in this story?"
/home/xxx/anaconda3/envs/CEPE/lib/python3.12/site-packages/paramiko/pkey.py:100: CryptographyDeprecationWarning: TripleDES has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.TripleDES and will be removed from this module in 48.0.0.
"cipher": algorithms.TripleDES,
/home/xxx/anaconda3/envs/CEPE/lib/python3.12/site-packages/paramiko/transport.py:259: CryptographyDeprecationWarning: TripleDES has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.TripleDES and will be removed from this module in 48.0.0.
"class": algorithms.TripleDES,
creating llm client with {'api_key': 'REDACTED,len=13', 'type': "openai_chat", 'model': 'qwen2a5-72b-instruct', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'n': 1, 'request_timeout': 180.0, 'api_base': 'http://1.11.11.11:11/', 'api_version': None, 'organization': None, 'proxy': None, 'audience': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
SUCCESS: Global Search Response:
# Top Themes in "A Christmas Carol"
## Transformation and Redemption
The central theme of "A Christmas Carol" is the transformation of Ebenezer Scrooge from a miserly, cold-hearted businessman to a generous and compassionate individual. This transformation is driven by the ghostly visitations of Jacob Marley and the spirits of Christmas Past, Present, and Yet to Come. These spirits show Scrooge the consequences of his actions and the importance of compassion and generosity, leading to a profound change in his character and behavior [Data: Reports (77, 28, 66, 22, 13, 34, 45, 75, 53, +more)].
## The Importance of Family and Community
The story emphasizes the value of family and community, particularly through the Cratchit family's strong bonds and their ability to find joy and hope despite their poverty. The community's support and the spirit of togetherness are highlighted in various scenes, such as the Cratchit family's Christmas celebration and the lighthouse keepers' camaraderie. Characters such as Scrooge's nephew Fred and the Spirit of Christmas Present also emphasize the importance of family, social connections, and the need to help those in need [Data: Reports (77, 75, 53, 61, 70, 26, 39, 42, 58, 80, 31, 49, 83, +more)].
## The Spirit of Christmas
The story celebrates the spirit of Christmas, characterized by generosity, kindness, and the joy of giving. The various Christmas celebrations, such as the Cratchit family's meal, the lighthouse keepers' toast, and Fezziwig's party, underscore the festive and communal aspects of the holiday. The theme of the Christmas spirit is a recurring element, highlighting the importance of these values during the holiday season [Data: Reports (77, 66, 75, 53, 45, 52, 2, 54, 50, +more)].
## The Power of Memory and Reflection
The spirits of Christmas Past, Present, and Yet to Come take Scrooge on a journey through his memories, the present, and the future. This journey allows Scrooge to reflect on his life and the consequences of his actions, leading to his transformation. The power of memory and reflection is a significant theme, emphasizing the influence of past experiences on one's present and future actions [Data: Reports (61, 70, 26, 31, 49, 83, 77, 21, 81, +more)].
## The Contrast Between Wealth and Poverty
The story highlights the stark contrast between Scrooge's wealth and the poverty of the Cratchit family. This contrast is used to illustrate the social and economic disparities of the time and the moral implications of Scrooge's miserly behavior. The theme of wealth and poverty serves to emphasize the importance of empathy and the need to help those less fortunate [Data: Reports (61, 70, 39, 58, 11, 18, 78, 51, 62, 30, 8, 57, 25, 47, 24, 40, 27, +more)].
## The Role of the Supernatural
The supernatural elements, such as the ghosts of Christmas Past, Present, and Yet to Come, play a crucial role in Scrooge's transformation. These ghostly visitations serve as catalysts for Scrooge's moral and spiritual awakening, highlighting the power of supernatural intervention in bringing about change. The blending of supernatural elements with the mundane aspects of daily life adds to the atmospheric richness and the transformative nature of Scrooge's journey [Data: Reports (18, 78, 51, 62, 30, 8, 57, 25, 47, 24, 40, 27, 52, 69, 65, 33, +more)].
## Generosity and Charity
The theme of generosity and charity is evident through the actions of characters like the Spirit of Christmas Present, Scrooge's nephew, and the portly gentlemen. These characters demonstrate the importance of giving and helping others, especially during the Christmas season. Scrooge's transformation is marked by his newfound generosity, such as his decision to give a large turkey to the Cratchit family and his willingness to help the poor [Data: Reports (32, 72, 63, 60, 61, 70, 39, 42, 58, 11, +more)].
## The Consequences of One's Actions
The story explores the consequences of Scrooge's past and present actions, as revealed through the spirits' visions. The potential future, including the tragic fate of Tiny Tim, serves as a powerful motivator for Scrooge to change his ways and become a more compassionate and generous person. This theme underscores the moral and ethical dimensions of Scrooge's character and the potential for redemption [Data: Reports (77, 28, 13, 34, 61, 70, 39, 58, 11, +more)].
These themes collectively contribute to the rich and multifaceted narrative of "A Christmas Carol," making it a timeless and deeply resonant story.
```

However, when I try to use:

```
graphrag query \
    --root ./ragtest \
    --method local \
    --query "Who is Scrooge and what are his main relationships?" > ./locol_query.log 2>&1
```

errors still appear extensively; please refer to the file [locol_query.log](https://github.com/user-attachments/files/17677864/locol_query.log) to see them. I tried to fix each error individually, but this led to a series of new errors. In summary, it seems that this command cannot run successfully:

```
graphrag query \
    --root ./ragtest \
    --method local \
    --query "Who is Scrooge and what are his main relationships?" > ./local_query.log 2>&1
```
I was able to find a workaround. I changed the schema to this:

```python
N = 3072
schema = pa.schema([
    pa.field("id", pa.string()),
    pa.field("text", pa.string()),
    pa.field("vector", pa.list_(pa.float64(), N)),
    pa.field("attributes", pa.string()),
])
```

where N is the length of the embedding vector produced by whatever embedding model you are using in your settings.yml file. I am using "text-embedding-3-large" from OpenAI, hence the number is 3072. Location of the file to edit --
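If you are not sure what N should be for your model, one way to check (a hedged sketch assuming an OpenAI-compatible embeddings endpoint and the official openai Python client; swap in your own model name) is to embed a probe string and measure its length:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY (and optionally OPENAI_BASE_URL) from the environment
resp = client.embeddings.create(model="text-embedding-3-large", input="dimension probe")
N = len(resp.data[0].embedding)
print(N)  # 3072 for text-embedding-3-large
```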
Do you need to file an issue?
Describe the bug
I attempted to refer to https://github.com/microsoft/graphrag/blob/main/docs/examples_notebooks/local_search.ipynb to write a Python file for running local search but failed.
However, using https://github.com/microsoft/graphrag/blob/main/docs/examples_notebooks/global_search.ipynb as a reference, I successfully wrote a Python file to run global search.
Additionally, I successfully ran the local search by referring to https://github.com/microsoft/graphrag/blob/94f1e62e5c06795fc8c361dba6580bb76d6e77ce/docs/get_started.md.
Below is the error message:

```
Entity count: 3
Relationship count: 2
Report records: 1
Text unit records: 1
Traceback (most recent call last):
File "/data/zelongzheng/graphrag-main/local_search.py", line 172, in
asyncio.run(main())
File "/home/zelongzheng/anaconda3/envs/graphrag/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/home/zelongzheng/anaconda3/envs/graphrag/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zelongzheng/anaconda3/envs/graphrag/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/data/zelongzheng/graphrag-main/local_search.py", line 167, in main
result = await search_engine.asearch("what is the relationship between xiaozhang and xiaoming?")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/zelongzheng/graphrag-main/graphrag/query/structured_search/local_search/search.py", line 67, in asearch
context_text, context_records = self.context_builder.build_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/zelongzheng/graphrag-main/graphrag/query/structured_search/local_search/mixed_context.py", line 140, in build_context
selected_entities = map_query_to_entities(
^^^^^^^^^^^^^^^^^^^^^^
File "/data/zelongzheng/graphrag-main/graphrag/query/context_builder/entity_extraction.py", line 57, in map_query_to_entities
search_results = text_embedding_vectorstore.similarity_search_by_text(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/zelongzheng/graphrag-main/graphrag/vector_stores/lancedb.py", line 136, in similarity_search_by_text
return self.similarity_search_by_vector(query_embedding, k)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/zelongzheng/graphrag-main/graphrag/vector_stores/lancedb.py", line 115, in similarity_search_by_vector
.to_list()
^^^^^^^^^
File "/home/zelongzheng/anaconda3/envs/graphrag/lib/python3.11/site-packages/lancedb/query.py", line 320, in to_list
return self.to_arrow().to_pylist()
^^^^^^^^^^^^^^^
File "/home/zelongzheng/anaconda3/envs/graphrag/lib/python3.11/site-packages/lancedb/query.py", line 647, in to_arrow
return self.to_batches().read_all()
^^^^^^^^^^^^^^^^^
File "/home/zelongzheng/anaconda3/envs/graphrag/lib/python3.11/site-packages/lancedb/query.py", line 678, in to_batches
result_set = self._table._execute_query(query, batch_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zelongzheng/anaconda3/envs/graphrag/lib/python3.11/site-packages/lancedb/table.py", line 1742, in _execute_query
return ds.scanner(
^^^^^^^^^^^
File "/home/zelongzheng/anaconda3/envs/graphrag/lib/python3.11/site-packages/lance/dataset.py", line 369, in scanner
builder = builder.nearest(**nearest)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zelongzheng/anaconda3/envs/graphrag/lib/python3.11/site-packages/lance/dataset.py", line 2449, in nearest
raise TypeError(
TypeError: Query column vector must be a vector. Got list<item: double>.
```
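For reference, the two Arrow types involved print like this (a minimal pyarrow illustration; the 1536 is just an example dimension), and only the fixed-size form is accepted by Lance's nearest-neighbour query:

```python
import pyarrow as pa

var_len = pa.list_(pa.float64())      # prints: list<item: double>  (what the error reports)
fixed = pa.list_(pa.float32(), 1536)  # prints: fixed_size_list<item: float>[1536]
print(var_len)
print(fixed)
```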
Steps to reproduce

1. `pip install graphrag==0.3.6`
2. Build the graph and use the command
   `python -m graphrag.query --root ./ragtest --method local "what is the relationship between xiaozhang and xiaoming?"`
   to confirm successful execution.
3. Write a Python file using https://github.com/microsoft/graphrag/blob/main/docs/examples_notebooks/local_search.ipynb as a reference. Modify certain parts of this file: change INPUT_DIR, comment out all variables related to covariates since no related files were generated when building the graph, and set API_KEY, llm_model (deepseek-chat), embedding_model (text-embedding-3-small), and api_base="https://api.agicto.cn/v1".
4. Run the file using Python.
Expected Behavior
I expect it to respond to what I ask.
GraphRAG Config Used
Logs and screenshots
Additional Information