
[Question]: got an AttributeError: 'str' object has no attribute 'data' when documents length too long for VectorStoreIndex #17940

Open
yx9966 opened this issue Feb 27, 2025 · 2 comments
Labels
question Further information is requested

Comments


yx9966 commented Feb 27, 2025

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

The code is:

```python
documents = SimpleDirectoryReader(input_files=code_files).load_data()
splitter = CodeSplitter(
    language="python",
    chunk_lines=chunk_lines,
    chunk_lines_overlap=chunk_lines_overlap,
    max_chars=max_chars,
)
temp_documents = splitter.get_nodes_from_documents(documents)
VectorStoreIndex(temp_documents, use_async=True)
```

When `len(temp_documents)` is larger than about 600, the following error is raised:

```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/llama_index/core/indices/vector_store/base.py", line 76, in __init__
    super().__init__(
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/llama_index/core/indices/base.py", line 77, in __init__
    index_struct = self.build_index_from_nodes(
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/llama_index/core/indices/vector_store/base.py", line 310, in build_index_from_nodes
    return self._build_index_from_nodes(content_nodes, **insert_kwargs)
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/llama_index/core/indices/vector_store/base.py", line 277, in _build_index_from_nodes
    run_async_tasks(tasks)
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/llama_index/core/async_utils.py", line 77, in run_async_tasks
    outputs: List[Any] = asyncio_run(_gather())
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/llama_index/core/async_utils.py", line 33, in asyncio_run
    return loop.run_until_complete(coro)
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/nest_asyncio.py", line 98, in run_until_complete
    return f.result()
  File "/root/anaconda3/envs/salt/lib/python3.10/asyncio/futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/root/anaconda3/envs/salt/lib/python3.10/asyncio/tasks.py", line 234, in __step
    result = coro.throw(exc)
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/llama_index/core/async_utils.py", line 75, in _gather
    return await asyncio.gather(*tasks_to_execute)
  File "/root/anaconda3/envs/salt/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
    future.result()
  File "/root/anaconda3/envs/salt/lib/python3.10/asyncio/tasks.py", line 234, in __step
    result = coro.throw(exc)
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/llama_index/core/indices/vector_store/base.py", line 189, in _async_add_nodes_to_index
    nodes_batch = await self._aget_node_with_embedding(
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/llama_index/core/indices/vector_store/base.py", line 163, in _aget_node_with_embedding
    id_to_embed_map = await async_embed_nodes(
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/llama_index/core/indices/utils.py", line 230, in async_embed_nodes
    new_embeddings = await embed_model.aget_text_embedding_batch(
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 368, in async_wrapper
    result = await func(*args, **kwargs)
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/llama_index/core/base/embeddings/base.py", line 407, in aget_text_embedding_batch
    nested_embeddings = await asyncio.gather(*embeddings_coroutines)
  File "/root/anaconda3/envs/salt/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
    future.result()
  File "/root/anaconda3/envs/salt/lib/python3.10/asyncio/tasks.py", line 232, in __step
    result = coro.send(None)
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/llama_index/embeddings/openai/base.py", line 481, in _aget_text_embeddings
    return await _retryable_aget_embeddings()
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 189, in async_wrapped
    return await copy(fn, *args, **kwargs)
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 111, in __call__
    do = await self.iter(retry_state=retry_state)
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/tenacity/__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
  File "/root/anaconda3/envs/salt/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/root/anaconda3/envs/salt/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 114, in __call__
    result = await fn(*args, **kwargs)
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/llama_index/embeddings/openai/base.py", line 474, in _retryable_aget_embeddings
    return await aget_embeddings(
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/llama_index/embeddings/openai/base.py", line 193, in aget_embeddings
    await aclient.embeddings.create(input=list_of_text, model=engine, **kwargs)
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/openai/resources/embeddings.py", line 243, in create
    return await self._post(
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/openai/_base_client.py", line 1856, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/openai/_base_client.py", line 1550, in request
    return await self._request(
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/openai/_base_client.py", line 1653, in _request
    return await self._process_response(
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/openai/_base_client.py", line 1750, in _process_response
    return await api_response.parse()
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/openai/_response.py", line 432, in parse
    parsed = self._options.post_parser(parsed)
  File "/root/anaconda3/envs/salt/lib/python3.10/site-packages/openai/resources/embeddings.py", line 229, in parser
    for embedding in obj.data:
AttributeError: 'str' object has no attribute 'data'
```

The environment is:

```
llama-index                                 0.12.19
llama-index-agent-openai                    0.4.6
llama-index-callbacks-arize-phoenix         0.3.0
llama-index-cli                             0.4.0
llama-index-core                            0.12.19
llama-index-embeddings-huggingface          0.5.1
llama-index-embeddings-openai               0.3.1
llama-index-indices-managed-llama-cloud     0.6.8
llama-index-legacy                          0.9.48.post4
llama-index-llms-openai                     0.3.20
llama-index-llms-replicate                  0.4.0
llama-index-multi-modal-llms-openai         0.4.3
llama-index-program-openai                  0.3.1
llama-index-question-gen-openai             0.3.0
llama-index-readers-file                    0.4.5
llama-index-readers-llama-parse             0.4.0
openinference-instrumentation-llama-index   3.3.1
```

@yx9966 yx9966 added the question Further information is requested label Feb 27, 2025

dosubot bot commented Feb 27, 2025

@yx9966 Hello there! I'm here to help you with your technical questions and bug fixes. Let's tackle this issue together!

The `AttributeError: 'str' object has no attribute 'data'` suggests that the code is attempting to access the `data` attribute on a string object, which does not exist. This error occurs in the `parser` function within the `openai/resources/embeddings.py` file, specifically when it tries to iterate over `obj.data`.

This issue might be related to the response structure from the OpenAI API. The `parser` function expects `obj` to have a `data` attribute, which should be a list of embeddings. However, if the API response is not as expected (e.g., due to an error or a different response format), `obj` might be a string (possibly an error message or a different type of response), leading to this error.

To address this issue, consider the following steps:

  1. Check API Response: Ensure that the API response is in the expected format. You might want to log the response before it is parsed to verify its structure.
  2. Error Handling: Implement error handling to check if `obj` is a string or does not have the `data` attribute before attempting to access it. This can prevent the code from breaking and allow you to handle unexpected responses gracefully.
  3. API Limits: If the error occurs when processing a large number of documents, it might be related to API rate limits or payload size limits. Check the OpenAI API documentation for any such constraints and adjust your requests accordingly.
  4. Update Dependencies: Ensure that all relevant packages, including llama-index and its dependencies, are up to date. Sometimes, bugs are fixed in newer versions.

By implementing these steps, you should be able to identify the root cause of the issue and apply a suitable fix.
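Step 2 above (defensive error handling) could look roughly like the following. Note that `safe_parse_embeddings` is a hypothetical helper written for illustration here, not part of the OpenAI SDK; the idea is simply to validate the response object before touching `.data`, so a raw-string response (e.g. an HTML error page from a proxy or gateway) fails with a readable message instead of an `AttributeError`:

```python
def safe_parse_embeddings(obj):
    """Defensively extract embedding vectors from an API response.

    The real OpenAI parser assumes `obj.data` exists; if the server (or an
    intermediate proxy) returns a plain string, that assumption breaks.
    This sketch raises a descriptive error instead, including the start of
    the unexpected payload for debugging.
    """
    if isinstance(obj, str) or not hasattr(obj, "data"):
        raise ValueError(f"Unexpected embeddings response: {str(obj)[:200]}")
    return [item.embedding for item in obj.data]
```

In practice you would log or inspect the offending payload this surfaces; a string response often turns out to be a gateway error page or a truncated body rather than valid JSON.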

To continue talking to Dosu, mention @dosu.




yx9966 commented Feb 27, 2025


I am not sure whether it is the API limit. If it is a limit from the API, is there any way to work around this problem?
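If the cause does turn out to be a payload or rate limit, one common workaround is to build the index incrementally in smaller batches rather than embedding all 600+ nodes in one pass. The sketch below assumes the `temp_documents` list from the original snippet; `batched` is a small helper written for this example, and the batch size of 100 is an illustrative guess, not a verified threshold:

```python
def batched(items, size):
    """Yield successive fixed-size slices of `items`."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Sketch (untested against this setup): start from an empty index, then
# insert nodes batch by batch so each embedding request stays small.
# from llama_index.core import VectorStoreIndex
# index = VectorStoreIndex(nodes=[])
# for batch in batched(temp_documents, 100):
#     index.insert_nodes(batch)
```

Lowering the embedding model's batch size (e.g. the `embed_batch_size` parameter on `OpenAIEmbedding`) is another knob worth trying for the same reason.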
