
[Question]: RateLimit Embedding #17941

Open
1 task done
mirallm opened this issue Feb 27, 2025 · 1 comment
Labels
question Further information is requested

Comments


mirallm commented Feb 27, 2025

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

Hello, I am getting the warning below and I would like to handle it. @dosubot

WARNING:llama_index.embeddings.openai.utils:Retrying llama_index.embeddings.openai.base.OpenAIEmbedding._aget_text_embeddings.._retryable_aget_embeddings in 0.4735038183779753 seconds as it raised RateLimitError: Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the Embeddings_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 3 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}.

Based on my code below, how can I manage this?

    :param nodes: List of TextNodes to index.
    :param embed_model: The model used for embedding the documents.
    """
    try:
        if self.vector_index is None:
            # Create a new index if one does not exist yet
            self.vector_index = VectorStoreIndex(
                nodes=nodes,
                use_async=False,
                embed_model=embed_model,
                storage_context=self.storage_context,
                show_progress=True,
            )
            print(self.vector_index.index_struct)
            print("Created new index.")
        else:
            # If the index exists, load it and insert the nodes
            print("Loading existing index and inserting nodes.")
            await self.load_index(embed_model)
            index_struct = self.vector_index._build_index_from_nodes(nodes)
            await self.vector_index._async_add_nodes_to_index(
                index_struct=index_struct,
                nodes=nodes,
                show_progress=True,
                min_seconds=12,
                max_seconds=120,
            )
    except Exception as e:
        print(f"An error occurred while embedding the file: {e}")
        
        @dosubot
@mirallm mirallm added the question Further information is requested label Feb 27, 2025

dosubot bot commented Feb 27, 2025

To manage the RateLimitError when using the OpenAIEmbedding with the Azure OpenAI API, you can implement a retry mechanism using the tenacity library. This approach allows you to handle rate limit errors by retrying the request with an exponential backoff strategy. Here's a code snippet that demonstrates how to set up this retry mechanism:

from tenacity import retry, wait_random_exponential, stop_after_attempt

# Define your retry strategy
retry_decorator = retry(
    wait=wait_random_exponential(min=4, max=10),  # Random exponential backoff between 4 and 10 seconds
    stop=stop_after_attempt(5)  # Maximum number of retries
)

# Apply the retry decorator to your function that makes the API call
@retry_decorator
async def call_openai_embedding():
    # Your code to call the OpenAI API
    pass

# Use the function
await call_openai_embedding()

In your existing code, you can apply this retry strategy to the part where you are calling the OpenAI API for embeddings. This will help manage the rate limit by automatically retrying the request if a RateLimitError occurs, with a delay that increases exponentially between retries [1].
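For intuition, the tenacity decorator above is equivalent to a plain retry loop with jittered exponential backoff. Here is a minimal stdlib-only sketch of the same idea; the `RateLimitError` class and `fake_embed` call are stand-ins for illustration, not the real `openai` exception or embedding API:

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for the HTTP 429 rate-limit error raised by the API client."""


def retry_with_backoff(fn, max_attempts=5, base=0.5, cap=10.0):
    """Call fn(), retrying on RateLimitError with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay grows as base * 2^attempt, capped, with random jitter
            delay = min(cap, base * (2 ** attempt)) * random.random()
            time.sleep(delay)


# Simulated embedding call: raises 429 twice, then succeeds.
calls = {"n": 0}


def fake_embed():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429: rate limit exceeded")
    return [0.1, 0.2, 0.3]


result = retry_with_backoff(fake_embed, base=0.05)
print(result)  # succeeds on the third attempt
```

In the real code this wrapper (or the tenacity decorator) would go around the call that triggers the embedding requests; another common lever is to reduce the number of texts sent per request so each call stays under the Azure tier's per-minute quota.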

