Replies: 1 comment
-
To integrate Amazon Textract with S3 and EC2 for asynchronous data extraction and ensure your RAG application updates accordingly, you can follow these steps:
This setup ensures that your RAG application on the EC2 instance is always up-to-date with the latest documents from the S3 bucket, and the vector store is refreshed accordingly.
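For the S3-to-EC2 sync itself, the simplest option is usually the AWS CLI: running `aws s3 sync s3://<bucket>/<prefix> <local-dir>` on a schedule (for example from cron) copies new and changed objects down to the instance. If you would rather do it inside the application so you know exactly which files arrived, something like the following might work. This is only a minimal sketch, not a tested integration: the function names `plan_downloads` and `sync_bucket` are made up for illustration, it assumes `boto3` is installed and the instance role can list and read the bucket, and it detects changes by file size only, which will miss an edit that keeps the size the same.

```python
from pathlib import Path


def plan_downloads(remote_objects, local_dir):
    """Return the keys whose local copy is missing or has a different size.

    remote_objects: iterable of dicts with "Key" and "Size", i.e. the shape
    of the "Contents" entries returned by S3 list_objects_v2.
    """
    local_dir = Path(local_dir)
    to_fetch = []
    for obj in remote_objects:
        key = obj["Key"]
        if key.endswith("/"):
            # Skip zero-byte "folder" placeholder objects.
            continue
        target = local_dir / key
        if not target.exists() or target.stat().st_size != obj["Size"]:
            to_fetch.append(key)
    return to_fetch


def sync_bucket(bucket, prefix, local_dir):
    """Download new or changed objects and return the local paths written."""
    # Imported here so plan_downloads can be used and tested without AWS.
    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    objects = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        objects.extend(page.get("Contents", []))

    written = []
    for key in plan_downloads(objects, local_dir):
        target = Path(local_dir) / key
        target.parent.mkdir(parents=True, exist_ok=True)
        s3.download_file(bucket, key, str(target))
        written.append(target)
    return written
```

You could call `sync_bucket(...)` on a timer and pass the returned paths to whatever code already refreshes your vector store, so the index is rebuilt only when a file has actually arrived or changed.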
-
Hello everyone,
I am working on a Retrieval-Augmented Generation (RAG) application for tabular data extraction from PDF documents. Here is my current setup:
Tabular Data Extraction:
Loading and Orchestration:
I now want to enhance my setup by using Amazon Textract's asynchronous capabilities to extract tables and store the extracted CSV files in an S3 bucket. My goal is to run my RAG application on an EC2 instance and keep a local directory on that instance in sync with the S3 bucket. This way, whenever new files are added to the S3 bucket, they are also copied to the local directory on the EC2 instance. Each such update should trigger my application to refresh the vector store, so that responses reflect the newest documents.
Specific Requirements:
I am able to refresh the vector store based on the arrival of new documents. However, I need guidance on whether there is any specific implementation available that can synchronize the S3 bucket with the local EC2 directory effectively.
Any suggestions or guidance would be greatly appreciated.