Chat with your PDF
This repository is a 'Chat-with-your-PDF' project with two implementations, Light and Enterprise. @Pardis-Rahbarsooreh and I worked on this project.
Make sure you have installed the libraries listed in requirements.txt, which is located at ./source/requirements.txt.
You can install them from a terminal opened in ./source:
pip install -r requirements.txt
If you get a "recursive_guard" error while running the code, try using Python 3.11.
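The Python 3.11 suggestion above can be checked before running anything. The following is a minimal sketch, not part of the repository: the function name `version_note` and the idea of flagging every version newer than 3.11 are assumptions made for illustration, since the README only says that 3.11 is worth trying when the error appears.

```python
import sys

# Assumption: the README suggests Python 3.11 as a workaround only;
# treating every newer version as a possible cause is this sketch's guess.
SUGGESTED = (3, 11)


def version_note(version_info=sys.version_info):
    """Return a short note comparing the running Python to the suggested one."""
    current = (version_info[0], version_info[1])
    label = f"{current[0]}.{current[1]}"
    if current > SUGGESTED:
        return (
            f"Python {label} is newer than 3.11; if a 'recursive_guard' "
            "error appears, try a 3.11 environment."
        )
    return f"Python {label} is not newer than 3.11."


if __name__ == "__main__":
    print(version_note())
```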
If you fork the repository, be sure to create a .env file in ./source and put your API keys in it. These keys are needed to run the full application:
OPENAI_API_KEY='...'
ELASTIC_API_KEY='...'
ELASTIC_CLOUD_ID='...'
ELASTIC_END_POINT='...'
UNSTRUCTURED_API_KEY='...'
UNSTRUCTURED_SERVER_URL='...'
PINECONE_API_KEY='...'
This repository has three main folders:
- ./data is the folder where you should put your PDF file.
- ./source is the folder that contains the .py files. They are used as follows:
  - To insert data into the databases, use these files:
    - data_to_ElasticCloud.py
    - data_to_Pinecone.py
    Specify your file on line 12 and run the script.
  - To run the whole application on Streamlit, use streamlit_app.py. Open a terminal, change directory to ./source, and type:
    streamlit run streamlit_app.py
  - document_loader.py is responsible for loading PDFs. You can create an instance of the LoadDocument class implemented in this file.
  - chunker.py is responsible for chunking the data. It is used only for data that will be indexed in the Pinecone database.
  - pinecone_handler.py handles the client and the connection to the Pinecone servers. It also retrieves data.
  - elasticsearchhandler.py handles the client and the connection to Elastic Cloud.
  - unstructured_io_handler.py handles the connection to the 'Unstructured.io' servers and fetches results from them.
  - light_model.py contains the chain for the Light model.
  - enterprise_model.py contains the chain for the Enterprise model.
  - test_synthetic_data.py tests the app against benchmarks. If you want to run this file, remember to change the context window of the Light model and use enterprise_model_for_test.py instead of enterprise_model.py.
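To give a rough idea of what the chunking step before Pinecone indexing involves, here is a minimal sketch of fixed-size chunking with overlap. It is not the code in chunker.py: the function name `chunk_text`, the character-based splitting, and the default sizes are all assumptions chosen for illustration.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into character chunks that overlap by `overlap` characters.

    Hypothetical helper for illustration; the repository's chunker may
    split on tokens, sentences, or pages instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each new chunk start advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

The overlap keeps a little shared context between neighbouring chunks, so a sentence cut at a chunk boundary is still partly visible in the next chunk at retrieval time.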
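Before launching the app, it can help to confirm that the .env file in ./source defines every key listed in the setup notes. The sketch below uses only the standard library and a deliberately simple parser. The helper names `parse_env` and `missing_keys` are hypothetical, and the repository itself may load the file differently (for example through a dotenv package).

```python
# Key names copied from the README's .env listing.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "ELASTIC_API_KEY",
    "ELASTIC_CLOUD_ID",
    "ELASTIC_END_POINT",
    "UNSTRUCTURED_API_KEY",
    "UNSTRUCTURED_SERVER_URL",
    "PINECONE_API_KEY",
]


def parse_env(text):
    """Parse simple KEY='value' lines; skips blanks, comments, and lines without '='."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the .env text."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    sample = "OPENAI_API_KEY='abc'\nPINECONE_API_KEY=''\n"
    print(missing_keys(sample))
```

To check a real file, read it first (for instance with `pathlib.Path("source/.env").read_text()`) and pass the text to `missing_keys`. An empty list means every key has a non-empty value.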