PdfQuery

PdfQuery is a retrieval-augmented generation (RAG) application that combines PDF document processing, semantic search, and question answering. It uses FAISS for efficient similarity search and Hugging Face models for both text embeddings and answer generation.
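In a RAG pipeline, the question-answering step usually receives the question together with the text chunks returned by the similarity search, combined into a single prompt. The sketch below shows one common way to assemble that prompt. The function name `build_prompt`, the prompt wording, and the `max_chars` budget are hypothetical illustrations, not the prompt that `query_data.py` actually uses.

```python
from typing import List


def build_prompt(question: str, chunks: List[str], max_chars: int = 2000) -> str:
    """Hypothetical helper: pack retrieved chunks into a context block, then append the question."""
    context_parts: List[str] = []
    used = 0
    for chunk in chunks:
        # Stop adding chunks once the (assumed) character budget would be exceeded.
        if used + len(chunk) > max_chars:
            break
        context_parts.append(chunk)
        used += len(chunk)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The chunks are expected to arrive already ranked by relevance, so truncating at the budget drops the least relevant ones first.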

Features

PDF document processing and text extraction
Semantic search using FAISS indexing
RESTful API endpoints for querying
Web interface for easy interaction
Docker support for containerized deployment
Efficient document chunking and embedding
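Document chunking splits extracted PDF text into overlapping windows so that each embedding covers a manageable span and sentences cut at a boundary still appear intact in the neighbouring chunk. The sketch below is a plain character-window chunker; the project's actual splitter and its `chunk_size`/`overlap` values are not documented here (LangChain's text splitters, such as `RecursiveCharacterTextSplitter`, take comparable `chunk_size` and `chunk_overlap` settings), so the name `chunk_text` and the defaults are assumptions.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Hypothetical chunker: fixed-size character windows that overlap by `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=1)` yields `["abcd", "defg", "ghij"]`; each chunk would then be embedded and added to the index.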

Tech Stack

Backend

Python 3.x
Flask (Web Framework)
FAISS (Vector Similarity Search)
LangChain (Document Processing)
Hugging Face Transformers
Sentence Transformers

Frontend

Node.js
Express.js
Material Design Components
Axios for API calls

Prerequisites

Python 3.x
Node.js and npm
Docker (optional)

Usage

Start the Python backend server:

python api.py

Start the Node.js server:

node server.js

Access the web interface at http://localhost:3000

API Endpoints

Query Endpoint

URL: /query
Method: POST
Request body (JSON):

{
    "query": "Your question here"
}
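A client only needs to POST the JSON body above to `/query`. The standard-library sketch below builds and sends such a request. The base URL `http://localhost:8000` is an assumption taken from the Docker port mapping in this README; the port `api.py` listens on when run directly is not stated, and the response format is not documented, so `send_query` simply decodes whatever JSON comes back.

```python
import json
import urllib.request

# Assumed base URL (matches the Docker port mapping); adjust to wherever api.py is listening.
BASE_URL = "http://localhost:8000"


def build_query_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /query request whose JSON body matches the documented schema."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_query(question: str, base_url: str = BASE_URL) -> dict:
    """Send the query and decode the JSON response (its shape is not documented here)."""
    req = build_query_request(question, base_url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `send_query("What is this document about?")` returns the decoded response.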

Project Structure

rag-application/
├── api.py                 # Flask API server
├── create_database.py     # Database creation and indexing
├── query_data.py          # Query processing logic
├── pdf_util.py            # PDF processing utilities
├── server.js              # Node.js server
├── public/                # Frontend static files
├── faiss_index/           # FAISS index storage
├── Data/                  # Document storage
└── requirements.txt       # Python dependencies
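The index stored in `faiss_index/` is what the query step searches: the question is embedded and compared against every chunk embedding, and the closest chunks are returned. The sketch below does not use FAISS; it is a pure-Python exact search that mirrors what a flat inner-product FAISS index (such as `faiss.IndexFlatIP`) computes over L2-normalized vectors, i.e. cosine similarity. The index type the project actually builds is not documented, and the names `normalize` and `top_k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)  # leave zero vectors unchanged rather than divide by zero
    return [x / norm for x in vec]


def top_k(query: Sequence[float], index: List[List[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Exact (brute-force) search: score every stored vector, return the k best (id, score) pairs.

    `index` is assumed to hold vectors that were already normalized when added.
    """
    q = normalize(query)
    scores = [(i, sum(a * b for a, b in zip(q, vec))) for i, vec in enumerate(index)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

Brute force is exact but scans every vector; FAISS performs the same flat search with optimized kernels (`index.add(...)` then `index.search(queries, k)`), and its approximate index types trade some accuracy for speed on large collections.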

Docker Support

Build and run using Docker:

docker build -t rag-application .
docker run -p 8000:8000 rag-application

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

This project is licensed under the ISC License.

Acknowledgments

Hugging Face for their transformer models
FAISS for efficient similarity search
LangChain for document processing capabilities

Repository: No-one9/PdfQuery