This project implements a data pipeline for real-time sentiment prediction using various pretrained BERT and GPT models. The models can be fine-tuned on a custom dataset using finetune.py. All available models can be trained and evaluated in a single run, with their metrics collected, using finetune_all.py. The pipeline is implemented with Flask, Kafka, and the Transformers library.
1. To fine-tune a single model, provide the dataset path and the model type.
Available model types: `model_types = ["roberta", "textattack", "bert", "gpt2", "distilbert"]`
`python fine_tune_from_models.py --model_type roberta --dataset_path path/to/your/dataset.txt`
2. To fine-tune all the models and compare their evaluation metrics to choose the best one, use `finetune_all.py`.
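The all-models run can be pictured as a loop over the model types listed above. This is a sketch, not the repository's actual code: the helper name `build_finetune_command` is hypothetical, and the flags are assumed to match the single-model command shown above.

```python
# Sketch of how finetune_all.py might iterate over every model type,
# reusing the CLI flags from the single-model fine-tuning command.
MODEL_TYPES = ["roberta", "textattack", "bert", "gpt2", "distilbert"]

def build_finetune_command(model_type, dataset_path):
    """Build the argument list for one fine-tuning run (hypothetical helper)."""
    return ["python", "fine_tune_from_models.py",
            "--model_type", model_type,
            "--dataset_path", dataset_path]

if __name__ == "__main__":
    import subprocess
    for mt in MODEL_TYPES:
        cmd = build_finetune_command(mt, "path/to/your/dataset.txt")
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment to actually launch each run
```

Collecting the eval metrics printed by each run then lets you pick the best-performing model type.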
User Message --> Flask API --> Kafka Publisher --> Kafka Topic --> Kafka Consumer --> BERT Model --> API Response
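The hand-off between the Flask publisher and the Kafka consumer hinges on the message format on the topic. The sketch below is an assumption, not taken from the repository: the topic name `sentiment-requests`, the JSON shape, and the kafka-python client mentioned in the comments are all illustrative.

```python
import json

# Topic name is an assumption; match it to whatever main.py actually uses.
TOPIC = "sentiment-requests"

def encode_message(user_input):
    """Serialize a user message into the bytes published to the Kafka topic."""
    return json.dumps({"user_input": user_input}).encode("utf-8")

def decode_message(raw_bytes):
    """Deserialize a consumed message before handing the text to the model."""
    return json.loads(raw_bytes.decode("utf-8"))["user_input"]

if __name__ == "__main__":
    # With kafka-python installed and a broker on localhost:9092, the
    # publisher side of the flow above would look roughly like:
    #   from kafka import KafkaProducer
    #   producer = KafkaProducer(bootstrap_servers="localhost:9092")
    #   producer.send(TOPIC, encode_message("this is the best food"))
    print(decode_message(encode_message("this is the best food")))
```

Keeping serialization in one pair of helpers ensures the publisher and consumer never drift apart on the message schema.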
- Python 3.x
- Kafka
- Zookeeper
- Flask
- Transformers library
- Clone the repository: `git clone https://github.com/your_repository_url.git`
- Install dependencies: `pip install -r requirements.txt`
- Start Zookeeper: `bin/zookeeper-server-start.sh config/zookeeper.properties`
- Start Kafka: `bin/kafka-server-start.sh config/server.properties`
- Start the Flask app: `python3 main.py`
Endpoint: `/send_message`
- Method: POST
- Data params: `{"user_input": "your_text_here"}`
Sample cURL command: `curl --header "Content-Type: application/json" --request POST --data '{"user_input":"this is the best food"}' http://0.0.0.0:5000/send_message`
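The same request can be issued from Python using only the standard library. The URL and payload come from the cURL example above; the helper name `build_request` is illustrative.

```python
import json
from urllib import request as urllib_request

API_URL = "http://0.0.0.0:5000/send_message"

def build_request(text, url=API_URL):
    """Build a POST request mirroring the sample cURL command."""
    payload = json.dumps({"user_input": text}).encode("utf-8")
    return urllib_request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    req = build_request("this is the best food")
    # urllib_request.urlopen(req)  # uncomment once the Flask app is running
    print(req.full_url, req.get_method())
```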
You can stress-test the Kafka queues using the provided Python script: `python3 api_request.py`
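A minimal stress driver in the spirit of api_request.py might fan requests out over a thread pool, as in the sketch below. This is an assumption about its shape, not the script's actual contents; `post_fn` is injected so the sketch can run without a live server (wire it to a real POST against `/send_message` for an actual test).

```python
import json
from concurrent.futures import ThreadPoolExecutor

def make_payloads(n, text="stress test message"):
    """Generate n distinct JSON payloads for the /send_message endpoint."""
    return [json.dumps({"user_input": f"{text} {i}"}) for i in range(n)]

def send_all(payloads, post_fn, workers=8):
    """POST every payload concurrently. post_fn(payload) performs one
    request; injecting it keeps the sketch runnable offline."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(post_fn, payloads))

if __name__ == "__main__":
    # Dry run: parse each payload instead of sending it over HTTP.
    results = send_all(make_payloads(5), lambda p: json.loads(p)["user_input"])
    print(results)
```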