This project is a personal interest piece that summarizes my work over the past year. It combines several technologies and techniques to build an immersive experience based on the anime series MyGo!!!!!.
The project involves extracting subtitles from the anime using OCR (Optical Character Recognition) and processing the subtitle documents through simple computer vision (CV) and data manipulation techniques. Additionally, it uses YOLO (You Only Look Once) and ResNet to identify characters in scenes and determine which character is speaking a particular line.
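As a rough illustration of the data manipulation step, the sketch below collapses per-frame OCR output into subtitle lines with start and end times. It is a minimal sketch and not the project's actual code: the `SubtitleLine` type, the `merge_ocr_frames` function, and the `min_similarity` threshold are all hypothetical names, and the input is assumed to be `(timestamp, text)` pairs already produced by the OCR stage.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class SubtitleLine:
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_ocr_frames(frames, min_similarity=0.8):
    """Collapse per-frame OCR results into subtitle lines.

    `frames` is an iterable of (timestamp_seconds, text) pairs in time order.
    Consecutive frames with nearly identical text are treated as the same
    subtitle, which absorbs small OCR misreads between frames.
    """
    lines = []
    current = None
    for timestamp, text in frames:
        text = text.strip()
        if not text:
            current = None  # the subtitle left the screen
            continue
        if current is not None:
            ratio = SequenceMatcher(None, current.text, text).ratio()
            if ratio >= min_similarity:
                current.end = timestamp  # same subtitle is still on screen
                continue
        current = SubtitleLine(start=timestamp, end=timestamp, text=text)
        lines.append(current)
    return lines
```

The first reading of each subtitle is kept as its text. A fuzzy match is used instead of strict equality because OCR output for the same subtitle can differ by a character or two from frame to frame.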
Finally, the system uses Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs) to continue the conversation and generate contextually accurate dialogue.
The project spans OCR, computer vision, deep learning, and natural language processing, and it grew out of my personal exploration and technical growth.
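The retrieval half of the RAG stage can be sketched in a few lines. This is a toy illustration under stated assumptions, not the project's implementation: a bag-of-words vector stands in for a real embedding model, an in-memory list stands in for the pgvector store, and the function names, the `speaker`/`text` record layout, and the prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, lines, k=2):
    """Return the k subtitle lines most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        lines,
        key=lambda line: cosine(query_vec, embed(line["text"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, retrieved):
    """Put the retrieved lines in front of the user's message."""
    context = "\n".join(f'{line["speaker"]}: {line["text"]}' for line in retrieved)
    return f"Reference dialogue:\n{context}\n\nContinue the conversation.\nUser: {query}"
```

The prompt returned by `build_prompt` is what would be sent to the LLM. In a real setup the similarity search would run inside the vector database instead of in Python.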
- PyTorch
- PaddlePaddle
- PromptFlow
- Docker
- LangChain
- Ollama
- YOLO
- Python: 3.12
- CUDA Toolkit: 11.8
- PyTorch: 2.5.1+cu118
- Torchaudio: 2.5.1+cu118
- Torchvision: 0.20.1+cu118
To set up the environment, follow these steps:
- Install PyTorch, Torchvision, and Torchaudio:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
- Install other dependencies from the requirements.txt file:
pip install -r requirements.txt
Note: If you intend to use the src/subtitle_process_ocr module, make sure to install cudatoolkit==11.8.
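After installation, a quick check along the following lines can confirm that the installed PyTorch build matches the versions listed above. This is a minimal sketch. The `check_environment` helper is a hypothetical name, and the expected values in the comments are taken from the version list in this README.

```python
import importlib.util
import sys


def check_environment():
    """Report Python, PyTorch, and CUDA details without failing if torch is absent."""
    report = {"python": ".".join(map(str, sys.version_info[:3]))}
    if importlib.util.find_spec("torch") is None:
        report["torch"] = None  # PyTorch is not installed in this environment
        return report

    import torch

    report["torch"] = torch.__version__        # expected: 2.5.1+cu118
    report["cuda_build"] = torch.version.cuda  # expected: 11.8
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False`, the GPU driver or the CUDA build of PyTorch is the first thing to check.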
Configuration parameters can be adjusted according to your needs by editing the config.yaml file.
Additionally, you can modify the assets\prompts\system_prompts.jinja2 file to tailor the prompts to your LLM.
- pgvector init
cd pgvector
docker-compose up --build
- LLM server startup
You can set up the LLM server with Ollama, or run directly against the OpenAI API.
ollama pull <model>  # choose the model you want
ollama serve
- frontend startup
streamlit run my_go_app.py
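Once `ollama serve` is running, the server can be smoke-tested from Python before starting the frontend. The sketch below uses only the standard library and Ollama's `/api/generate` endpoint on its default port 11434. The helper names are hypothetical, and the model name passed in must be one you have already pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default address


def build_request(model, prompt, url=OLLAMA_URL):
    """Build a non-streaming generate request for the Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, timeout=60):
    """Send the prompt and return the generated text (requires a running server)."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `generate("<model>", "Hello")` with the model you pulled should return a short completion. A connection error usually means the server is not running or is listening on a different port.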
Subtitle Source: 喵萌奶茶屋
Feel free to contact me to obtain the necessary asset resources. email: [email protected]