
QuikZiHao/mygo_llm


About

This project is a personal interest piece and serves as a summary of my work over the past year. It integrates multiple technologies and techniques to create an immersive experience based on the anime series MyGo!!!!!.

The project involves extracting subtitles from the anime using OCR (Optical Character Recognition) and processing the subtitle documents through simple computer vision (CV) and data manipulation techniques. Additionally, it uses YOLO (You Only Look Once) and ResNet to identify characters in scenes and determine which character is speaking a particular line.
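As an illustration of the first stage, below is a minimal sketch of pulling subtitle text from an extracted frame. It assumes PaddleOCR as the OCR engine (the project lists PaddlePaddle among its frameworks); the frame path and the 0.8 confidence threshold are placeholders, not values from this repository.

from paddleocr import PaddleOCR

# Initialize the OCR engine (assuming Chinese subtitles; adjust lang as needed).
ocr = PaddleOCR(use_angle_cls=True, lang="ch")

# Run OCR on a single extracted frame (hypothetical path).
result = ocr.ocr("frames/frame_0001.jpg", cls=True)

# Each detection is (box, (text, score)); keep only confident lines.
for box, (text, score) in result[0]:
    if score > 0.8:
        print(text)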

Finally, the system utilizes Retrieval-Augmented Generation (RAG) and Large Language Models (LLM) to complete the conversation and generate contextually accurate dialogue.
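For illustration, here is a minimal sketch of how the retrieval step could look with the frameworks listed below (LangChain, pgvector, Ollama). The connection string, collection name, embedding model, and LLM name are assumptions for the example, not values taken from this repository.

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import PGVector

# Store and query subtitle lines in pgvector (hypothetical connection details).
store = PGVector(
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/mygo",
    collection_name="mygo_subtitles",
    embedding_function=HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    ),
)

# Retrieve the dialogue lines most similar to the user's message.
query = "What do you think about forming a band?"
context = "\n".join(d.page_content for d in store.similarity_search(query, k=4))

# Ask a local Ollama model to reply in character, grounded on the retrieved lines.
llm = Ollama(model="llama3")
print(llm.invoke(f"Context:\n{context}\n\nUser: {query}\nReply in character:"))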

This project combines several fields such as OCR, computer vision, deep learning, and natural language processing, and it was developed as part of my personal exploration and technical growth over the past year.
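To make the character-identification step above concrete, here is a rough sketch of the detect-then-classify pattern: YOLO proposes character boxes in a frame, and a fine-tuned ResNet50 labels each crop. The weight paths, class list, and input size are placeholders; the repository's actual training and inference code may differ.

import torch
from torchvision import transforms
from torchvision.models import resnet50
from ultralytics import YOLO
from PIL import Image

CHARACTERS = ["Anon", "Tomori", "Rana", "Soyo", "Taki"]  # assumed label order

detector = YOLO("weights/mygo_yolo.pt")  # hypothetical fine-tuned weights
classifier = resnet50()
classifier.fc = torch.nn.Linear(classifier.fc.in_features, len(CHARACTERS))
classifier.load_state_dict(torch.load("weights/mygo_resnet50.pt"))
classifier.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

frame = Image.open("frames/frame_0001.jpg")
for box in detector(frame)[0].boxes:  # YOLO character proposals
    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
    crop = preprocess(frame.crop((x1, y1, x2, y2))).unsqueeze(0)
    with torch.no_grad():
        print(CHARACTERS[classifier(crop).argmax(1).item()])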

Results

YOLO result

(YOLO validation results image)

ResNet50 result

(ResNet50 validation images: resnet_val1, resnet_val2)

LLM result

(LLM output screenshot)
ps: 你只是個學生... ("You're just a student...")

(LLM output screenshot)
ps: 挺好的... ("Pretty good...")

(LLM output screenshot)
ps: 是音樂風格不同嗎... ("Is it just a difference in music style...?")

Project Framework

  • PyTorch
  • PaddlePaddle
  • PromptFlow
  • Docker
  • LangChain
  • Ollama
  • YOLO

Requirements

  • Python: 3.12
  • CUDA Toolkit: 11.8
  • PyTorch: 2.5.1+cu118
  • Torchaudio: 2.5.1+cu118
  • Torchvision: 0.20.1+cu118

Environment Setup

To set up the environment, follow these steps:

  1. Install PyTorch, Torchvision, and Torchaudio:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  2. Install the remaining dependencies from the requirements.txt file:
pip install -r requirements.txt

Note:
If you intend to run the src/subtitle_process_ocr module, make sure to install cudatoolkit==11.8.
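After installation, a quick sanity check can confirm that the CUDA 11.8 build listed in the requirements is the one actually in use:

import torch

print(torch.__version__)          # expect 2.5.1+cu118
print(torch.version.cuda)         # expect 11.8
print(torch.cuda.is_available())  # expect True on a CUDA-capable machine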

Configuration

The following configuration parameters can be adjusted to your needs: edit the config.yaml file to update the values. Additionally, you can modify the assets\prompts\system_prompts.jinja2 file to tailor the system prompts to your LLM.
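As a rough sketch, this is how such a YAML config and Jinja2 prompt template are typically loaded; the key names and template variables here are assumptions for illustration, not the repository's actual schema:

import yaml
from jinja2 import Template

# Load runtime settings (hypothetical key names).
with open("config.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

# Render the system prompt template with values from the config.
with open("assets/prompts/system_prompts.jinja2", encoding="utf-8") as f:
    system_prompt = Template(f.read()).render(
        character=config.get("character", "Tomori"),
    )
print(system_prompt)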

Startup

  1. pgvector init
cd pgvector
docker-compose up --build
  2. LLM server startup

You can set up the LLM server with Ollama, or run directly against the OpenAI API.

ollama pull <model>  # choose the model you want
ollama serve
  3. Frontend startup
streamlit run my_go_app.py
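As a rough sketch of the pattern a Streamlit frontend like my_go_app.py typically follows (the actual file may differ), a minimal chat loop against a local Ollama server looks like this; the model name is a placeholder for whatever you pulled above:

import ollama
import streamlit as st

st.title("MyGo!!!!! chat")

if prompt := st.chat_input("Say something to the band..."):
    with st.chat_message("user"):
        st.write(prompt)
    # Hypothetical model name; use the one pulled with `ollama pull`.
    reply = ollama.chat(
        model="llama3",
        messages=[{"role": "user", "content": prompt}],
    )["message"]["content"]
    with st.chat_message("assistant"):
        st.write(reply)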

Other Assets

Subtitle Source: 喵萌奶茶屋

Feel free to contact me to obtain the necessary asset resources. Email: [email protected]
