A comprehensive tool for processing and analyzing Zendesk help center documentation using OpenAI's API. The main purpose of this project is to demonstrate how to work with APIs, preprocess data, and deploy AI-powered documentation analysis with Docker containerization and reverse proxy setup.
Live Demo: https://logs.harrylee.id.vn/
The droplet has since been destroyed, but there are more result pictures in `static/`; check them out!
Logs view:
Document view:
Assistant demo:

This project provides an automated solution for:
- Scraping Zendesk help center articles and categories
- Processing and organizing documentation into structured formats
- Analyzing token usage and content distribution
- Preparing data for AI-powered assistants and vector stores
- Docker containerization with reverse proxy for production deployment
- Automated Documentation Scraping: Fetches articles and categories from Zendesk help center
- Content Organization: Organizes articles by category in a structured file system
- Token Analysis: Analyzes and visualizes token distribution across documentation
- OpenAI Integration: Uploads processed files to vector store for AI assistant training
- Progress Tracking: Detailed logging and progress monitoring with web interface
- Error Handling: Robust error handling for API failures and file operations
- Docker Support: Complete containerization with multi-service architecture
- Reverse Proxy: Caddy-based reverse proxy for domain routing and SSL termination
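The OpenAI integration above (uploading processed files and attaching them to a vector store) can be sketched roughly as below. This is an illustrative sketch, not the project's actual `src/upload.py`; it assumes the `openai` 1.x Python SDK, and the helper name `collect_markdown_files` is mine.

```python
import os
from pathlib import Path


def collect_markdown_files(root):
    """Gather every processed .md file under the docs directory."""
    return sorted(Path(root).rglob("*.md"))


def upload_to_vector_store(docs_root):
    """Upload each markdown file, then attach it to the vector store.

    Assumes the openai 1.x SDK and the env vars from .env.
    """
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    store_id = os.environ["OPENAI_VECTOR_STORE_ID"]
    for path in collect_markdown_files(docs_root):
        with open(path, "rb") as fh:
            uploaded = client.files.create(file=fh, purpose="assistants")
        client.beta.vector_stores.files.create(
            vector_store_id=store_id, file_id=uploaded.id
        )
```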
For Local Development:
- Python 3.7+
- Zendesk API access
- OpenAI API key
- Required Python packages (see installation below)
For Docker Deployment:
- Docker and Docker Compose
- Domain name (optional, for production)
- Zendesk API access
- OpenAI API key
**Clone the repository**

```bash
git clone <repository-url>
cd API-starter-pack
```
**Environment Configuration**

Create a `.env` file in the root directory based on `env.sample`:

```
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_VECTOR_STORE_ID=your_vector_store_id
SUPPORT_URL=https://your-zendesk-instance.zendesk.com/api/v2/help_center
```
**Configure Domain (Optional)**

Edit the `Caddyfile` to use your domain:

```
your-domain.com {
    reverse_proxy log-server:8080 {
        flush_interval -1
    }
}
```
**Clone the repository**

```bash
git clone <repository-url>
cd API-starter-pack
```
**Install dependencies**

```bash
pip install -r requirements.txt
```
Required dependencies:
- OpenAI
- requests
- Markdown
- python-dotenv
**Environment Configuration**

Create a `.env` file in the root directory:

```
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_VECTOR_STORE_ID=your_vector_store_id
SUPPORT_URL=https://your-zendesk-instance.zendesk.com/api/v2/help_center
```
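As a sanity check, these required variables can be validated at startup, before any API calls are made. A minimal sketch; the function name `require_env` is mine, not part of the project:

```python
import os

REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_VECTOR_STORE_ID", "SUPPORT_URL")


def require_env(names=REQUIRED_VARS, env=os.environ):
    """Return the required settings, failing fast if any are missing or empty."""
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {n: env[n] for n in names}
```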
```bash
# Start all services
docker-compose up -d --build

# Stop services
docker-compose down
```

This will start:
- Scraper Service: Processes Zendesk documentation
- Log Server: Web interface for monitoring (port 8080)
- Caddy: Reverse proxy with automatic SSL (ports 80/443)
Access via:
- Local: http://localhost
- With Domain: https://your-domain.com
```bash
# Run scraper
python main.py

# Run log server on localhost:8080
python log-server/app.py
```

This setup runs on localhost only, without a domain.
To pick a better chunking strategy when attaching files to the vector store, I wrote a script that counts the tokens in every file and informs that decision.
```bash
python src/count_tokens.py
```

This will:
- Count tokens in all markdown files
- Generate statistical analysis
- Create visualizations of token distribution
- Display file count by token ranges
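A rough sketch of how such a counter might work. This is not the project's actual `src/count_tokens.py`; it uses `tiktoken` when available (an assumption, since it is not among the listed dependencies) and falls back to a chars/4 heuristic otherwise:

```python
from pathlib import Path


def approx_tokens(text):
    """Cheap fallback: English text averages roughly 4 characters per token."""
    return max(1, len(text) // 4)


def count_tokens(text, model="gpt-4o-mini"):
    """Count tokens with tiktoken if installed, else approximate."""
    try:
        import tiktoken  # third-party; assumed, not in requirements.txt
        enc = tiktoken.encoding_for_model(model)
        return len(enc.encode(text))
    except Exception:
        return approx_tokens(text)


def token_report(docs_root):
    """Map each markdown file path to its (possibly approximate) token count."""
    return {
        str(p): count_tokens(p.read_text(encoding="utf-8"))
        for p in Path(docs_root).rglob("*.md")
    }
```

A report like this makes it easy to bucket files by token range before deciding on chunk sizes.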
The project uses a multi-service Docker architecture:
```
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│   Caddy Proxy   │      │   Log Server    │      │     Scraper     │
│  (Port 80/443)  │─────>│   (Port 8080)   │      │  (Background)   │
└─────────────────┘      └─────────────────┘      └─────────────────┘
         │                        │                        │
         ▼                        ▼                        ▼
┌─────────────────────────────────────────────────────────────────┐
│                         Shared Volumes                          │
│  ┌──────────────┐     ┌─────────────┐     ┌──────────────┐      │
│  │ scraper-logs │     │ docs-volume │     │  caddy_data  │      │
│  └──────────────┘     └─────────────┘     └──────────────┘      │
└─────────────────────────────────────────────────────────────────┘
```
- scraper: Main application that processes Zendesk documentation
- log-server: Flask web server providing monitoring interface
- caddy: Reverse proxy with automatic SSL certificate management
- scraper-logs: Shared logs between scraper and log server
- docs-volume: Processed documentation files
- caddy_data: SSL certificates and Caddy data
- caddy_config: Caddy configuration
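The services and volumes above correspond to a compose file along these lines. This is an abridged sketch, not the repository's actual `docker-compose.yml`; the build contexts and mount paths are assumptions:

```
services:
  scraper:
    build: .                      # main application container
    env_file: .env
    volumes:
      - scraper-logs:/app/logs
      - docs-volume:/app/docs

  log-server:
    build: ./log-server           # Flask monitoring interface
    volumes:
      - scraper-logs:/app/logs:ro # read-only view of scraper logs
    expose:
      - "8080"

  caddy:
    image: caddy:2
    ports: ["80:80", "443:443"]
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config

volumes:
  scraper-logs:
  docs-volume:
  caddy_data:
  caddy_config:
```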
```
API-starter-pack/
├── docker-compose.yml       # Docker services configuration
├── Dockerfile               # Main application container
├── Caddyfile                # Reverse proxy configuration
├── main.py                  # Main scraping script
├── src/
│   ├── count_tokens.py      # Token analysis tool
│   ├── delete_files.py      # File cleanup utility
│   └── upload.py            # OpenAI upload utility
├── log-server/
│   ├── app.py               # Flask web server
│   ├── Dockerfile           # Log server container
│   ├── requirements.txt     # Log server dependencies
│   └── templates/
│       └── index.html       # Web interface
├── docs/                    # Processed documentation
│   ├── category-1/
│   │   ├── articles.json
│   │   └── article-files.md
│   └── category-2/
├── static/                  # Static assets and screenshots
├── .env                     # Environment variables
└── README.md                # This file
```
| Variable | Description | Required |
|---|---|---|
| `OPENAI_API_KEY` | Your OpenAI API key | Yes |
| `OPENAI_VECTOR_STORE_ID` | OpenAI vector store ID | Yes |
| `SUPPORT_URL` | Zendesk help center API URL | Yes |
The `Caddyfile` configures the reverse proxy:

```
# Eg. logs.harrylee.id.vn
<your-domain> {
    reverse_proxy log-server:8080 {
        flush_interval -1
    }
}
```

Features:
- Automatic SSL certificate generation
- Domain-based routing
- `GET /api/v2/help_center/categories/` - Fetch all categories
- `GET /api/v2/help_center/categories/{id}/articles` - Fetch articles by category
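Fetching from these endpoints with `requests` might look like the sketch below. Following `next_page` links matches Zendesk's documented pagination; the function names are mine, and the real `main.py` may differ:

```python
def fetch_paginated(url, key, session=None):
    """Follow Zendesk's next_page links and collect every item under `key`."""
    if session is None:
        import requests  # third-party: pip install requests
        session = requests.Session()
    items = []
    while url:
        resp = session.get(url, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        items.extend(data.get(key, []))
        url = data.get("next_page")  # None on the last page
    return items


def articles_url(base, category_id):
    """Build the per-category articles endpoint from the SUPPORT_URL base."""
    return "{}/categories/{}/articles".format(base.rstrip("/"), category_id)
```

Usage would be along the lines of `fetch_paginated(SUPPORT_URL + "/categories", "categories")`, then one `fetch_paginated(articles_url(SUPPORT_URL, cat_id), "articles")` per category.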
It took me one week in total to code, debug, deploy, and write reports. The phases below are not necessarily in order; they are estimates of the total time, since I spent most of it writing documentation.
Duration: ~1 hour
- Repository creation and initial setup
- Zendesk API, OpenAI, and Digital Ocean research
- Basic project structure implementation
Duration: ~2 hours
- Scrape files from the Zendesk API
- Convert and process data into clean Markdown files
- Store files for display
- Error handling (add, update, skip), optimization, and Docker containerization
Duration: ~4 hours
- Token analysis implementation to decide on a chunking strategy
- Upload files and attach them to the vector store via the OpenAI API
- Error handling, optimization, and Docker containerization
Duration: ~1 hour
- Set up a Linux cron job
- Caddy reverse proxy integration
- Production deployment configuration
Duration: ~5 hours
- Writing reports and README.md
This is meant to be a one-time task, but if submitted features look feasible to me, I will consider merging them.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Note: This project is designed for educational and practical purposes. Please ensure you have proper authorization to access and process the Zendesk data you're working with.