A comprehensive tool for processing and analyzing Zendesk help center documentation using OpenAI's API. The main purpose of this project is to demonstrate how to work with APIs, preprocess data, and deploy AI-powered documentation analysis with Docker containerization and reverse proxy setup.
Live Demo: https://logs.harrylee.id.vn/
The droplet has since been destroyed, but there are more result pictures in `static/`; check them out!
Logs view:
Document view:
Assistant demo:

This project provides an automated solution for:
- Scraping Zendesk help center articles and categories
- Processing and organizing documentation into structured formats
- Analyzing token usage and content distribution
- Preparing data for AI-powered assistants and vector stores
- Docker containerization with reverse proxy for production deployment
- Automated Documentation Scraping: Fetches articles and categories from Zendesk help center
- Content Organization: Organizes articles by category in a structured file system
- Token Analysis: Analyzes and visualizes token distribution across documentation
- OpenAI Integration: Uploads processed files to vector store for AI assistant training
- Progress Tracking: Detailed logging and progress monitoring with web interface
- Error Handling: Robust error handling for API failures and file operations
- Docker Support: Complete containerization with multi-service architecture
- Reverse Proxy: Caddy-based reverse proxy for domain routing and SSL termination
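The OpenAI integration above (uploading processed files and attaching them to a vector store) can be sketched roughly as below. This is an illustrative sketch, not the project's actual `src/upload.py`; it assumes the `openai` 1.x Python SDK, and the helper name `collect_markdown_files` is mine.

```python
import os
from pathlib import Path


def collect_markdown_files(root):
    """Gather every processed .md file under the docs directory."""
    return sorted(Path(root).rglob("*.md"))


def upload_to_vector_store(docs_root):
    """Upload each markdown file, then attach it to the vector store.

    Assumes the openai 1.x SDK and the env vars from .env.
    """
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    store_id = os.environ["OPENAI_VECTOR_STORE_ID"]
    for path in collect_markdown_files(docs_root):
        with open(path, "rb") as fh:
            uploaded = client.files.create(file=fh, purpose="assistants")
        client.beta.vector_stores.files.create(
            vector_store_id=store_id, file_id=uploaded.id
        )
```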
For Local Development:
- Python 3.7+
- Zendesk API access
- OpenAI API key
- Required Python packages (see installation below)
For Docker Deployment:
- Docker and Docker Compose
- Domain name (optional, for production)
- Zendesk API access
- OpenAI API key
**Clone the repository**

```bash
git clone <repository-url>
cd API-starter-pack
```
**Environment Configuration**

Create a `.env` file in the root directory based on `env.sample`:

```
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_VECTOR_STORE_ID=your_vector_store_id
SUPPORT_URL=https://your-zendesk-instance.zendesk.com/api/v2/help_center
```
**Configure Domain (Optional)**

Edit the `Caddyfile` to use your domain:

```
your-domain.com {
    reverse_proxy log-server:8080 {
        flush_interval -1
    }
}
```
**Clone the repository**

```bash
git clone <repository-url>
cd API-starter-pack
```
**Install dependencies**

```bash
pip install -r requirements.txt
```
Required dependencies:
- OpenAI
- requests
- Markdown
- python-dotenv
**Environment Configuration**

Create a `.env` file in the root directory:

```
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_VECTOR_STORE_ID=your_vector_store_id
SUPPORT_URL=https://your-zendesk-instance.zendesk.com/api/v2/help_center
```
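As a sanity check, these required variables can be validated at startup, before any API calls are made. A minimal sketch; the function name `require_env` is mine, not part of the project:

```python
import os

REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_VECTOR_STORE_ID", "SUPPORT_URL")


def require_env(names=REQUIRED_VARS, env=os.environ):
    """Return the required settings, failing fast if any are missing or empty."""
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {n: env[n] for n in names}
```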
```bash
# Start all services
docker-compose up -d --build

# Stop services
docker-compose down
```

This will start:
- Scraper Service: Processes Zendesk documentation
- Log Server: Web interface for monitoring (port 8080)
- Caddy: Reverse proxy with automatic SSL (ports 80/443)
Access via:
- Local: http://localhost
- With Domain: https://your-domain.com
```bash
# Run scraper
python main.py

# Run log server on localhost:8080
python log-server/app.py
```

This setup runs on localhost only, without a domain.
To pick a better chunking strategy when attaching files to the vector store, I wrote a script that counts the tokens in every file and informs that decision.
```bash
python src/count_tokens.py
```

This will:
- Count tokens in all markdown files
- Generate statistical analysis
- Create visualizations of token distribution
- Display file count by token ranges
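A rough sketch of how such a counter might work. This is not the project's actual `src/count_tokens.py`; it uses `tiktoken` when available (an assumption, since it is not among the listed dependencies) and falls back to a chars/4 heuristic otherwise:

```python
from pathlib import Path


def approx_tokens(text):
    """Cheap fallback: English text averages roughly 4 characters per token."""
    return max(1, len(text) // 4)


def count_tokens(text, model="gpt-4o-mini"):
    """Count tokens with tiktoken if installed, else approximate."""
    try:
        import tiktoken  # third-party; assumed, not in requirements.txt
        enc = tiktoken.encoding_for_model(model)
        return len(enc.encode(text))
    except Exception:
        return approx_tokens(text)


def token_report(docs_root):
    """Map each markdown file path to its (possibly approximate) token count."""
    return {
        str(p): count_tokens(p.read_text(encoding="utf-8"))
        for p in Path(docs_root).rglob("*.md")
    }
```

A report like this makes it easy to bucket files by token range before deciding on chunk sizes.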
The project uses a multi-service Docker architecture:
```
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│   Caddy Proxy   │      │   Log Server    │      │     Scraper     │
│  (Port 80/443)  │─────>│   (Port 8080)   │      │  (Background)   │
└─────────────────┘      └─────────────────┘      └─────────────────┘
         │                        │                        │
         ▼                        ▼                        ▼
┌─────────────────────────────────────────────────────────────────┐
│                         Shared Volumes                          │
│  ┌──────────────┐     ┌─────────────┐     ┌──────────────┐      │
│  │ scraper-logs │     │ docs-volume │     │  caddy_data  │      │
│  └──────────────┘     └─────────────┘     └──────────────┘      │
└─────────────────────────────────────────────────────────────────┘
```
- scraper: Main application that processes Zendesk documentation
- log-server: Flask web server providing monitoring interface
- caddy: Reverse proxy with automatic SSL certificate management
- scraper-logs: Shared logs between scraper and log server
- docs-volume: Processed documentation files
- caddy_data: SSL certificates and Caddy data
- caddy_config: Caddy configuration
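The services and volumes above correspond to a compose file along these lines. This is an abridged sketch, not the repository's actual `docker-compose.yml`; the build contexts and mount paths are assumptions:

```
services:
  scraper:
    build: .                      # main application container
    env_file: .env
    volumes:
      - scraper-logs:/app/logs
      - docs-volume:/app/docs

  log-server:
    build: ./log-server           # Flask monitoring interface
    volumes:
      - scraper-logs:/app/logs:ro # read-only view of scraper logs
    expose:
      - "8080"

  caddy:
    image: caddy:2
    ports: ["80:80", "443:443"]
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config

volumes:
  scraper-logs:
  docs-volume:
  caddy_data:
  caddy_config:
```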
```
API-starter-pack/
├── docker-compose.yml       # Docker services configuration
├── Dockerfile               # Main application container
├── Caddyfile                # Reverse proxy configuration
├── main.py                  # Main scraping script
├── src/
│   ├── count_tokens.py      # Token analysis tool
│   ├── delete_files.py      # File cleanup utility
│   └── upload.py            # OpenAI upload utility
├── log-server/
│   ├── app.py               # Flask web server
│   ├── Dockerfile           # Log server container
│   ├── requirements.txt     # Log server dependencies
│   └── templates/
│       └── index.html       # Web interface
├── docs/                    # Processed documentation
│   ├── category-1/
│   │   ├── articles.json
│   │   └── article-files.md
│   └── category-2/
├── static/                  # Static assets and screenshots
├── .env                     # Environment variables
└── README.md                # This file
```
| Variable | Description | Required |
|---|---|---|
| `OPENAI_API_KEY` | Your OpenAI API key | Yes |
| `OPENAI_VECTOR_STORE_ID` | OpenAI vector store ID | Yes |
| `SUPPORT_URL` | Zendesk help center API URL | Yes |
The `Caddyfile` configures the reverse proxy:

```
# Eg. logs.harrylee.id.vn
<your-domain> {
    reverse_proxy log-server:8080 {
        flush_interval -1
    }
}
```

Features:
- Automatic SSL certificate generation
- Domain-based routing
- `GET /api/v2/help_center/categories/` - Fetch all categories
- `GET /api/v2/help_center/categories/{id}/articles` - Fetch articles by category
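Fetching from these endpoints with `requests` might look like the sketch below. Following `next_page` links matches Zendesk's documented pagination; the function names are mine, and the real `main.py` may differ:

```python
def fetch_paginated(url, key, session=None):
    """Follow Zendesk's next_page links and collect every item under `key`."""
    if session is None:
        import requests  # third-party: pip install requests
        session = requests.Session()
    items = []
    while url:
        resp = session.get(url, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        items.extend(data.get(key, []))
        url = data.get("next_page")  # None on the last page
    return items


def articles_url(base, category_id):
    """Build the per-category articles endpoint from the SUPPORT_URL base."""
    return "{}/categories/{}/articles".format(base.rstrip("/"), category_id)
```

Usage would be along the lines of `fetch_paginated(SUPPORT_URL + "/categories", "categories")`, then one `fetch_paginated(articles_url(SUPPORT_URL, cat_id), "articles")` per category.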
It took me one week in total to code, debug, deploy, and write reports. The phases below are not necessarily in order; they are estimates of the total time, since I spent most of it writing documentation.
Duration: ~1 hour
- Repository creation and initial setup
- Zendesk API, OpenAI, and Digital Ocean research
- Basic project structure implementation
Duration: ~2 hours
- Scrape files from the Zendesk API
- Convert and process data into clean Markdown files
- Store files for display
- Error handling (add, update, skip), optimization, and Docker containerization
Duration: ~4 hours
- Token analysis implementation to decide on a chunking strategy
- Upload files and attach them to the vector store via the OpenAI API
- Error handling, optimization, and Docker containerization
Duration: ~1 hour
- Set up a Linux cron job
- Caddy reverse proxy integration
- Production deployment configuration
Duration: ~5 hours
- Writing reports and README.md
This is meant to be a one-time task, but if submitted features look feasible to me, I will consider merging them.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Note: This project is designed for educational and practical purposes. Please ensure you have proper authorization to access and process the Zendesk data you're working with.