The full tutorial is available on the Upsun DevCenter.
A multi-language demonstration of document ingestion and web-based listing using ChromaDB, OpenAI embeddings, and modern web frameworks. This project showcases identical functionality implemented in both Python and Node.js/TypeScript.
```
┌─────────────────┐      ┌──────────────────┐      ┌─────────────────┐
│    Markdown     │      │     ChromaDB     │      │    Web Apps     │
│   Documents     │──────│   Collections    │──────│ Python/Node.js  │
│   (.md files)   │      │   (Embeddings)   │      │   (Listing UI)  │
└─────────────────┘      └──────────────────┘      └─────────────────┘
                                  │
                         ┌────────▼────────┐
                         │   OpenAI API    │
                         │  (Embeddings)   │
                         └─────────────────┘
```
- Document Ingestion: Processes markdown files into semantic chunks
- Vector Embeddings: Uses OpenAI's `text-embedding-3-small` model (see the sketch after this list)
- ChromaDB Storage: Efficient vector database for similarity search
- Web Interface: Clean UI showing ingested files and chunk counts
- Multi-language: Identical functionality in Python and TypeScript
- Cloud Ready: Configured for Upsun platform deployment
- Environment Flexible: Supports both local and remote ChromaDB instances
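Each chunk is embedded once at ingestion time. As a minimal, hedged sketch of what a call to `text-embedding-3-small` looks like with the official `openai` Python client (the `embed` helper and sample texts are illustrative, not code from this repository):

```python
# Illustrative only: embed a batch of texts with text-embedding-3-small.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> list[list[float]]:
    """Return one embedding vector per input text."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    return [item.embedding for item in response.data]

vectors = embed(["Hello ChromaDB", "Hello OpenAI"])
print(len(vectors), len(vectors[0]))  # 2 vectors, 1536 dimensions by default
```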
```
├── python-app/              # Python Flask implementation
│   ├── main.py              # Web server (Flask)
│   ├── ingest.py            # Document processing script
│   ├── pyproject.toml       # Python dependencies
│   ├── data/                # Markdown files
│   └── .env.example         # Environment template
├── nodejs-app/              # Node.js TypeScript implementation
│   ├── src/
│   │   ├── index.ts         # Web server (Express)
│   │   └── ingest.ts        # Document processing script
│   ├── package.json         # Node.js dependencies
│   ├── tsconfig.json        # TypeScript configuration
│   ├── data/                # Markdown files
│   └── .env.example         # Environment template
├── .upsun/
│   └── config.yaml          # Upsun platform configuration
└── README.md                # This file
```
- Flask - Web framework
- ChromaDB - Vector database client
- OpenAI - Embedding generation
- UV - Fast Python package manager
- Express - Web framework
- ChromaDB - Vector database client
- OpenAI - Embedding generation
- TypeScript - Type safety
- ChromaDB - Vector database server
- Upsun - Cloud platform
- OpenAI API - Embedding service
Both applications support flexible ChromaDB connections via environment variables:
```bash
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here

# ChromaDB Configuration (leave empty for local instance)
CHROMA_HOST=            # e.g., chroma.example.com
CHROMA_PORT=8000        # Default: 8000
CHROMA_SSL=false        # true for HTTPS
CHROMA_AUTH_TOKEN=      # Optional authentication
```
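As an illustrative Python sketch (not the apps' exact code) of how these variables could select between a local on-disk instance and a remote server; the `./chroma-data` path and the bearer-token header are assumptions:

```python
import os
import chromadb

def get_chroma_client():
    """Build a ChromaDB client from the CHROMA_* environment variables."""
    host = os.getenv("CHROMA_HOST", "").strip()
    if not host:
        # No host configured: use a local, on-disk instance
        # (the ./chroma-data path is an assumption for this sketch).
        return chromadb.PersistentClient(path="./chroma-data")

    headers = {}
    token = os.getenv("CHROMA_AUTH_TOKEN", "").strip()
    if token:
        # Header name depends on how the ChromaDB server is configured;
        # a bearer token is assumed here.
        headers["Authorization"] = f"Bearer {token}"

    return chromadb.HttpClient(
        host=host,
        port=int(os.getenv("CHROMA_PORT", "8000")),
        ssl=os.getenv("CHROMA_SSL", "false").lower() == "true",
        headers=headers,
    )
```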
```bash
cd python-app

# Install dependencies
uv sync

# Set environment variables
cp .env.example .env
# Edit .env with your OpenAI API key

# Run ingestion (processes markdown files)
uv run python ingest.py

# Start web server
uv run python main.py
# Visit: http://localhost:5000
```
```bash
cd nodejs-app

# Install dependencies
npm ci

# Set environment variables
cp .env.example .env
# Edit .env with your OpenAI API key

# Build TypeScript
npm run build

# Run ingestion (processes markdown files)
npm run ingest

# Start web server
npm run start
# Visit: http://localhost:3000
```
This project is configured for deployment on the Upsun platform with three services:
- ChromaDB Service: Vector database server
- Python App: Flask web application
- Node.js App: Express web application
- Python App: `https://python.{your-domain}/`
- Node.js App: `https://nodejs.{your-domain}/`
Both applications automatically run their ingestion scripts during deployment, ensuring fresh data on every deploy.
- Reads all `.md` files from the `data/` directory
- Splits documents into overlapping chunks (1000 words, 200-word overlap); a Python sketch of these steps follows this list
- Generates unique IDs using content hashing
- Python: Manual OpenAI API calls with batching
- Node.js: ChromaDB's built-in OpenAI embedding function
- Stores chunks with metadata (filename, filepath, chunk index)
- Collections named `python-app` and `nodejs-app` respectively
- Automatically clears existing data on re-ingestion
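A rough Python sketch of those steps; the function names, the SHA-256 hash, and the `collection.add` call are assumptions for illustration, not the actual `ingest.py`:

```python
import hashlib
from pathlib import Path

CHUNK_WORDS = 1000    # chunk size in words
OVERLAP_WORDS = 200   # overlap between consecutive chunks

def chunk_text(text: str) -> list[str]:
    """Split a document into overlapping word-based chunks."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + CHUNK_WORDS]))
        start += CHUNK_WORDS - OVERLAP_WORDS
    return chunks

def ingest_directory(collection, data_dir: str = "data") -> None:
    """Chunk every .md file and store the chunks with per-chunk metadata."""
    for path in sorted(Path(data_dir).glob("*.md")):
        chunks = chunk_text(path.read_text(encoding="utf-8"))
        if not chunks:
            continue
        # Embeddings are supplied explicitly (Python app) or generated by the
        # collection's configured embedding function (Node.js app).
        collection.add(
            # Content-hashed IDs give each chunk a stable, unique identifier.
            ids=[hashlib.sha256(c.encode("utf-8")).hexdigest() for c in chunks],
            documents=chunks,
            metadatas=[
                {"filename": path.name, "filepath": str(path), "chunk_index": i}
                for i in range(len(chunks))
            ],
        )
```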
- Lists all processed files with chunk counts
- Shows collection statistics (total files, total chunks)
- Handles errors gracefully with helpful messages
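A hedged Flask sketch of that listing behaviour; the route, local client path, and JSON response shape are assumptions, and the actual app's UI differs:

```python
from collections import Counter

import chromadb
from flask import Flask, jsonify

app = Flask(__name__)
client = chromadb.PersistentClient(path="./chroma-data")  # path is an assumption

@app.route("/")
def list_files():
    try:
        collection = client.get_collection("python-app")
    except Exception:
        # Graceful failure when ingestion has not been run yet.
        return jsonify({"error": "Collection not found - run ingest.py first"}), 500

    # Fetch metadata only and count chunks per source file.
    records = collection.get(include=["metadatas"])
    counts = Counter(meta["filename"] for meta in records["metadatas"])
    return jsonify({
        "total_files": len(counts),
        "total_chunks": sum(counts.values()),
        "files": [{"filename": name, "chunks": n} for name, n in sorted(counts.items())],
    })

if __name__ == "__main__":
    app.run(port=5000)
```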
```bash
# Type checking and linting (if available)
uv run python -m mypy .

# Run development server with auto-reload
uv run python main.py
```
```bash
# Type checking
npm run type-check

# Development mode with auto-reload
npm run dev

# Development ingestion (uses tsx)
npm run ingest:dev
```
The `data/` directories contain sample markdown files covering various technical topics:
- Advanced prompting techniques
- Platform.sh build pipelines
- Python development with UV
- PyTorch deployment
- Configuration as code
- Environment management
- Fork the repository
- Create a feature branch
- Make your changes
- Test both Python and Node.js implementations
- Submit a pull request
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- ChromaDB Documentation
- OpenAI Embeddings API
- Upsun Platform
- Flask Documentation
- Express.js Documentation
"Collection not found" error:
- Run the ingestion script first:
python ingest.py
ornpm run ingest
OpenAI API errors:
- Verify your
OPENAI_API_KEY
is set correctly - Check your OpenAI account has sufficient credits
ChromaDB connection issues:
- For local development, ensure no
CHROMA_HOST
is set - For remote instances, verify
CHROMA_HOST
,CHROMA_PORT
, andCHROMA_SSL
settings
TypeScript compilation errors:
- Run
npm run build
to compile TypeScript before usingnpm run start
- Use
npm run dev
for development with auto-compilation