This guide explains how to use the parametric Dockerfile to run scripts from the scripts/ directory with specific dependency groups.
The Dockerfile at the repository root is designed to be parametric, allowing you to:
- Install only the dependencies needed for specific functionality (via EXTRAS)
- Run any script from the scripts/ directory (via SCRIPT)
- Keep images minimal and reproducible using uv.lock
The EXTRAS build argument specifies which optional dependency groups to install. Separate multiple groups with commas. Available options:
- pdf - PDF processing (pillow, pymupdf, pytesseract)
- rag - Vector RAG (qdrant-client, dulwich)
- a2a - Agent-to-Agent communication (fastapi)
- ftso - FTSO price oracle functionality
- da - Data Availability layer functionality
- fassets - FAssets protocol functionality
- social - Social media integrations (telegram, twitter, etc.)
- tee - Trusted Execution Environment (cryptography, jwt)
- wallet - Wallet functionality (eth-account, cryptography)
- ingestion - General ingestion capabilities
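
Because a misspelled group name only surfaces later as a failed build or a missing module, it can help to check the EXTRAS value before calling docker build. The following is a minimal sketch, not part of the repository: the function name validate_extras and the KNOWN_EXTRAS variable are hypothetical, and the group names are simply copied from the list above.

```shell
#!/bin/sh
# Known dependency groups, copied from the list of available options.
# KNOWN_EXTRAS and validate_extras are illustrative names, not project APIs.
KNOWN_EXTRAS="pdf rag a2a ftso da fassets social tee wallet ingestion"

# validate_extras "pdf,rag": returns 0 if every comma-separated name is known.
# Prints each unknown name to stderr and returns 1 otherwise.
validate_extras() {
    rc=0
    # Turn the comma-separated value into whitespace-separated words.
    for name in $(printf '%s' "$1" | tr ',' ' '); do
        case " $KNOWN_EXTRAS " in
            *" $name "*) ;;
            *) echo "unknown extra: $name" >&2; rc=1 ;;
        esac
    done
    return "$rc"
}
```

A typical use would be to run validate_extras "pdf,rag" and only continue to docker build with --build-arg EXTRAS="pdf,rag" when it succeeds.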
The SCRIPT build argument specifies which script to run from the scripts/ directory. Default: ingest_pdf.py
# Build the image for PDF processing
docker build -t fai-script-pdf \
--build-arg EXTRAS=pdf \
--build-arg SCRIPT=ingest_pdf.py .
# Run the script
docker run --rm -it \
-v "$PWD/scripts/data:/app/scripts/data" \
fai-script-pdf

# Run with environment variables for API keys and configuration
docker run --rm -it \
-e AGENT__GEMINI_API_KEY="your_gemini_api_key" \
-e ECOSYSTEM__WEB3_PROVIDER_URL="https://flare-api.flare.network/ext/C/rpc" \
-e LOG_LEVEL="INFO" \
-v "$PWD/scripts/data:/app/scripts/data" \
fai-script-pdf

# Create a .env file with your configuration
cat > .env.docker << EOF
AGENT__GEMINI_API_KEY=your_gemini_api_key
ECOSYSTEM__WEB3_PROVIDER_URL=https://flare-api.flare.network/ext/C/rpc
LOG_LEVEL=INFO
EOF
# Run with environment file
docker run --rm -it \
--env-file .env.docker \
-v "$PWD/scripts/data:/app/scripts/data" \
fai-script-pdf

# Build with multiple dependency groups
docker build -t fai-script-multi \
--build-arg EXTRAS="pdf,rag,a2a" \
--build-arg SCRIPT=ingest_pdf.py .

# Build for a custom script (once you create more scripts)
docker build -t fai-script-custom \
--build-arg EXTRAS=rag \
--build-arg SCRIPT=my_custom_script.py .

# Mount the entire scripts directory for development
docker run --rm -it \
-v "$PWD/scripts:/app/scripts" \
-v "$PWD/src:/app/src" \
--env-file .env.docker \
fai-script-pdf

The PDF ingestion script expects data in /app/scripts/data/. Mount your local data:
# Mount local data directory
docker run --rm -it \
-v "$PWD/my-pdfs:/app/scripts/data" \
fai-script-pdf

# Mount output directory for persistent results
docker run --rm -it \
-v "$PWD/scripts/data:/app/scripts/data" \
-v "$PWD/output:/app/output" \
fai-script-pdf

- AGENT__GEMINI_API_KEY - Google Gemini API key for AI processing
- LOG_LEVEL - Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
- ECOSYSTEM__WEB3_PROVIDER_URL - Web3 provider URL for blockchain interactions
- INGESTION__CHUNK_SIZE - Text chunk size for processing (default: 5000)
- VECTOR_DB__QDRANT_URL - Qdrant vector database URL (for RAG)
- SOCIAL__TELEGRAM_API_TOKEN - Telegram bot token (for social features)
- TEE__SIMULATE_ATTESTATION_TOKEN - Simulate TEE attestation (for testing)
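
When these variables are passed through an env file such as .env.docker, the file is not parsed by a shell. To my understanding, docker run --env-file expects plain NAME=value lines and keeps surrounding quotes as part of the value, which is why the env file earlier in this guide leaves its values unquoted. The sketch below flags lines that are likely to be misread; check_env_file is a hypothetical helper, not something the project ships.

```shell
#!/bin/sh
# check_env_file FILE: flags lines that `docker run --env-file` may not read
# as intended. Returns 0 if the file looks clean, 1 if any line is flagged,
# 2 if the file cannot be read. (Hypothetical helper, for illustration only.)
check_env_file() {
    file=$1
    rc=0
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    # Values that start with a quote: the quote characters would be kept.
    if grep -nE "^[A-Za-z_][A-Za-z0-9_]*=[\"']" "$file" >&2; then
        echo "warning: quoted values found in $file" >&2
        rc=1
    fi
    # Anything that is not blank, a # comment, NAME=value, or a bare NAME.
    if grep -nEv '^([[:space:]]*|#.*|[A-Za-z_][A-Za-z0-9_]*(=.*)?)$' "$file" >&2; then
        echo "warning: malformed lines found in $file" >&2
        rc=1
    fi
    return "$rc"
}
```

Running check_env_file .env.docker before docker run --env-file .env.docker catches common slips such as export prefixes, spaces around the equals sign, and quoted API keys.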
The Dockerfile includes native packages required for PDF processing:
- tesseract-ocr - OCR engine for text extraction from images
- tesseract-ocr-eng - English language pack for Tesseract
- poppler-utils - PDF utilities for document processing
To support additional languages for OCR, extend the Dockerfile:
# Add more Tesseract language packs
RUN apt-get update && apt-get install -y --no-install-recommends \
tesseract-ocr-fra \
tesseract-ocr-deu \
tesseract-ocr-spa \
&& rm -rf /var/lib/apt/lists/*

Error: Script /app/scripts/my_script.py not found
Ensure your script exists in the scripts/ directory and the filename matches the SCRIPT build arg.
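
This mismatch can be caught on the host before the build starts. The following sketch assumes it is run from the repository root, where scripts/ lives; require_script is a hypothetical helper name introduced for illustration.

```shell
#!/bin/sh
# require_script NAME [SCRIPTS_DIR]: succeeds only if SCRIPTS_DIR/NAME is a
# regular file. SCRIPTS_DIR defaults to ./scripts (assumed repository layout).
require_script() {
    script=$1
    dir=${2:-scripts}
    if [ -z "$script" ]; then
        echo "no script name given" >&2
        return 1
    fi
    if [ ! -f "$dir/$script" ]; then
        echo "not found: $dir/$script" >&2
        return 1
    fi
}
```

For example, require_script my_custom_script.py followed by the docker build command with the same SCRIPT value keeps the filename and the build argument in sync.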
ModuleNotFoundError: No module named 'qdrant_client'
The image was built without the dependency group that provides the module. Rebuild with the matching EXTRAS value; for qdrant_client, that is rag.
PermissionError: [Errno 13] Permission denied
Check that the mounted host directories are readable by the container user, and writable where the script produces output. The container runs as the non-root user app.
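
One way to narrow this down on the host is to list what the container user probably cannot read. Since this guide does not state the numeric UID of app, the sketch below uses a conservative heuristic: it reports anything under a directory that lacks the world-readable bit, plus directories that lack the world-executable bit. list_unreadable is a hypothetical name.

```shell
#!/bin/sh
# list_unreadable DIR: prints paths under DIR that are not world-readable,
# and directories that are not world-searchable. Empty output means every
# path is readable by any user, including a non-root container user.
list_unreadable() {
    find "$1" \( ! -perm -o=r \) -o \( -type d ! -perm -o=x \)
}
```

If list_unreadable "$PWD/scripts/data" prints anything, granting read access with chmod -R o+rX on that directory is one possible fix; whether that is acceptable depends on how sensitive the data is.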
TesseractNotFoundError: tesseract is not installed
This should not happen with the provided Dockerfile. If it does, rebuild the image and confirm that the tesseract-ocr package is installed in it.
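
To confirm whether the binary is present, Tesseract can be invoked directly inside the built image. This is a sketch under two assumptions: the image is tagged fai-script-pdf as in the build examples, and tesseract is on the PATH inside the image. The function names are hypothetical; --entrypoint, --version, and --list-langs are standard docker run and tesseract options.

```shell
#!/bin/sh
# check_tesseract IMAGE: runs `tesseract --version` inside IMAGE, bypassing
# the image's default entrypoint. Returns docker's exit status.
check_tesseract() {
    docker run --rm --entrypoint tesseract "$1" --version
}

# list_ocr_languages IMAGE: prints the Tesseract language packs available
# inside IMAGE, useful after adding packs such as tesseract-ocr-fra.
list_ocr_languages() {
    docker run --rm --entrypoint tesseract "$1" --list-langs
}
```

Calling check_tesseract fai-script-pdf should print a version banner when the package is installed; a "not found" style error from docker points to a problem in the image build rather than in the script.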
- Use specific EXTRAS: Only install the dependencies you need
- Environment files: Use .env files for configuration instead of command-line args
- Volume mounts: Mount only necessary directories to keep containers lightweight
- Non-root user: The container runs as non-root user app for security
- Caching: The Dockerfile is optimized for Docker layer caching
See the scripts/ directory for example scripts:
- ingest_pdf.py - PDF ingestion and processing
- More scripts will be added as the project grows
Each script should be self-contained and follow the same pattern for consistency.