joshsgoldstein/multi2vec-transformers-colpali

multi2vec-colpali-inference

The inference container for ColPali document retrieval models

Overview

This service provides document retrieval capabilities using ColPali models. ColPali is a vision-language model optimized for document understanding, supporting complex layouts, tables, charts, and multimodal content without requiring OCR preprocessing.

Supported Models

The service automatically supports all ColPali-compatible models:

  • ColQwen2 (recommended): vidore/colqwen2-v1.0
  • ColPali: vidore/colpali-v1.0, vidore/colpali-v1.1
  • ColSmol (smaller): vidore/colsmol-v1.0
  • Custom models following ColPali architecture

Environment Variables

Model Configuration

  • COLPALI_MODEL_NAME: ColPali model to download (default: vidore/colqwen2-v1.0)
  • TRUST_REMOTE_CODE: Set to true to trust remote code when loading custom models (default: false)

Hardware Configuration

  • ENABLE_CUDA: Set to true or 1 to enable CUDA GPU support
  • CUDA_CORE: CUDA device to use (default: cuda:0)
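
As a rough illustration of how the variables above could be interpreted, the following Python sketch reads them into a settings dictionary. It is not taken from the service's code: the function name `load_settings`, the CPU fallback when CUDA is disabled, and the assumption that TRUST_REMOTE_CODE accepts the same truthy values as ENABLE_CUDA are all hypothetical.

```python
import os


def _is_truthy(value):
    """Treat "true" or "1" (case-insensitive) as enabled; anything else as disabled."""
    return value.strip().lower() in ("true", "1")


def load_settings(env=None):
    """Read the documented environment variables into a plain dict.

    Hypothetical helper: the defaults mirror the README, but the CPU
    fallback and the parsing of TRUST_REMOTE_CODE are assumptions.
    """
    env = os.environ if env is None else env
    cuda_enabled = _is_truthy(env.get("ENABLE_CUDA", ""))
    return {
        "model_name": env.get("COLPALI_MODEL_NAME", "vidore/colqwen2-v1.0"),
        "trust_remote_code": _is_truthy(env.get("TRUST_REMOTE_CODE", "false")),
        # Assumption: without ENABLE_CUDA the service would run on CPU.
        "device": env.get("CUDA_CORE", "cuda:0") if cuda_enabled else "cpu",
    }
```

Passing a dictionary instead of relying on `os.environ` keeps the helper easy to try out, for example `load_settings({"ENABLE_CUDA": "1"})`.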

Build Docker Container

LOCAL_REPO="multi2vec-colpali" \
  COLPALI_MODEL_NAME="vidore/colqwen2-v1.0" \
  ./cicd/build.sh

Model Examples

# ColQwen2 (recommended)
COLPALI_MODEL_NAME="vidore/colqwen2-v1.0" ./cicd/build.sh

# ColPali v1.1
COLPALI_MODEL_NAME="vidore/colpali-v1.1" ./cicd/build.sh

# ColSmol (smaller model)
COLPALI_MODEL_NAME="vidore/colsmol-v1.0" ./cicd/build.sh

API Endpoints

/vectorize - Document Retrieval (Primary)

Returns multi-vector embeddings optimized for document retrieval:

curl -X POST "http://localhost:8000/vectorize" \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["What is shown in this document?"],
    "images": ["<base64_encoded_image>"]
  }'
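
The same request can be assembled from Python using only the standard library. This is a minimal sketch: the helper name `build_vectorize_request` is hypothetical, and the base URL simply repeats the one in the curl example. It takes raw image bytes and base64-encodes them into the `images` field.

```python
import base64
import json
import urllib.request


def build_vectorize_request(texts, images, base_url="http://localhost:8000"):
    """Build a POST request for /vectorize from query texts and raw image bytes.

    Hypothetical helper; the payload fields match the curl example above.
    """
    payload = {
        "texts": list(texts),
        # Each image is sent as a base64-encoded string.
        "images": [base64.b64encode(img).decode("ascii") for img in images],
    }
    return urllib.request.Request(
        f"{base_url}/vectorize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the request requires a running container, for example:
#   with open("page.png", "rb") as f:
#       req = build_vectorize_request(["What is shown in this document?"], [f.read()])
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.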

Response includes:

  • textVectors: Multi-vector query embeddings
  • imageVectors: Multi-vector document embeddings
  • similarityScores: MaxSim similarity scores (when both texts and images are provided)
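
For readers unfamiliar with MaxSim, the sketch below shows the standard late-interaction formulation: each query vector is matched to its best-scoring document vector by dot product, and those maxima are summed. It assumes embeddings are plain lists of float lists; whether the service normalizes vectors or computes its scores exactly this way is not stated here, so treat it as an illustration rather than the service's implementation.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vectors, doc_vectors):
    """Sum, over query vectors, of the best dot product against any document vector."""
    if not query_vectors or not doc_vectors:
        raise ValueError("both inputs need at least one vector")
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


# Toy example: two query vectors scored against two document vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
document = [[0.5, 0.5], [1.0, 0.0]]
score = maxsim(query, document)  # 1.0 (first query vector) + 0.5 (second) = 1.5
```

Because every query vector picks its own best match, a document scores highly when each part of the query is covered somewhere on the page.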

/vectorize-clip - Legacy Compatibility

Returns single vectors (mean pooled) for backward compatibility with CLIP-based systems.
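
Mean pooling collapses a multi-vector embedding into one vector by averaging each dimension. The sketch below shows the idea on plain Python lists; it is an assumption that the endpoint pools in exactly this unweighted way (for example, it may exclude padding positions), and the name `mean_pool` is hypothetical.

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    if not vectors:
        raise ValueError("need at least one vector to pool")
    count = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / count for i in range(dim)]


# Two 2-dimensional vectors collapse into their element-wise mean.
pooled = mean_pool([[1.0, 3.0], [3.0, 5.0]])  # [2.0, 4.0]
```

The pooled vector fits single-vector indexes, at the cost of the per-token detail that MaxSim relies on.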

/meta - Model Information

Returns model configuration and capabilities.

Key Features

  • Multi-vector embeddings: More precise document representation than single vectors
  • Layout awareness: Understands tables, charts, and spatial relationships
  • No OCR required: End-to-end visual document processing
  • MaxSim scoring: Late-interaction similarity that matches each query vector to its best-scoring document vector
  • GPU acceleration: CUDA support for faster inference
  • Multiple model variants: Automatic support for all ColPali models

Run Tests

LOCAL_REPO="multi2vec-colpali" ./cicd/test.sh

Documentation

For more information about ColPali models and document retrieval:

About

Inference server for running ColPali-based models. It also supports the NVIDIA Jetson Developer Kit.
