A peer-to-peer network for deploying and accessing Hugging Face language models. ConnectIT allows you to deploy any Hugging Face model as a service on a decentralized network and request text generation from the cheapest/lowest-latency providers.
- 🌐 Decentralized P2P Network: No central server required
- 🤖 Hugging Face Integration: Deploy any HF model instantly
- 💰 Cost Optimization: Automatically selects cheapest providers
- ⚡ Low Latency: Smart provider selection based on response time
- 🔒 Secure: Custom licensing with commercial protection
- 🚀 Easy Setup: One-command deployment and requests
Requirements:
- Python 3.9+
- 2GB+ RAM recommended
- Network connectivity for P2P operations
```bash
# Basic installation
pip install connectit

# With Hugging Face support
pip install connectit[hf]

# With all optional dependencies
pip install connectit[all]
```
Or install from source:
```bash
git clone <repository-url>
cd connectit
pip install -e .
```
Prerequisites: Python 3.9+, pip
- Install ConnectIT:
  ```bash
  pip install -e .
  ```
  For full functionality with Hugging Face models:
  ```bash
  pip install -e .[all]
  ```
- Deploy a Hugging Face model:
  ```bash
  python -m connectit deploy-hf --model distilgpt2 --price-per-token 0.002 --host 127.0.0.1 --port 4334
  ```
- Request text generation from another terminal:
  ```bash
  python -m connectit p2p-request "Hello world" --bootstrap-link "p2pnet://join?network=connectit&model=distilgpt2&hash=32a0fa785bfb95c97ced872ac200560ffface58c574c775b7fd8304494a4d4e3&bootstrap=d3M6Ly8xMjcuMC4wLjE6NDMzNA=="
  ```
Note: Use the join link displayed by the provider, not the raw WebSocket address.
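The `bootstrap` query parameter inside the join link is just the provider's WebSocket address, base64-encoded. If you ever need to inspect a link by hand, a standard-library sketch (using the example link from the quick start above) looks like this:

```python
import base64
from urllib.parse import parse_qs, urlparse

link = ("p2pnet://join?network=connectit&model=distilgpt2"
        "&hash=32a0fa785bfb95c97ced872ac200560ffface58c574c775b7fd8304494a4d4e3"
        "&bootstrap=d3M6Ly8xMjcuMC4wLjE6NDMzNA==")

# Parse the query string, then decode the base64-encoded bootstrap address
params = parse_qs(urlparse(link).query)
ws_addr = base64.b64decode(params["bootstrap"][0]).decode()

print(params["model"][0])  # distilgpt2
print(ws_addr)             # ws://127.0.0.1:4334
```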
Deploy a Hugging Face text-generation model as a service on the P2P network.
```bash
python -m connectit deploy-hf --model MODEL_NAME --price-per-token PRICE --host HOST --port PORT
```
Parameters:
- `--model`: Hugging Face model name (e.g., `distilgpt2`, `gpt2`, `microsoft/DialoGPT-medium`)
- `--price-per-token`: Price per output token (float, e.g., `0.002`)
- `--host`: Bind host address (default: `0.0.0.0`)
- `--port`: Bind port (default: `4001`)
Example:
```bash
python -m connectit deploy-hf --model distilgpt2 --price-per-token 0.002 --host 127.0.0.1 --port 4334
```
The provider will display a join link like:
```
🔗 Join link: p2pnet://join?network=connectit&model=distilgpt2&hash=...&bootstrap=...
```
Request text generation from providers on the P2P network.
```bash
python -m connectit p2p-request PROMPT [OPTIONS]
```
Parameters:
- `PROMPT`: Text prompt for generation (required)
- `--model`: Model name to request (default: `distilgpt2`)
- `--bootstrap-link`: P2P network join link from a provider (required)
- `--max-new-tokens`: Maximum tokens to generate (default: `32`)
Example:
```bash
python -m connectit p2p-request "Hello world" --bootstrap-link "p2pnet://join?network=connectit&model=distilgpt2&hash=32a0fa785bfb95c97ced872ac200560ffface58c574c775b7fd8304494a4d4e3&bootstrap=d3M6Ly8xMjcuMC4wLjE6NDMzNA=="
```
Important: Always use the complete `p2pnet://` join link provided by the provider, not raw WebSocket addresses.
Possible causes:
- Model name mismatch between request and provider
- Bootstrap link is incorrect or expired
- Provider is not running or unreachable
- Network connectivity issues
Solutions:
- Verify the model name matches exactly (case-sensitive)
- Copy the complete join link from the provider output
- Ensure the provider is running and shows "ready to accept connections"
- Check firewall settings if connecting across networks
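If you suspect the provider simply is not reachable, a quick TCP probe of its host and port can rule out firewall or binding problems before digging into ConnectIT itself. This is a standard-library sketch, not part of ConnectIT:

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe the provider port used in the quick start
print(is_port_open("127.0.0.1", 4334))
```

If this prints `False`, the problem is networking (provider not running, wrong port, firewall), not the P2P layer.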
Possible causes:
- Terminal encoding issues
- Long-running process conflicts
Solutions:
- Run commands in separate terminals
- Ensure proper terminal encoding (UTF-8)
- Restart terminals if needed
Symptoms:
- Peer connection failures
- Bootstrap connection timeouts
- Generation request failures
Solutions:
- Verify both provider and client are on the same network
- Check port availability and firewall rules
- Try different host/port combinations
- Ensure provider is fully loaded before making requests
We welcome contributions! Please see our Contributing Guide for details.
This project is licensed under a custom license that permits non-commercial use only. For commercial use, please contact: loaiabdalslam@gmail.com
See the LICENSE file for full details.
Request text generation from the P2P network. Automatically selects the cheapest/lowest-latency provider for the specified model.
```bash
python -m connectit p2p-request "PROMPT_TEXT" --model MODEL_NAME --bootstrap-link BOOTSTRAP_LINK
```
Parameters:
- `PROMPT_TEXT`: The text prompt for generation (required)
- `--model`: Model name to request (default: `distilgpt2`)
- `--bootstrap-link`: Bootstrap link to join the network (required for discovery)
- `--max-new-tokens`: Maximum new tokens to generate (default: `32`)
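The actual selection policy lives inside ConnectIT's `pick_provider`; the sketch below only illustrates the idea of ranking advertised providers by price first, then latency. The `providers` data and its field names are hypothetical, invented for this illustration:

```python
def pick_cheapest(providers):
    """Pick the provider id with the lowest price, breaking ties on latency.

    `providers` maps provider id -> dict with 'price_per_token' and
    'latency_ms' (hypothetical fields for this sketch).
    """
    if not providers:
        return None
    return min(
        providers.items(),
        key=lambda kv: (kv[1]["price_per_token"], kv[1]["latency_ms"]),
    )[0]

# Provider B is cheaper, so it wins even though it is slower
providers = {
    "provider-a": {"price_per_token": 0.005, "latency_ms": 20},
    "provider-b": {"price_per_token": 0.001, "latency_ms": 80},
}
print(pick_cheapest(providers))  # provider-b
```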
Examples:
```bash
# Basic text generation
python -m connectit p2p-request "Hello world" --model distilgpt2 --bootstrap-link ws://127.0.0.1:4334

# Longer generation with more tokens
python -m connectit p2p-request "The future of AI is" --model distilgpt2 --max-new-tokens 50 --bootstrap-link ws://127.0.0.1:4334

# Question answering
python -m connectit p2p-request "What is artificial intelligence?" --model distilgpt2 --max-new-tokens 100 --bootstrap-link ws://127.0.0.1:4334

# Creative writing prompt
python -m connectit p2p-request "Once upon a time in a distant galaxy" --model distilgpt2 --max-new-tokens 75 --bootstrap-link ws://127.0.0.1:4334
```
Step 1: Start a local provider in one terminal:
```bash
python -m connectit deploy-hf --model distilgpt2 --price-per-token 0.002 --host 127.0.0.1 --port 4334
```
Step 2: Test requests from another terminal:
```bash
# Simple test
python -m connectit p2p-request "Hello, world!" --model distilgpt2 --bootstrap-link ws://127.0.0.1:4334

# Check response quality
python -m connectit p2p-request "Explain machine learning in simple terms" --model distilgpt2 --max-new-tokens 50 --bootstrap-link ws://127.0.0.1:4334
```
Provider A (fast, expensive):
```bash
python -m connectit deploy-hf --model distilgpt2 --price-per-token 0.005 --host 0.0.0.0 --port 4001
```
Provider B (slow, cheap):
```bash
python -m connectit deploy-hf --model distilgpt2 --price-per-token 0.001 --host 0.0.0.0 --port 4002 --bootstrap-link ws://localhost:4001
```
Client requests automatically select the best provider:
```bash
# Will choose Provider B (cheaper)
python -m connectit p2p-request "Generate a short story" --model distilgpt2 --bootstrap-link ws://localhost:4001
```
Deploy specialized models:
```bash
# Terminal 1: General text generation
python -m connectit deploy-hf --model distilgpt2 --price-per-token 0.002 --port 4001

# Terminal 2: Conversational AI
python -m connectit deploy-hf --model microsoft/DialoGPT-small --price-per-token 0.003 --port 4002 --bootstrap-link ws://127.0.0.1:4001

# Terminal 3: Code generation
python -m connectit deploy-hf --model microsoft/CodeGPT-small-py --price-per-token 0.004 --port 4003 --bootstrap-link ws://127.0.0.1:4001
```
Use the appropriate model for each task:
```bash
# General text
python -m connectit p2p-request "Write a product description" --model distilgpt2 --bootstrap-link ws://127.0.0.1:4001

# Conversation
python -m connectit p2p-request "How are you feeling today?" --model microsoft/DialoGPT-small --bootstrap-link ws://127.0.0.1:4001

# Code
python -m connectit p2p-request "def fibonacci(n):" --model microsoft/CodeGPT-small-py --bootstrap-link ws://127.0.0.1:4001
```
You can use ConnectIT programmatically in your Python scripts:
```python
import asyncio
from connectit.p2p_runtime import P2PNode

async def request_generation(prompt, model_name="distilgpt2", bootstrap_link=None):
    """Request text generation programmatically."""
    node = P2PNode(host="127.0.0.1", port=0)
    await node.start()
    try:
        if bootstrap_link:
            await node.connect_bootstrap(bootstrap_link)
            # Wait for provider discovery
            await asyncio.sleep(2)
        # Find the best provider
        best = node.pick_provider(model_name)
        if not best:
            print(f"No provider found for model: {model_name}")
            return None
        provider_id, _ = best
        return await node.request_generation(
            provider_id,
            prompt,
            max_new_tokens=32,
            model_name=model_name,
        )
    finally:
        # Always shut the node down, even when no provider was found
        await node.stop()

# Usage
result = asyncio.run(request_generation(
    "Hello world",
    model_name="distilgpt2",
    bootstrap_link="ws://127.0.0.1:4001",
))
print(result)
```
Batch Processing:
```python
import asyncio
from connectit.p2p_runtime import P2PNode

async def batch_generate(prompts, model_name="distilgpt2", bootstrap_link=None):
    """Generate text for multiple prompts."""
    node = P2PNode(host="127.0.0.1", port=0)
    await node.start()
    try:
        if bootstrap_link:
            await node.connect_bootstrap(bootstrap_link)
            await asyncio.sleep(2)  # Discovery time
        results = []
        for prompt in prompts:
            best = node.pick_provider(model_name)
            if best:
                provider_id, _ = best
                result = await node.request_generation(provider_id, prompt, model_name=model_name)
                results.append({"prompt": prompt, "result": result})
            else:
                results.append({"prompt": prompt, "result": None})
    finally:
        await node.stop()
    return results

# Usage
prompts = ["Hello", "How are you?", "Tell me a story"]
results = asyncio.run(batch_generate(prompts, bootstrap_link="ws://127.0.0.1:4001"))
for item in results:
    print(f"Prompt: {item['prompt']}")
    print(f"Result: {item['result']}")
    print("---")
```
Web Service Integration:
```python
import asyncio

from flask import Flask, jsonify, request

from connectit.p2p_runtime import P2PNode

app = Flask(__name__)

@app.route('/generate', methods=['POST'])
def generate_text():
    data = request.json
    prompt = data.get('prompt')
    model = data.get('model', 'distilgpt2')
    bootstrap_link = data.get('bootstrap_link')

    async def _generate():
        node = P2PNode(host="127.0.0.1", port=0)
        await node.start()
        try:
            if bootstrap_link:
                await node.connect_bootstrap(bootstrap_link)
                await asyncio.sleep(2)
            best = node.pick_provider(model)
            if not best:
                return None
            provider_id, _ = best
            return await node.request_generation(provider_id, prompt, model_name=model)
        finally:
            await node.stop()

    result = asyncio.run(_generate())
    return jsonify({'result': result})

if __name__ == '__main__':
    app.run(debug=True)
```
ConnectIT supports any Hugging Face causal language model. Popular choices include:
- GPT-2 family: `gpt2`, `gpt2-medium`, `gpt2-large`, `gpt2-xl`
- DistilGPT-2: `distilgpt2` (smaller, faster)
- DialoGPT: `microsoft/DialoGPT-small`, `microsoft/DialoGPT-medium`, `microsoft/DialoGPT-large`
- CodeGPT: `microsoft/CodeGPT-small-py`
- GPT-Neo: `EleutherAI/gpt-neo-125M`, `EleutherAI/gpt-neo-1.3B`
- Custom models: Any compatible model from the Hugging Face Hub
- Choose appropriate pricing: Set `--price-per-token` based on model size and computational cost
- Resource considerations: Larger models require more memory and compute time
- Network setup: Ensure your host/port is accessible to other network participants
- Model caching: The first deployment downloads the model; subsequent runs use the cached version
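Since billing in these examples is per output token, the worst-case cost of one request is simply `max_new_tokens * price_per_token`. A tiny sketch for sizing prices (the helper function is illustrative, not part of ConnectIT):

```python
def max_request_cost(max_new_tokens, price_per_token):
    """Upper bound on the cost of a single generation request."""
    return max_new_tokens * price_per_token

# Defaults from the CLI reference: 32 new tokens at 0.002 per token
print(max_request_cost(32, 0.002))  # 0.064
```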
Custom model with specific configuration:
```bash
# Deploy a larger model with higher pricing
python -m connectit deploy-hf \
  --model EleutherAI/gpt-neo-1.3B \
  --price-per-token 0.01 \
  --host 0.0.0.0 \
  --port 4001 \
  --bootstrap-link ws://bootstrap.mynetwork.com:4001
```
Multiple model deployment:
You can run multiple instances on different ports to serve different models:
```bash
# Terminal 1: Deploy DistilGPT-2
python -m connectit deploy-hf --model distilgpt2 --price-per-token 0.001 --port 4001

# Terminal 2: Deploy GPT-2 Medium
python -m connectit deploy-hf --model gpt2-medium --price-per-token 0.005 --port 4002

# Terminal 3: Deploy DialoGPT
python -m connectit deploy-hf --model microsoft/DialoGPT-medium --price-per-token 0.003 --port 4003
```
Common Issues:
- Command not found: Use `python -m connectit` instead of `connectit` if the command is not in PATH
- Model download fails: Ensure internet connection and sufficient disk space
- No providers found: Check the `--bootstrap-link` value and ensure at least one provider is running
- Port conflicts: Use different ports for multiple deployments
- Memory issues: Use smaller models like `distilgpt2` for limited resources
- Connection timeout: Wait a few seconds after starting providers before making requests
- Concurrency errors: Fixed in the latest version; providers now handle multiple simultaneous requests
Dependencies:
- Core functionality: `typer`, `rich`, `websockets`, `numpy`
- Hugging Face models: `transformers`, `torch`
- Full features: Install with `pip install -e .[all]`
Performance Tips:
- Use GPU-enabled PyTorch for faster inference on compatible hardware
- Choose model size based on available system resources
- Consider network latency when selecting bootstrap peers
- Monitor system resources during model deployment
- Start with `distilgpt2` for testing; it's fast and lightweight
- Use `--max-new-tokens` to control response length and generation time
- Multiple providers of the same model create automatic load balancing
This is a prototype implementation. See license file for details.