This implementation adds structured JSON logging and Prometheus metrics to the Veritix Python service. The solution provides:
- Structured JSON Logging - Consistent, parseable log format with request IDs and metadata
- Prometheus Metrics - Comprehensive service metrics for monitoring and alerting
- Request Tracking - Automatic request ID generation and propagation
- Dashboard Integration - Ready-to-use Grafana dashboard configurations
- JSON-formatted logs with consistent structure
- Automatic request ID generation and tracking
- Context-aware logging with metadata
- Configurable log levels
- Request/response logging with timing information
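The formatter behind this can be sketched as a standard `logging.Formatter` subclass. This is a minimal illustration only; the actual implementation in `src/logging_config.py` also injects the request ID and may differ in detail:

```python
import json
import logging
from datetime import datetime, timezone

class JSONFormatter(logging.Formatter):
    """Render each log record as a single JSON line (illustrative sketch)."""

    def format(self, record):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
        }
        # Attach optional metadata passed via logging's `extra` mechanism
        if hasattr(record, "extra_data"):
            entry["extra_data"] = record.extra_data
        return json.dumps(entry)

# Wire the formatter into a stream handler for the service logger
handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger = logging.getLogger("veritix")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

Because each record serializes to one JSON object per line, downstream collectors can parse logs without multi-line heuristics.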
- HTTP request metrics (count, duration, in-progress)
- WebSocket connection metrics
- Chat message metrics
- Business operation metrics (ETL, QR codes, fraud detection)
- Custom application metrics
- `RequestIDMiddleware` - Generates and tracks request IDs
- `MetricsMiddleware` - Collects and exposes Prometheus metrics
- Automatic correlation of logs and metrics
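As a rough sketch, the request-ID half of this can be written as plain ASGI middleware backed by a `contextvars.ContextVar`. The real `RequestIDMiddleware` in `src/logging_config.py` may be built on a framework middleware base class instead; the header name here follows the `X-Request-ID` convention used elsewhere in this document:

```python
import uuid
from contextvars import ContextVar

# Context variable that makes the request ID visible to loggers anywhere
# in the call stack for the current request
request_id_context: ContextVar[str] = ContextVar("request_id", default="")

class RequestIDMiddleware:
    """Plain ASGI middleware: reuse an incoming X-Request-ID header or mint a UUID."""

    def __init__(self, app, header_name=b"x-request-id"):
        self.app = app
        self.header_name = header_name

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        headers = dict(scope.get("headers", []))
        request_id = headers.get(self.header_name, b"").decode() or str(uuid.uuid4())
        token = request_id_context.set(request_id)

        async def send_with_id(message):
            # Echo the request ID back on the response for client-side correlation
            if message["type"] == "http.response.start":
                message.setdefault("headers", []).append(
                    (self.header_name, request_id.encode())
                )
            await send(message)

        try:
            await self.app(scope, receive, send_with_id)
        finally:
            request_id_context.reset(token)
```

Because the ID lives in a `ContextVar`, it is isolated per request even under concurrent asyncio handlers.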
- `src/logging_config.py` - Core logging and metrics implementation
- `docs/monitoring_dashboard.md` - Dashboard configuration and setup instructions
- `tests/test_logging_metrics.py` - Comprehensive test suite
- `src/main.py` - Added logging/metrics middleware and updated endpoints
- `src/websocket.py` - Updated to use structured logging
- `src/manager.py` - Updated to use structured logging
- `src/chat.py` - Updated to use structured logging and metrics
- `src/etl.py` - Added logging and metrics to ETL operations
- `requirements.txt` - Added `prometheus-client` dependency
```bash
# Logging level (DEBUG, INFO, WARNING, ERROR)
LOG_LEVEL=INFO

# Request ID header name (optional)
REQUEST_ID_HEADER=X-Request-ID
```

All logs are formatted as JSON with the following structure:
```json
{
  "timestamp": "2024-01-15T10:30:45.123456",
  "level": "INFO",
  "logger": "veritix",
  "message": "User connected to chat",
  "request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "module": "chat",
  "function": "connect",
  "line": 45,
  "extra_data": {
    "conversation_id": "conv_123",
    "user_id": "user_456",
    "total_connections": 2
  }
}
```

- `http_requests_total` - Counter for HTTP requests by method, endpoint, status
- `http_request_duration_seconds` - Histogram for request durations
- `http_requests_in_progress` - Gauge for active requests
- `websocket_connections_total` - Gauge for active WebSocket connections
- `chat_messages_total` - Counter for chat messages by sender type
- `chat_conversations_active` - Gauge for active conversations
- `etl_jobs_total` - Counter for ETL jobs by status
- `ticket_scans_total` - Counter for ticket scans by result
- `fraud_detections_total` - Counter for fraud checks by rules triggered
- `qr_generations_total` - Counter for QR code generations
- `qr_validations_total` - Counter for QR validations by result
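Declared with `prometheus-client`, a few of these look roughly like the following (a partial sketch; the exact label sets and help strings are defined in `src/logging_config.py` and may differ):

```python
from prometheus_client import Counter, Gauge, Histogram

# HTTP-layer metrics (labels follow the descriptions above)
HTTP_REQUESTS_TOTAL = Counter(
    "http_requests_total", "Total HTTP requests", ["method", "endpoint", "status"]
)
HTTP_REQUEST_DURATION = Histogram(
    "http_request_duration_seconds", "HTTP request duration", ["method", "endpoint"]
)
HTTP_REQUESTS_IN_PROGRESS = Gauge(
    "http_requests_in_progress", "In-progress HTTP requests"
)

# Business metrics
ETL_JOBS_TOTAL = Counter("etl_jobs_total", "ETL jobs", ["status"])
TICKET_SCANS_TOTAL = Counter("ticket_scans_total", "Ticket scans", ["result"])
QR_GENERATIONS_TOTAL = Counter("qr_generations_total", "QR code generations")
```

Histograms use the client's default buckets unless overridden; for latency SLOs it is worth passing an explicit `buckets=` tuned to your targets.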
```python
from src.logging_config import log_info, log_error, log_warning

# Simple info log
log_info("User action completed")

# Log with metadata
log_info("Database operation", {"table": "users", "operation": "insert"})

# Error logging
log_error("Database connection failed", {"error": "Connection timeout"})
```

```python
from src.logging_config import CHAT_MESSAGES_TOTAL

# Increment chat message counter
CHAT_MESSAGES_TOTAL.labels(sender_type="user", conversation_id="conv123").inc()
```

```python
from src.logging_config import request_id_context

# Get current request ID
request_id = request_id_context.get()
```

Run the logging and metrics tests:
```bash
# Run all tests
pytest tests/test_logging_metrics.py -v

# Run with coverage
pytest tests/test_logging_metrics.py --cov=src --cov-report=html
```

```yaml
scrape_configs:
  - job_name: 'veritix'
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: '/metrics'
    scrape_interval: 15s
```

Import the JSON configuration from `docs/monitoring_dashboard.md` or use the provided examples.
Example alert rules for critical service health:
- High error rate (>5% 5xx errors)
- High latency (>2 seconds 95th percentile)
- No WebSocket connections
- Failed ETL jobs
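Hypothetically, the first two alerts could be expressed as Prometheus alerting rules along these lines (group name, severities, and thresholds are illustrative; tune them to your traffic):

```yaml
groups:
  - name: veritix-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 2
        for: 5m
        labels:
          severity: warning
```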
- Log Volume: JSON logs are more verbose than traditional logs
- Metric Cardinality: Avoid high-cardinality labels in metrics
- Memory Usage: Prometheus metrics are kept in memory
- Request Overhead: Minimal overhead from middleware components
- Log Sanitization: Avoid logging sensitive information
- Metrics Exposure: Protect the `/metrics` endpoint in production
- Request ID Generation: Use cryptographically secure random IDs
- Log Retention: Implement appropriate log retention policies
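For the log-sanitization point, one option is a small redaction helper run over `extra_data` before it reaches the logger. This is a hypothetical sketch, not part of the codebase, and the key list is illustrative:

```python
# Keys whose values should never appear in logs (extend as needed)
SENSITIVE_KEYS = {"password", "token", "secret", "authorization", "api_key"}

def sanitize(data):
    """Recursively mask values whose keys look sensitive before logging."""
    if isinstance(data, dict):
        return {
            k: "***" if k.lower() in SENSITIVE_KEYS else sanitize(v)
            for k, v in data.items()
        }
    if isinstance(data, list):
        return [sanitize(item) for item in data]
    return data
```

Calling `sanitize` on every metadata dict keeps redaction in one place instead of relying on each call site to remember it.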
The logging works seamlessly with Docker container orchestration:
```yaml
version: '3.8'
services:
  veritix:
    # ... existing config
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
```

The JSON format is compatible with Fluentd/Elasticsearch/Fluent Bit:
```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    fluentd.log/format: "json"
```

When migrating from existing logging:
- Existing logs: The old `logger.info()` calls continue to work
- Backward compatibility: Standard Python logging remains functional
- Gradual migration: Convert existing logging calls gradually
- Monitoring: Existing metrics/dashboarding may need updates
- Missing metrics: Verify the `/metrics` endpoint is accessible
- JSON parsing errors: Check for non-serializable objects in `extra_data`
- High memory usage: Review metric cardinality and retention settings
- Performance impact: Monitor response times with high logging volumes
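For the JSON parsing errors above, a common remedy is to coerce non-serializable values to strings at serialization time. The helper below is hypothetical (`to_json_line` is not an existing function in `src/logging_config.py`), shown only to illustrate the `default=str` fallback:

```python
import json
from datetime import datetime
from uuid import UUID

def to_json_line(entry: dict) -> str:
    """Serialize a log entry, coercing non-JSON types (datetimes, UUIDs, ...) to strings."""
    return json.dumps(entry, default=str)

# Objects that would normally raise TypeError now serialize cleanly
line = to_json_line({"ts": datetime(2024, 1, 15, 10, 30), "id": UUID(int=1)})
```

The trade-off is that arbitrary objects get their `str()` form, so structured fields should still be converted deliberately where their shape matters.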
- Set `LOG_LEVEL=DEBUG` for verbose output
- Check the `/metrics` endpoint for proper metric collection
- Verify request ID propagation through microservices
- Use Prometheus query language for real-time metrics debugging
Consider implementing:
- Distributed tracing (OpenTelemetry) for complex request flows
- Centralized logging integration with ELK/EFK stack
- Advanced alerting with machine learning-based anomaly detection
- Log aggregation across multiple service instances
- Custom dashboards for specific business metrics