Vocalis is an audio processing package featuring:
- Fast transcription with Whisper V3 Turbo
- Speaker diarization with pyannote.audio and sherpa-onnx
- Speaker name identification
- Conversation summarization
- Topic extraction
- Audio analysis tools
- Security monitoring:
  - Detection of potential security incidents in audio
  - Specialized bar security monitoring
  - Threat level assessment
  - Incident reporting
- FastAPI integration for all functionality
- Gradio UI for interactive usage
- Command-line interface
Install with pip:

```bash
pip install vocalis
# Optional extras (quoted so shells such as zsh do not expand the brackets)
pip install "vocalis[gpu]"
pip install "vocalis[dev]"
```

Vocalis provides a command-line interface for common tasks:

```bash
# Run the FastAPI server
python -m vocalis api --port 8000
# Run the Gradio UI
python -m vocalis ui
# Run security monitoring on a file
python -m vocalis security --input audio.flac --threat-level 2
# Run bar-specific security monitoring on a directory
python -m vocalis security --input ./examples/bar --bar
```

Start the API server:
```bash
python -m vocalis api
```

Then use the API endpoints:

- `POST /api/transcribe`: transcribe and diarize audio
- `POST /api/security/analyze`: analyze audio for security concerns
- `POST /api/analyze`: analyze audio characteristics
- `GET /api/models`: list available models
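Any HTTP client can call these endpoints. Below is a minimal sketch that uploads a file to `POST /api/transcribe` using only the Python standard library. This README does not document the request format, so several details are assumptions: the multipart/form-data encoding, the form field name `file`, and a JSON response. The helper names `build_transcribe_request` and `transcribe` are hypothetical. FastAPI apps usually serve interactive OpenAPI docs at `/docs` unless that is disabled, so check there for the actual schema.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Assumption: the server was started locally with `python -m vocalis api --port 8000`.
DEFAULT_BASE_URL = "http://localhost:8000"


def build_transcribe_request(
    audio_path: str,
    base_url: str = DEFAULT_BASE_URL,
    field_name: str = "file",  # hypothetical form field name, not confirmed by the docs
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST request to /api/transcribe."""
    path = Path(audio_path)
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"

    # Assemble a single-part multipart/form-data body by hand (stdlib only).
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (
            f'Content-Disposition: form-data; name="{field_name}"; '
            f'filename="{path.name}"\r\n'
        ).encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        path.read_bytes(),
        f"\r\n--{boundary}--\r\n".encode(),
    ])

    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/transcribe",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def transcribe(audio_path: str, base_url: str = DEFAULT_BASE_URL) -> dict:
    """Send the request to a running server and decode the (assumed) JSON response."""
    request = build_transcribe_request(audio_path, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `transcribe("audio.flac")` returns the decoded response. Splitting request construction from sending keeps the encoding testable without a live server.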
Use the processing pipeline directly from Python:

```python
from vocalis.core.audio_pipeline import AudioProcessingPipeline

# Initialize the pipeline
pipeline = AudioProcessingPipeline()

# Transcribe audio.flac, expecting two speakers
result = pipeline.process_audio(
    audio_path="audio.flac",
    task="transcribe",
    num_speakers=2,
)

# Print the full transcript, then each speaker-attributed segment
print(result["text"])
for segment in result["merged_segments"]:
    print(f"{segment['speaker']}: {segment['text']}")
```

Run security monitoring from Python:

```python
from vocalis.security.security_monitor import SecurityMonitor

# Initialize the security monitor with an output directory and a minimum threat level
monitor = SecurityMonitor(output_dir="security_incidents", min_threat_level=2)

# Process an audio file; a falsy result means no incident was detected
incident = monitor.process_audio_file("audio.flac")
if incident:
    print(f"Security incident detected: {incident.incident_type}")
    print(f"Threat level: {incident.threat_level}/5")
    print(f"Summary: {incident.summary}")
```

This project builds upon several open-source technologies, including Whisper, pyannote.audio, sherpa-onnx, FastAPI, and Gradio.
Vocalis is licensed under the Apache License 2.0.