feat(channels): add SIP voice channel with pyVoIP/LiveKit dual-mode, streaming STT/TTS, and multi-turn conversation support#3449
Conversation
|
Hi @shaohuaxi, thank you for your first Pull Request! 🎉 🙌 Join Developer CommunityThanks so much for your contribution! We'd love to invite you to join the official QwenPaw developer group! You can find the Discord and DingTalk group links under the "Developer Community" section on our docs page: We truly appreciate your enthusiasm—and look forward to your future contributions! 😊 We'll review your PR soon. |
|
Please format the code via |
333a566 to
2f26ac5
Compare
…streaming STT/TTS, and multi-turn conversation support
@xieyxclack I've run
|
|
@shaohuaxi Thank you for your contribution! It seems that sip does not support being configured through the console (or am I misunderstanding something?) Maybe you can refer to this pr (adding telegram channel) : #147 |


Description
│ Roadmap: QwenPaw Roadmap / Task List (#2291) — Task #15 "Support SIP protocol registration and voice call channel integration"
│ Related Issue: #3448
Add a SIP voice channel to QwenPaw, enabling real-time voice conversations via standard SIP phones and softphones (e.g., Linphone, MicroSIP, IP desk phones).
SIPChannel implements BaseChannel with a pluggable SipBackend protocol, supporting two modes:
BaseChannel
├── ConsoleChannel, DingTalkChannel, VoiceChannel (Twilio), ...
└── SIPChannel (this PR)
├── PyVoIPBackend (sip_mode="dev") — pure-Python, zero infra
└── LiveKitBackend (sip_mode="livekit") — production-grade
SIP Phone → [SIP/RTP] → Backend (pyVoIP or LiveKit SIP)
↓
SIPChannel (BaseChannel)
├─ STT (DashScope Paraformer streaming)
├─ Agent (_process)
└─ TTS (DashScope Sambert) → audio playback
Key capabilities:
concurrency deployment
conversion
Type of Change
Component(s) Affected
Changes Overview
New files (SIP-specific, self-contained)
Minimal modifications to upstream files
Upstream files NOT modified
Architecture Design: Why Dual-Track?
Pure Python cannot handle production SIP/RTP due to GIL limitations with jitter buffering and codec resampling. Industry leaders (OpenAI
Realtime API, LiveKit, Vapi, Retell) all adopt a "media gateway terminates SIP/RTP → clean audio stream → Python AI node" pattern.
This PR follows the same proven architecture while preserving QwenPaw's "zero-config, easy to start" philosophy:
logic.
Both tracks implement the same SipBackend protocol. Switching requires only changing sip_mode in config.
Checklist
Testing
Manual E2E testing performed:
agent reply, TTS playback, multi-turn conversation, call teardown.
Additional Notes