VoiceGuard is a Minecraft Paper plugin for voice chat moderation that monitors Simple Voice Chat audio streams, detects inappropriate language using OpenAI Whisper speech-to-text, and stores flagged recordings with metadata.
┌─────────────────────────────────────────────────────────┐
│ Minecraft Server │
│ ┌───────────────────────────────────────────────────┐ │
│ │ VoiceGuard Plugin (Java) │ │
│ │ ┌─────────────────────────────────────────────┐ │ │
│ │ │ Simple Voice Chat API Integration │ │ │
│ │ │ - Listen to audio packets │ │ │
│ │ │ - Per-player audio streams │ │ │
│ │ └─────────────────────────────────────────────┘ │ │
│ │ ┌─────────────────────────────────────────────┐ │ │
│ │ │ Privacy Consent System │ │ │
│ │ │ - Join event handler │ │ │
│ │ │ - Accept/Deny popup (chat/form) │ │ │
│ │ │ - Persistent storage (player data) │ │ │
│ │ │ - /privacy opt-out command │ │ │
│ │ └─────────────────────────────────────────────┘ │ │
│ │ ┌─────────────────────────────────────────────┐ │ │
│ │ │ Circular Buffer (30s rolling) │ │ │
│ │ │ - ArrayDeque<AudioChunk> per player │ │ │
│ │ │ - Triggered: extend +15s recording │ │ │
│ │ └─────────────────────────────────────────────┘ │ │
│ │ ┌─────────────────────────────────────────────┐ │ │
│ │ │ HTTP Client │ │ │
│ │ │ - POST raw Opus audio to Python backend │ │ │
│ │ │ - Async/concurrent requests │ │ │
│ │ └─────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
│
│ HTTP POST
│ (raw Opus audio + metadata)
▼
┌─────────────────────────────────────────────────────────┐
│ Python Backend Server │
│ ┌───────────────────────────────────────────────────┐ │
│ │ FastAPI Application │ │
│ │ ┌─────────────────────────────────────────────┐ │ │
│ │ │ /analyze endpoint │ │ │
│ │ │ - Receive audio file │ │ │
│ │ │ - Decode Opus using FFmpeg │ │ │
│ │ │ - Convert to Whisper-compatible format │ │ │
│ │ └─────────────────────────────────────────────┘ │ │
│ │ ┌─────────────────────────────────────────────┐ │ │
│ │ │ Whisper STT Service │ │ │
│ │ │ - Load model (cached) │ │ │
│ │ │ - Transcribe audio │ │ │
│ │ └─────────────────────────────────────────────┘ │ │
│ │ ┌─────────────────────────────────────────────┐ │ │
│ │ │ Bad Word Detector │ │ │
│ │ │ - Load config (bad_words.txt) │ │ │
│ │ │ - Case-insensitive matching │ │ │
│ │ └─────────────────────────────────────────────┘ │ │
│ │ ┌─────────────────────────────────────────────┐ │ │
│ │ │ Storage Service │ │ │
│ │ │ - Save audio file (if detected) │ │ │
│ │ │ - Insert DB record │ │ │
│ │ └─────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────┐ ┌─────────────────────────┐ │
│ │ Database │ │ Audio Storage │ │
│ │ (SQLite) │ │ /audio_files/ │ │
│ │ - detections │ │ - 2026-01-19/ │ │
│ │ - players │ │ - uuid_timestamp.wav│ │
│ └────────────────┘ └─────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Java Plugin:
- Java 21
- Paper API 1.20.6
- Simple Voice Chat API - Voice packet capture
- OkHttp3 - HTTP client for backend communication
- Gson - JSON serialization
- Maven - Build tool

Python Backend:
- Python 3.10+
- FastAPI - Web framework
- Uvicorn - ASGI server
- OpenAI Whisper - Speech-to-text (base/small model recommended)
- PyTorch - Whisper dependency
- ffmpeg-python - Opus audio decoding
- pydub - Audio processing
- SQLAlchemy - ORM
- SQLite - Database (or PostgreSQL for production)
- python-multipart - File uploads

Add required dependencies:
<!-- Simple Voice Chat API -->
<dependency>
    <groupId>de.maxhenkel.voicechat</groupId>
    <artifactId>voicechat-api</artifactId>
    <version>2.5.0</version>
    <scope>provided</scope>
</dependency>
<!-- HTTP Client -->
<dependency>
    <groupId>com.squareup.okhttp3</groupId>
    <artifactId>okhttp</artifactId>
    <version>4.12.0</version>
</dependency>
<!-- JSON -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.10.1</version>
</dependency>

Components:
- PrivacyManager.java - Handles consent tracking
- PrivacyCommand.java - /privacy opt-out command
- PlayerJoinListener.java - Show consent popup on join
- privacy.yml - Persistent storage of player consent
Workflow:
- Player joins server
- Check if consent is recorded in privacy.yml
- If no consent:
  - Show consent popup (using chat messages or form)
  - Options: Accept or Deny
  - Accept: Record consent, allow play
  - Deny: Kick player with privacy message
- If consent exists: Allow play
- /privacy opt-out command: Kick player with message "You must accept voice monitoring to play on this server"
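
The workflow above is small enough to model as a few pure functions. The plugin itself would be written in Java; this Python sketch only illustrates the decision logic. `ConsentStore`, `on_join`, `on_response` and `on_opt_out` are hypothetical names, and the choice to delete the stored consent on opt-out (so the next join prompts again) is an assumption the plan does not spell out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class ConsentStore:
    """In-memory stand-in for privacy.yml: player UUID -> ISO-8601 consent time."""
    records: Dict[str, str] = field(default_factory=dict)

    def has_consented(self, player_uuid: str) -> bool:
        return player_uuid in self.records

    def record(self, player_uuid: str, now: Optional[datetime] = None) -> None:
        now = now or datetime.now(timezone.utc)
        self.records[player_uuid] = now.strftime("%Y-%m-%dT%H:%M:%SZ")

    def revoke(self, player_uuid: str) -> None:
        self.records.pop(player_uuid, None)


def on_join(store: ConsentStore, player_uuid: str) -> str:
    """Return "allow" if consent is already stored, otherwise "prompt"."""
    return "allow" if store.has_consented(player_uuid) else "prompt"


def on_response(store: ConsentStore, player_uuid: str, accepted: bool) -> str:
    """Handle the Accept/Deny answer to the consent popup."""
    if accepted:
        store.record(player_uuid)
        return "allow"
    return "kick"


def on_opt_out(store: ConsentStore, player_uuid: str) -> str:
    """Handle /privacy opt-out. Assumption: stored consent is removed as well."""
    store.revoke(player_uuid)
    return "kick"
```

Keeping the decisions free of server calls makes the accept, deny and rejoin paths easy to unit-test before wiring them to join events and commands.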
Data Storage:
# privacy.yml
players:
  550e8400-e29b-41d4-a716-446655440000:
    consented: true
    timestamp: "2026-01-19T10:30:00Z"

Components:
- Implement VoicechatPlugin interface
- Register with Simple Voice Chat API
- Create VoiceChatListener.java for audio packet events
- Hook into MicrophonePacketEvent (the Simple Voice Chat API event fired when a microphone packet reaches the server) or equivalent
Key Tasks:
- Detect when player speaks
- Extract raw Opus audio bytes from packets
- Track active voice sessions per player
Components:
- AudioBuffer.java - Thread-safe circular buffer per player
- AudioChunk.java - Data class (timestamp, bytes, player UUID)
Implementation:
- Use ConcurrentLinkedDeque<AudioChunk> for thread safety
- Maintain 30 seconds of rolling audio history
- Estimate buffer size as sampleRate * channels * bytesPerSample * 30 (this is the decoded PCM size; the Opus packets actually held in the buffer are smaller)
- When a bad word is detected: Stop circular buffering, extend recording +15s
- Combine 30s buffer + 15s extension → 45s total clip
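
The rolling-window and post-trigger behaviour described above can be sketched as follows. This is a language-neutral illustration written in Python rather than the plugin's Java, and it is single-threaded; the Java version would need the concurrent deque mentioned in the list. `RollingBuffer`, `add` and `trigger` are hypothetical names, and time-based eviction on chunk timestamps is one possible design, not a confirmed one.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass(frozen=True)
class AudioChunk:
    timestamp: float  # capture time in seconds
    data: bytes       # raw Opus packet bytes
    player_uuid: str


class RollingBuffer:
    """Keeps the last pre_seconds of audio; after trigger(), keeps recording
    for post_seconds and then hands back the combined clip."""

    def __init__(self, pre_seconds: float = 30.0, post_seconds: float = 15.0):
        self.pre_seconds = pre_seconds
        self.post_seconds = post_seconds
        self._chunks: Deque[AudioChunk] = deque()
        self._deadline: Optional[float] = None  # set while a clip is being extended

    def trigger(self, now: float) -> None:
        """Stop evicting old chunks and record until now + post_seconds."""
        if self._deadline is None:
            self._deadline = now + self.post_seconds

    def add(self, chunk: AudioChunk) -> Optional[List[AudioChunk]]:
        """Append a chunk. Returns the finished clip once the post-trigger
        window has elapsed, otherwise None."""
        self._chunks.append(chunk)
        if self._deadline is None:
            # Rolling mode: drop everything older than the pre-trigger window.
            cutoff = chunk.timestamp - self.pre_seconds
            while self._chunks and self._chunks[0].timestamp < cutoff:
                self._chunks.popleft()
            return None
        if chunk.timestamp >= self._deadline:
            clip = list(self._chunks)
            self._chunks.clear()
            self._deadline = None
            return clip
        return None
```

With the default settings, a trigger yields roughly 30 seconds of history plus 15 seconds of follow-up audio, matching the 45-second clip described above.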
Components:
- BackendClient.java - OkHttp wrapper
- Async POST requests to Python backend
Request Format:
POST /analyze
Content-Type: multipart/form-data
{
  "audio": <binary Opus file>,
  "player_uuid": "550e8400-e29b-41d4-a716-446655440000",
  "player_name": "PlayerName",
  "timestamp": "2026-01-19T10:30:00Z",
  "server_id": "main-server"
}

Response Format:
{
  "detected": true/false,
  "word": "badword" or null,
  "transcript": "full transcript text",
  "confidence": 0.95
}

config.yml:
voiceguard:
  backend:
    url: "http://localhost:8000"
    timeout: 30
    retry_attempts: 3
  buffer:
    pre_trigger_seconds: 30
    post_trigger_seconds: 15
  privacy:
    consent_message: |
      &e&l[VoiceGuard] &r&7This server monitors voice chat for safety.
      &7Your voice may be recorded and analyzed if inappropriate language is detected.
      &7Type &a/accept&7 to continue or &c/deny&7 to leave.
    kick_message: "&cYou must accept voice monitoring to play on this server."
  enabled_worlds:
    - world
    - world_nether
    - world_the_end
  debug: false

Directory Structure:
voiceguard-backend/
├── main.py                     # FastAPI application
├── requirements.txt            # Python dependencies
├── .env                        # Environment config
├── config/
│   ├── bad_words.txt           # Bad words list
│   └── settings.py             # App settings
├── services/
│   ├── whisper_service.py      # STT processing
│   ├── audio_decoder.py        # Opus → WAV decoding
│   ├── detector_service.py     # Bad word detection
│   └── storage_service.py      # DB & file storage
├── models/
│   └── database.py             # SQLAlchemy models
├── routes/
│   └── analyze.py              # API endpoints
└── audio_files/                # Storage directory
    └── YYYY-MM-DD/

requirements.txt:
fastapi==0.109.0
uvicorn[standard]==0.27.0
openai-whisper==20231117
torch==2.1.2
torchaudio==2.1.2
pydub==0.25.1
ffmpeg-python==0.2.0
python-multipart==0.0.6
sqlalchemy==2.0.25
aiosqlite==0.19.0
python-dotenv==1.0.0

System Dependencies:
# FFmpeg (required for audio decoding)
apt-get install ffmpeg # Linux
brew install ffmpeg # macOS
choco install ffmpeg # Windows

main.py:
from fastapi import FastAPI, UploadFile, File, Form
from services.whisper_service import WhisperService
from services.detector_service import DetectorService
from services.storage_service import StorageService
from models.database import init_db
import uvicorn

app = FastAPI(title="VoiceGuard Backend")

# Initialize services on startup
@app.on_event("startup")
async def startup():
    await init_db()  # create the tables from models/database.py if they do not exist yet
    app.state.whisper = WhisperService(model_name="base")
    app.state.detector = DetectorService()
    app.state.storage = StorageService()

@app.post("/analyze")
async def analyze_audio(
    audio: UploadFile = File(...),
    player_uuid: str = Form(...),
    player_name: str = Form(...),
    timestamp: str = Form(...),
    server_id: str = Form(...)
):
    # 1. Save uploaded Opus file temporarily
    # 2. Decode Opus → WAV using FFmpeg
    # 3. Transcribe with Whisper
    # 4. Check for bad words
    # 5. If detected: Save audio + log to DB
    # 6. Return result
    pass

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

services/whisper_service.py:
import whisper
import torch

class WhisperService:
    def __init__(self, model_name="base"):
        # Check GPU availability
        device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Loading Whisper model '{model_name}' on {device}...")
        self.model = whisper.load_model(model_name, device=device)
        print("Whisper model loaded successfully.")

    def transcribe(self, audio_path: str) -> dict:
        """
        Transcribe audio file to text.
        Returns: {
            "text": "full transcript",
            "segments": [...],
            "language": "en"
        }
        """
        result = self.model.transcribe(audio_path)
        return result

Model Selection Guide:
- tiny - Fastest, lowest accuracy (~1 sec/clip)
- base - Recommended - Good balance (~2-3 sec/clip)
- small - Better accuracy (~5-7 sec/clip)
- medium - High accuracy (~10-15 sec/clip)
- large - Best accuracy, very slow (~30+ sec/clip)

services/audio_decoder.py:
import ffmpeg

def decode_opus_to_wav(opus_path: str, wav_path: str):
    """
    Decode Opus audio to WAV format using FFmpeg.
    """
    try:
        (
            ffmpeg
            .input(opus_path)
            # 16 kHz mono 16-bit PCM is the format Whisper works with
            .output(wav_path, acodec='pcm_s16le', ar='16000', ac='1')
            .overwrite_output()
            .run(quiet=True)
        )
        return wav_path
    except ffmpeg.Error as e:
        print(f"FFmpeg error: {e.stderr.decode()}")
        raise

services/detector_service.py:
import re

class DetectorService:
    def __init__(self, bad_words_file="config/bad_words.txt"):
        with open(bad_words_file, 'r') as f:
            # Skip blank lines: an empty entry would yield a pattern that matches any text
            self.bad_words = [word.strip().lower() for word in f if word.strip()]

    def check_text(self, text: str) -> dict:
        """
        Check if text contains bad words.
        Returns: {
            "detected": bool,
            "word": str or None,
            "matches": list
        }
        """
        text_lower = text.lower()
        matches = []
        for word in self.bad_words:
            # Use word boundaries to avoid partial matches
            pattern = r'\b' + re.escape(word) + r'\b'
            if re.search(pattern, text_lower):
                matches.append(word)
        return {
            "detected": len(matches) > 0,
            "word": matches[0] if matches else None,
            "matches": matches
        }

config/bad_words.txt:
badword1
badword2
offensive_term
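
The word-boundary matching used by DetectorService can be exercised in isolation, without a word-list file. The sketch below is a minimal standalone version; `load_words` and `find_matches` are hypothetical helper names. It drops blank lines because an empty entry turns into the pattern `\b\b`, which matches almost any transcript.

```python
import re
from typing import List


def load_words(lines: List[str]) -> List[str]:
    """Normalise word-list entries and drop blank lines."""
    return [line.strip().lower() for line in lines if line.strip()]


def find_matches(text: str, words: List[str]) -> List[str]:
    """Return the listed words that occur in text as whole words."""
    text_lower = text.lower()
    return [
        word for word in words
        if re.search(r"\b" + re.escape(word) + r"\b", text_lower)
    ]


words = load_words(["badword1\n", "\n", "Offensive_Term\n"])
print(find_matches("That was BADWORD1!", words))      # matched: case-insensitive
print(find_matches("badword10 is different", words))  # no match: not a whole word
```

Whole-word matching avoids flagging longer words that merely contain a listed term, at the cost of missing deliberate variations, which is what the false-negative tests later in the plan are meant to surface.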
models/database.py:
from sqlalchemy import Column, Integer, String, DateTime, Text
from sqlalchemy.orm import declarative_base  # SQLAlchemy 2.0 location; sqlalchemy.ext.declarative is deprecated
from datetime import datetime

Base = declarative_base()

class Detection(Base):
    __tablename__ = "detections"

    id = Column(Integer, primary_key=True, autoincrement=True)
    player_uuid = Column(String(36), nullable=False)
    player_name = Column(String(16), nullable=False)
    detected_word = Column(String(50), nullable=False)
    full_transcript = Column(Text, nullable=False)
    timestamp = Column(DateTime, default=datetime.utcnow)
    audio_file_path = Column(String(255), nullable=False)
    server_id = Column(String(50), nullable=False)

    def __repr__(self):
        return f"<Detection(player={self.player_name}, word={self.detected_word})>"

Database Initialization:
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.orm import sessionmaker

DATABASE_URL = "sqlite+aiosqlite:///voiceguard.db"
engine = create_async_engine(DATABASE_URL, echo=True)
AsyncSessionLocal = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)

async def init_db():
    # Create all tables registered on Base if they do not exist yet
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)

services/storage_service.py:
import os
from datetime import datetime
from models.database import Detection, AsyncSessionLocal

class StorageService:
    def __init__(self, storage_dir="audio_files"):
        self.storage_dir = storage_dir
        os.makedirs(storage_dir, exist_ok=True)

    def get_file_path(self, player_uuid: str, timestamp: str) -> str:
        """Generate organized file path: audio_files/YYYY-MM-DD/uuid_timestamp.wav"""
        date = datetime.now().strftime("%Y-%m-%d")
        date_dir = os.path.join(self.storage_dir, date)
        os.makedirs(date_dir, exist_ok=True)
        # Colons are not valid in Windows file names, so replace them
        filename = f"{player_uuid}_{timestamp.replace(':', '-')}.wav"
        return os.path.join(date_dir, filename)

    async def save_detection(self, detection_data: dict):
        """Save detection to database"""
        async with AsyncSessionLocal() as session:
            detection = Detection(**detection_data)
            session.add(detection)
            await session.commit()

- Test privacy consent:
  - New player joins → popup appears
  - Test accept path → player stays
  - Test deny path → player kicked
  - Rejoin → no popup (consent remembered)
  - Test /privacy opt-out command
- Test audio capture:
  - Player speaks in voice chat
  - Verify audio buffering (30s rolling)
  - Check console logs for packet capture
- Test backend communication:
  - Trigger audio analysis (speak trigger word)
  - Verify HTTP POST to Python backend
  - Check backend receives Opus file correctly
- Test Whisper transcription:
  - Backend decodes Opus → WAV
  - Whisper transcribes audio
  - Verify transcript accuracy
- Test bad word detection:
  - Speak configured bad words
  - Verify detection in transcript
  - Check database record created
  - Verify audio file saved with correct naming
- Test false positives/negatives:
  - Normal conversation → no detection
  - Variations of bad words → should detect

- Multiple players speaking simultaneously: Ensure buffers don't interfere
- Backend load: Test multiple concurrent transcription requests
- Memory usage: Monitor Java heap with many players
- Whisper speed: Measure transcription time per model size
- Backend offline: Queue requests or log errors gracefully
- Whisper processing failure: Log error, don't crash
- Audio format issues: Validate before sending
- Database errors: Retry logic for transient failures
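
For the transient-failure cases in this list, one common approach is a bounded retry with exponential backoff. The following is a minimal sketch under stated assumptions: `retry` is a hypothetical helper, the exception types and delays are placeholders, and the `sleep` parameter exists only so the wait can be replaced in tests.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or attempts are exhausted.
    Waits base_delay, then 2x, 4x, ... between attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller log the error
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

Only errors listed in `retry_on` are retried; anything else propagates immediately, so a malformed request is not resent three times.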
Java Plugin:
- Log consent actions (accept/deny)
- Log audio buffer events
- Log HTTP request/response status
- Log detection events to console
Python Backend:
- Request logging (player, timestamp)
- Transcription results
- Detection events with confidence scores
- Error stack traces
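
On the Python side, the standard `logging` module covers all four items. The sketch below assumes a logger named `voiceguard` and hypothetical helper functions; the message formats are illustrative only.

```python
import logging
from typing import Optional


def build_logger(name: str = "voiceguard",
                 handler: Optional[logging.Handler] = None) -> logging.Logger:
    """Create (or reuse) a logger with a timestamped format."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:
        handler = handler or logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
        logger.addHandler(handler)
    return logger


def log_request(logger: logging.Logger, player_uuid: str, timestamp: str) -> None:
    logger.info("analyze request player=%s timestamp=%s", player_uuid, timestamp)


def log_detection(logger: logging.Logger, player_uuid: str,
                  word: str, confidence: float) -> None:
    logger.warning("detection player=%s word=%s confidence=%.2f",
                   player_uuid, word, confidence)


def log_failure(logger: logging.Logger, stage: str) -> None:
    # Call from inside an except block: logger.exception appends the stack trace.
    logger.exception("stage %s failed", stage)
```

Since transcripts contain player speech, it may be worth logging only their length or a short excerpt at INFO level and leaving the full text to the database record.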
Java: Externalize all settings to config.yml
Python: Use .env file for backend URL, model selection, DB path
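
A minimal sketch of how the backend settings might be read from the environment. The variable names (`WHISPER_MODEL`, `DATABASE_URL`, `AUDIO_DIR`, `HOST`, `PORT`) are assumptions, not part of the plan; the defaults mirror values used elsewhere in this document. With python-dotenv installed, calling `load_dotenv()` before `load_settings()` copies the entries of `.env` into `os.environ`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    whisper_model: str
    database_url: str
    audio_dir: str
    host: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, falling back to defaults."""
    return Settings(
        whisper_model=env.get("WHISPER_MODEL", "base"),
        database_url=env.get("DATABASE_URL", "sqlite+aiosqlite:///voiceguard.db"),
        audio_dir=env.get("AUDIO_DIR", "audio_files"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
    )
```

Passing the mapping explicitly keeps the loader testable and makes it obvious which values a deployment can override.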
- Java: Use async HTTP client, dedicated thread pool for audio processing
- Python: Consider Redis queue for request buffering during high load
- Whisper: Load model once at startup, keep in memory
- GPU: Enable CUDA if available for 3-5x speed boost
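
Whisper transcription is a blocking call, so invoking it directly inside an `async` FastAPI handler would stall the event loop for every other request. One way to combine "load once" with non-blocking handlers is `asyncio.to_thread`, sketched here with a stand-in model. `SlowTranscriber` and `transcribe_async` are hypothetical names, and whether a single real model instance tolerates concurrent calls is an assumption that would need checking.

```python
import asyncio
import threading
import time


class SlowTranscriber:
    """Stand-in for a Whisper model that was loaded once at startup."""

    def transcribe(self, audio_path: str) -> dict:
        time.sleep(0.2)  # simulate blocking inference
        return {
            "text": f"transcript of {audio_path}",
            "thread": threading.current_thread().name,
        }


async def transcribe_async(model: SlowTranscriber, audio_path: str) -> dict:
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(model.transcribe, audio_path)


async def demo() -> list:
    model = SlowTranscriber()  # created once, shared by all requests
    jobs = (transcribe_async(model, f"clip{i}.wav") for i in range(3))
    return await asyncio.gather(*jobs)
```

If concurrent inference on one model turns out to be unsafe or too memory-hungry, wrapping the call in an `asyncio.Semaphore` would serialize it while still keeping the event loop free.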
Automatic cleanup policy:
from datetime import datetime, timedelta

# Delete audio files older than 30 days
def cleanup_old_files():
    cutoff_date = datetime.now() - timedelta(days=30)
    # Delete files and DB records older than cutoff_date

- Consent management: Persistent storage, never ask twice
- Opt-out enforcement: Kick immediately with clear message
- Data access: Add command for players to request their data
- Transparency: Show privacy policy on first join
- README.md: Installation, configuration, usage
- API.md: Python backend API documentation
- PRIVACY.md: Privacy policy and data handling
- TROUBLESHOOTING.md: Common issues and solutions
- /privacy opt-out - Opt out of voice monitoring (kicks player)
- /voiceguard reload - Reload configuration
- /voiceguard stats - Show detection statistics
- /voiceguard export <player> - Export player's data
- /voiceguard clear <player> - Clear player's detection history
- Update pom.xml with dependencies
- Implement PrivacyManager for consent tracking
- Create PlayerJoinListener for consent popup
- Implement /privacy opt-out command
- Create privacy.yml data storage
- Implement VoicechatPlugin interface
- Create VoiceChatListener for audio packets
- Build AudioBuffer circular buffer system
- Implement BackendClient HTTP client
- Create config.yml configuration
- Add comprehensive logging
- Test privacy consent flow
- Test audio capture and buffering
- Test backend communication
- Set up project structure
- Create requirements.txt
- Implement FastAPI application (main.py)
- Create WhisperService for STT
- Implement Opus → WAV decoding
- Create DetectorService for bad words
- Set up bad_words.txt configuration
- Design database schema
- Implement StorageService for DB and files
- Create /analyze endpoint
- Add error handling and logging
- Test Whisper transcription locally
- Test bad word detection
- Test file storage organization
- End-to-end testing with both components
- Performance testing with multiple players
- Error handling and recovery testing
- Privacy consent system testing
- Documentation (README, setup guides)
- Simple Voice Chat: Outputs Opus codec
- Java: Send raw Opus bytes to Python (no decoding in Java)
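- Python: Decode Opus using FFmpeg before Whisper (FFmpeg expects Opus inside a container such as Ogg, so bare packet bytes need to be wrapped or decoded packet by packet first)
- Whisper: Loads audio files through FFmpeg, so it accepts WAV, MP3, FLAC, etc.; decoding to 16 kHz mono WAV up front keeps decode failures separate from transcription failures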
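
The decode, transcribe, detect and store steps described in this plan can be tied together in one function. In the sketch below the four stages are passed in as callables so it runs without FFmpeg or Whisper installed; `analyze_clip` and its parameter names are hypothetical. The `confidence` field of the response format is left out because the plan does not say how it would be derived.

```python
import os
from typing import Callable, Dict


def analyze_clip(
    opus_path: str,
    decode: Callable[[str, str], str],
    transcribe: Callable[[str], Dict],
    check_text: Callable[[str], Dict],
    save: Callable[[str, Dict], None],
) -> Dict:
    """Run one uploaded clip through the backend stages and build the response."""
    wav_path = os.path.splitext(opus_path)[0] + ".wav"
    decode(opus_path, wav_path)                # Opus -> 16 kHz mono WAV
    transcript = transcribe(wav_path)["text"]  # speech-to-text
    result = check_text(transcript)            # word-list lookup
    if result["detected"]:
        save(wav_path, result)                 # keep audio + DB record only on a hit
    return {
        "detected": result["detected"],
        "word": result["word"],
        "transcript": transcript,
    }
```

In the real endpoint the callables would be `decode_opus_to_wav`, the `WhisperService.transcribe` method, `DetectorService.check_text`, and a wrapper around `StorageService`; injecting them also makes the flow testable with stubs.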
- 30s audio buffer per player: ~960KB per player as decoded PCM (16kHz, mono, 16-bit: 16,000 × 2 bytes × 30 s); buffered Opus packets take considerably less
- Whisper model size:
- Tiny: ~75MB
- Base: ~150MB
- Small: ~500MB
- Medium: ~1.5GB
- Large: ~3GB
- Recommendation: Start with base model, upgrade if accuracy is insufficient
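
Working the buffer-size formula through for the parameters quoted above (16 kHz, mono, 16-bit, 30 seconds) gives the per-player cost of holding decoded PCM. The player count of 50 is an arbitrary example, and `pcm_buffer_bytes` is a hypothetical helper.

```python
def pcm_buffer_bytes(sample_rate_hz: int, channels: int,
                     bytes_per_sample: int, seconds: int) -> int:
    """Size of uncompressed PCM audio: sampleRate * channels * bytesPerSample * seconds."""
    return sample_rate_hz * channels * bytes_per_sample * seconds


per_player = pcm_buffer_bytes(16_000, 1, 2, 30)  # 30 s rolling window
full_clip = pcm_buffer_bytes(16_000, 1, 2, 45)   # 30 s history + 15 s extension
print(per_player)       # 960000 bytes, about 0.96 MB
print(full_clip)        # 1440000 bytes
print(per_player * 50)  # 48000000 bytes for 50 players
```

These figures are an upper bound for the plugin, since it buffers compressed Opus packets rather than PCM.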
- GDPR: Requires explicit consent (✅ implemented)
- Data retention: Auto-delete old recordings (recommended: 30 days)
- Right to access: Allow players to request their data
- Right to deletion: Allow players to request data deletion
- Transparency: Clear privacy policy and disclosure
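
The data-retention item could be implemented along the following lines. This sketch handles only the audio files under the dated storage directories and uses file modification time as the age; deleting the matching database records is left out, and the function signature is an assumption.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def cleanup_old_files(storage_dir: str, max_age_days: int = 30,
                      now: Optional[float] = None) -> List[Path]:
    """Delete .wav files older than max_age_days and return the removed paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86_400
    removed = []
    for path in Path(storage_dir).rglob("*.wav"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    # Remove date directories that are now empty.
    for directory in Path(storage_dir).iterdir():
        if directory.is_dir() and not any(directory.iterdir()):
            directory.rmdir()
    return removed
```

Running it once a day from a scheduler would be enough for a 30-day policy; the returned list can be used to delete the corresponding rows so the database never points at missing files.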
- Web Dashboard: View detections, statistics, audio playback
- Machine Learning: Train custom model on server-specific language
- Multiple Languages: Support non-English detection
- Severity Levels: Different actions for different word categories
- Appeal System: Players can appeal false positives
- Integration: Discord notifications for staff on detections
- Analytics: Trend analysis, repeat offenders tracking
- Whitelist: Trusted players exempt from monitoring
- Simple Voice Chat API: https://modrepo.de/minecraft/voicechat/api
- OpenAI Whisper: https://github.com/openai/whisper
- FastAPI Docs: https://fastapi.tiangolo.com/
- Paper API Docs: https://docs.papermc.io/