fix: replace metadata-based offset compensation with CBR byte-count math by hcgiub001 · Pull Request #468 · rany2/edge-tts

hcgiub001 · 2026-03-17T10:23:10Z

Summary

Replace the inter-chunk offset_compensation logic with exact arithmetic
derived from the actual MP3 byte count. This eliminates cumulative subtitle
timing drift on long texts (previously about 3 minutes of drift after about 4 hours of audio).

Problem

Communicate.__stream currently derives each chunk's offset baseline from the
previous chunk's last metadata offset + a hardcoded 8_750_000-tick (~875 ms)
padding constant. Three things cause this to accumulate unbounded error:

- Microsoft's integer-overflow bug: raw Offset values in
  audio.metadata become unreliable on long texts, so
  last_duration_offset is incorrect as source material.
- Variable AI pauses: the silence between chunks depends on content,
  punctuation, and prosody. No single constant can model it.
- Circular error compounding: each chunk's compensation is derived from
  the already-compensated previous chunk, so per-chunk errors feed forward.

Fix

The output format audio-24khz-48kbitrate-mono-mp3 is 48 kbps CBR.
For any constant-bitrate stream, the byte-to-duration conversion is exact:

ticks = total_bytes × 8 × 10,000,000 ÷ 48,000
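The formula can be written as a small integer-only helper. This is a minimal sketch; the name `bytes_to_ticks` is hypothetical and not taken from the PR, while the two constants mirror the ones the PR adds to `constants.py`. Because 8 × 10,000,000 ÷ 48,000 reduces to 5,000 ÷ 3, each byte is worth about 1,666.67 ticks (about 166.7 µs), and 6,000 bytes are exactly one second.

```python
TICKS_PER_SECOND = 10_000_000  # 100 ns ticks, the unit of the service's Offset values
MP3_BITRATE_BPS = 48_000       # audio-24khz-48kbitrate-mono-mp3 is 48 kbps CBR


def bytes_to_ticks(total_bytes: int) -> int:
    """Convert a CBR MP3 byte count to 100 ns ticks using integer arithmetic only."""
    # bytes -> bits, scale to ticks, then divide by bits per second.
    # Multiplying before dividing keeps the result exact up to floor truncation.
    return total_bytes * 8 * TICKS_PER_SECOND // MP3_BITRATE_BPS


# 6,000 bytes = 48,000 bits = exactly one second at 48 kbps.
print(bytes_to_ticks(6_000))  # 10000000
```

Doing the multiplication first and a single floor division last avoids floating point entirely, so the result does not depend on float precision even for multi-hour streams.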

Every WebSocket binary message — including encoded silence from the AI's
variable pauses — is counted. The "unpredictable pauses" that defeat the
fixed-constant approach are automatically and precisely accounted for because
they exist as real MP3 frames in the audio payload.
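Only the MP3 payload of each binary message should contribute to the count, not the framing around it. The sketch below assumes the binary framing edge-tts parses: a 2-byte big-endian header length, the text headers, then the audio bytes. Treat that layout and the helper name `audio_payload_size` as assumptions, not a description of the merged code.

```python
def audio_payload_size(message: bytes) -> int:
    """Return the number of audio bytes in one binary WebSocket message.

    Assumed layout: 2-byte big-endian header length, headers, then MP3 data.
    """
    if len(message) < 2:
        raise ValueError("binary message too short to contain a header length")
    header_length = int.from_bytes(message[:2], "big")
    if 2 + header_length > len(message):
        raise ValueError("header length exceeds message size")
    # Everything after the headers is MP3 data, including encoded silence.
    return len(message) - 2 - header_length


headers = b"Path:audio\r\nContent-Type:audio/mpeg\r\n"
audio = bytes(144)  # 144 placeholder bytes standing in for MP3 data
message = len(headers).to_bytes(2, "big") + headers + audio
print(audio_payload_size(message))  # 144
```

Counting whole messages instead would add the header bytes to every chunk and reintroduce a small per-message drift.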

What changes

File	Change
`constants.py`	Add `TICKS_PER_SECOND = 10_000_000` and `MP3_BITRATE_BPS = 48_000`
`typing.py`	Add `chunk_audio_bytes` and `cumulative_audio_bytes` to `CommunicateState`
`communicate.py`	Count audio bytes per chunk; at `turn.end`, compute `offset_compensation` from cumulative byte count instead of metadata + constant
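The per-chunk bookkeeping described in the table might look roughly like the following. This is a hedged sketch: `CommunicateStateSketch`, `on_audio`, and `on_turn_end` are hypothetical names and the merged code may be structured differently; only the field names `chunk_audio_bytes`, `cumulative_audio_bytes`, `offset_compensation` and the two constants come from the PR description.

```python
from dataclasses import dataclass

TICKS_PER_SECOND = 10_000_000
MP3_BITRATE_BPS = 48_000


@dataclass
class CommunicateStateSketch:
    """Hypothetical stand-in for the relevant CommunicateState fields."""
    offset_compensation: int = 0     # ticks added to the next chunk's metadata offsets
    chunk_audio_bytes: int = 0       # audio bytes received in the current chunk
    cumulative_audio_bytes: int = 0  # audio bytes received in all finished chunks


def on_audio(state: CommunicateStateSketch, payload: bytes) -> None:
    # Count every audio payload, silence included.
    state.chunk_audio_bytes += len(payload)


def on_turn_end(state: CommunicateStateSketch) -> None:
    # Fold the finished chunk into the running total, then derive the next
    # baseline from the total alone, so no per-chunk error is carried forward.
    state.cumulative_audio_bytes += state.chunk_audio_bytes
    state.chunk_audio_bytes = 0
    state.offset_compensation = (
        state.cumulative_audio_bytes * 8 * TICKS_PER_SECOND // MP3_BITRATE_BPS
    )


state = CommunicateStateSketch()
for chunk in ([bytes(6_000)], [bytes(3_000), bytes(3_000)]):  # two chunks, 1 s each
    for payload in chunk:
        on_audio(state, payload)
    on_turn_end(state)
print(state.offset_compensation)  # 20000000
```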

What does NOT change

- Intra-chunk word/sentence boundary offsets still come from Microsoft's
  metadata (accurate within a single chunk).
- SubMaker, save(), and the stream() API are unchanged.
- last_duration_offset is still set for backward compatibility and debugging.
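Put together, an absolute word-boundary time is still the chunk-relative metadata Offset plus the byte-derived baseline. A minimal sketch with hypothetical helpers `absolute_offset` and `ticks_to_srt`; it assumes each chunk's metadata offsets are relative to the start of that chunk, which is what adding a per-chunk compensation implies.

```python
TICKS_PER_SECOND = 10_000_000


def absolute_offset(metadata_offset: int, offset_compensation: int) -> int:
    """Place an intra-chunk metadata Offset on the stream's global timeline (ticks)."""
    # metadata_offset: relative to the start of the current chunk (from the service)
    # offset_compensation: byte-derived start of the current chunk (ticks)
    return metadata_offset + offset_compensation


def ticks_to_srt(ticks: int) -> str:
    """Format 100 ns ticks as an SRT timestamp HH:MM:SS,mmm (truncating)."""
    total_ms = ticks // 10_000
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


# A word 1.25 s into a chunk that starts 3,600 s into the stream.
start = absolute_offset(12_500_000, 3_600 * TICKS_PER_SECOND)
print(ticks_to_srt(start))  # 01:00:01,250
```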

Why this is correct

Property	Old approach	New approach
Inter-chunk offset source	Last metadata offset + 875 ms constant	Actual MP3 bytes × exact CBR arithmetic
Affected by MS integer overflow	Yes	No — metadata not used for accumulation
Handles variable AI pauses	No	Yes — silence is encoded in the bytes
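Error accumulation	Compounds per chunk	None beyond sub-tick truncation (each offset is recomputed from the cumulative byte count)
Mathematical guarantee	None	Exact for CBR; integer arithmetic with no floating-point rounding (floor division truncates by less than one 100 ns tick)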
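One subtlety behind the last two rows of the table: floor division does truncate, because 8 × 10,000,000 ÷ 48,000 = 5,000 ÷ 3 is not an integer, so a byte count that is not a multiple of 3 loses a fraction of a tick. Converting the cumulative byte count keeps that loss below one tick in total, whereas converting each chunk separately and summing would lose up to one tick per chunk. The chunk sizes below are arbitrary illustrative values, not measurements.

```python
from fractions import Fraction

TICKS_PER_SECOND = 10_000_000
MP3_BITRATE_BPS = 48_000


def bytes_to_ticks(total_bytes: int) -> int:
    return total_bytes * 8 * TICKS_PER_SECOND // MP3_BITRATE_BPS


chunk_sizes = [6_001] * 1_000  # arbitrary: 1,000 chunks, none a multiple of 3

# Per-chunk conversion: each chunk truncates independently and the losses add up.
summed_per_chunk = sum(bytes_to_ticks(n) for n in chunk_sizes)

# Cumulative conversion: a single truncation for the whole stream.
from_total = bytes_to_ticks(sum(chunk_sizes))

# Exact value as a rational number of ticks, for comparison.
exact = Fraction(sum(chunk_sizes) * 8 * TICKS_PER_SECOND, MP3_BITRATE_BPS)

print(exact - from_total)             # 2/3
print(from_total - summed_per_chunk)  # 666
```

Even the per-chunk variant would drift by well under a millisecond here; the point is only that the cumulative form has a fixed bound regardless of stream length.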

The only scenario where this could break is if Microsoft switched to VBR
encoding, which would require a different outputFormat string — trivially
detectable.
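A guard for that case could parse the bitrate out of the format string and refuse anything that is not a CBR MP3 format, instead of silently applying the wrong constant. This is an illustration only: `cbr_mp3_bitrate_bps` and the regular expression are hypothetical, assume format names shaped like `audio-24khz-48kbitrate-mono-mp3`, and are not part of the PR as described.

```python
import re

# Assumed naming pattern: audio-<rate>khz-<bitrate>kbitrate-mono-mp3
_CBR_MP3_FORMAT = re.compile(r"^audio-\d+khz-(\d+)kbitrate-mono-mp3$")


def cbr_mp3_bitrate_bps(output_format: str) -> int:
    """Return the bitrate in bits per second, or raise if byte-count math is unsafe."""
    match = _CBR_MP3_FORMAT.match(output_format)
    if match is None:
        raise ValueError(
            f"byte-count timing assumes a CBR MP3 format, got {output_format!r}"
        )
    return int(match.group(1)) * 1_000


print(cbr_mp3_bitrate_bps("audio-24khz-48kbitrate-mono-mp3"))  # 48000
```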

Testing

Validated on texts producing up to 50 minutes of continuous audio with
real-time word-boundary tracking against actual playback position.
Zero observable drift. SRT timestamps remain synchronized with the audio
throughout.

The inter-chunk offset_compensation accumulated timing drift on long texts because it derived each chunk's baseline from Microsoft's reported metadata offsets plus a hardcoded 8,750,000-tick padding constant.

The output format is audio-24khz-48kbitrate-mono-mp3 (48 kbps CBR). For any CBR stream the relationship between byte count and duration is exact:

ticks = total_bytes * 8 * 10_000_000 // 48_000

Every binary audio message, including encoded silence from the AI's variable inter-sentence pauses, is counted. This replaces the metadata-domain accumulation that was vulnerable to:

- Microsoft's integer-overflow bug in long-text Offset values
- Variable AI pause lengths that no single constant can model
- Compounding per-chunk errors across dozens of chunks

Intra-chunk word/sentence boundary offsets from Microsoft's metadata are still used as-is (accurate within a single chunk). Only the inter-chunk compensation is changed.

Fixes cumulative ~3-minute drift observed at ~4 hours of audio.

rany2 · 2026-03-21T18:54:36Z

Why close this?

hcgiub001 · 2026-03-22T17:10:03Z

thought you weren't interested since it's been days

hcgiub001 · 2026-03-22T21:54:18Z

thanks for merging, sorry I caused you issues

hcgiub001 force-pushed the fix-timing-drift-byte-count branch 2 times, most recently from b81f2e6 to b6f76d0 (March 17, 2026 11:43)

hcgiub001 force-pushed the fix-timing-drift-byte-count branch from b6f76d0 to cafe9a5 (March 17, 2026 11:45)

hcgiub001 closed this Mar 21, 2026

rany2 reopened this Mar 21, 2026

rany2 merged commit 9965046 into rany2:master Mar 22, 2026
3 checks passed

rany2 merged 1 commit into rany2:master from hcgiub001:fix-timing-drift-byte-count

2 participants
