Add turbo support and prepare for CRAN by TroyHernandez · Pull Request #3 · cornball-ai/chatterbox

TroyHernandez · 2026-04-07T03:28:42Z

Summary

- Adds turbo model support (GPT-2 backbone, MeanFlow decoder)
- Prepares package for CRAN submission: 0 errors, 0 warnings, 1 harmless NOTE
- Drops serve_chatterbox() since gpu.ctl is not on CRAN
- Adds tinytest smoke tests
- Slims tarball from 1.7GB to 98K via aggressive .Rbuildignore
- Bumps version to 0.1.0

Test plan

- R CMD check passes
- tinytest suite passes (19 tests)
- Manual GPU smoke test of turbo path
- Address chris-english issues #1 ("Feature Request - tts_to_file - messages to df") and #2 ("tts_to_file text mitigation strategies - mixed case Upper/lower texts") before CRAN submission

- GPT-2 backbone (24 layers, 350M params) as alternative to Llama (30 layers, 520M)
- MeanFlow single-step decoder (2 steps) instead of 10-step CFM
- GPT-2 BPE tokenizer (50,276 vocab) in pure R
- Turbo download infrastructure from ResembleAI/chatterbox-turbo
- Turbo T3 inference: no CFG, top_k + repetition_penalty sampling
- HTTP serve endpoint with turbo parameter
- Fix DESCRIPTION authorship for CRAN (Resemble AI as copyright holder)
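
The turbo path's `top_k` sampling step mentioned above can be illustrated with a short Python sketch. This is an assumption-laden illustration of generic top-k filtering, not the package's R code: the function name `top_k_filter` is hypothetical, and ties at the cutoff are kept, which is one common convention.

```python
def top_k_filter(logits, k):
    """Mask every logit below the k-th largest with -inf (hypothetical helper)."""
    if k <= 0 or k >= len(logits):
        # Nothing to filter: treat out-of-range k as "disabled".
        return list(logits)
    # The k-th largest value is the cutoff; ties at the cutoff are kept.
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [x if x >= cutoff else float("-inf") for x in logits]
```

After masking, a softmax over the result gives zero probability to every token outside the top k, so sampling can only pick one of the k most likely tokens.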

- Bump version to 0.1.0
- Fix DESCRIPTION: real email, ORCID, URL/BugReports, proper Description
- Drop serve_chatterbox() and gpu.ctl Suggests dep (gpu.ctl not on CRAN)
- Add tinytest smoke tests for chatterbox(), create_mel_filterbank(), compute_mel_spectrogram()
- Add PACKAGE = "chatterbox" to .Call("cpp_t3_decode")
- Add simplermarkdown vignette engine header to performance.md
- Aggressive .Rbuildignore: tarball goes from 1.7GB to 98K
- Regenerate Rd files via tinyrox::document() (resolves all codoc and undocumented arg WARNINGs)

R CMD check: 0 errors, 0 warnings, 1 harmless NOTE.

* Add Python fidelity review punch list and lessons

Full review of the R port against chatterbox-tts 0.1.4 (container reference), five subsystem comparisons. 32 findings, 6 high severity. See tasks/python-review-2026-06-11.md.

* Text front-end parity with Python chatterbox-tts 0.1.4

- generate() now applies punc_norm() unconditionally before tokenizing, matching tts.py. The standard (non-turbo) path previously skipped it, losing whitespace collapse, first-letter capitalization, punctuation rewrites, and the trailing period (a strong EOS cue; relates to #1).
- punc_norm() trims leading whitespace like Python's ' '.join(split()).
- tokenize_text() gains an added-token extraction pass mirroring HF tokenizers: [SPACE], [laughter], [sigh], etc. now tokenize atomically instead of being spelled out letter by letter (relates to #5).
- Fixed BPE corruption when a sequence fully merges to one token (e.g. 'th' produced [40, UNK, dup] instead of [40]).
- Non-space whitespace is dropped but still separates words, so BPE can no longer merge across a removed tab/newline.
- load_tokenizer() now stops (not warns) on missing [START]/[STOP], matching the Python assert.

Token streams verified byte-identical against the chatterbox-tts 0.1.4 container on mixed sentences, added tokens, and whitespace edge cases.

* Sampling parity with Python chatterbox-tts 0.1.4

- Repetition penalty is now sign-dependent (HF semantics): positive logits divided, negative multiplied. The old divide-only code made repeated tokens with negative logits MORE likely, under-damping the degenerate loops behind issue #1. Fixed in pure-R, traced, and C++ variants (turbo already had it right).
- Extracted .sample_speech_token(), shared by the pure-R and traced loops so their sampling semantics cannot drift apart again.
- top_p now defaults to 1.0 (disabled) everywhere, matching Python's generate(); previously 0.9 in generate() and 0.95 in t3_inference.
- generate() gains min_p and now actually forwards min_p and repetition_penalty to the standard path (they were silently ignored).
- Top-p keeps the threshold-crossing token (HF TopPLogitsWarper shifts the mask right) in the R and turbo samplers; C++ already did this.
- Repetition-penalty set now includes the real BOS row in all variants; r/traced previously penalized valid speech token 6560 forever due to mixed 0/1-indexing, and cpp/turbo-loop penalized no BOS at all.
- Runaway guard in all variants: the same token sampled 3x in a row stops generation with eos_found = FALSE and a warning (Python's alignment analyzer forces EOS at 2x; English-only Python 0.1.4 ships no guard and can emit 40 s of garbage - issue #1).
- C++ decoder reads rms_eps from the llama config instead of hardcoding.

7 new sampler unit tests (sign-dependence, crossing token, min-p).

* Conditioning and numeric parity with Python chatterbox-tts 0.1.4

Conditioning (reference audio path):

- New windowed-sinc resampler (R/resample.R), a port of torchaudio's _get_sinc_resample_kernel/_apply_sinc_resample_kernel; replaces linear interpolation in resample_audio(), which aliased reference features. Validated vs torchaudio: max abs diff 2.2e-10.
- New Kaldi fbank (R/kaldi_fbank.R), a port of torchaudio.compliance.kaldi.fbank (povey window, preemphasis 0.97, HTK mel scale, power spectrum, snip_edges). CAMPPlus now sees the features it was trained on instead of a librosa-style mel at half the log scale. Validated vs torchaudio: max abs diff 4.9e-09.
- Reference truncation parity: S3Gen prompt capped at 10 s (DEC_COND_LEN) and tokenizer conditioning prompt at 6 s (ENC_COND_LEN) in create_voice_embedding(), as in tts.py prepare_conditionals.
- embed_ref() reconciles mel/token prompt lengths (trim tokens to mel_len %/% 2) and the flow uses the actual prompt mel length for the conditioning region, fixing an off-by-one-frame boundary on refs that are not multiples of 40 ms.
- Voice encoder partials now match embeds_from_wavs defaults: frame_step 77 (rate = 1.3), min_coverage 0.8 extra zero-padded partial, and leading/trailing silence trim (librosa.effects.trim, top_db = 20, ported as trim_silence()).

Numeric/small parity:

- CFG unconditional row zeroed BEFORE positional embeddings (t3.py keeps text positions in the uncond branch).
- Prefill ends in two BOS frames like t3.py; removed the dead bos_emb block that hinted at the lost intent.
- CFM transformer feed-forward uses exact GELU (diffusers default), not the tanh approximation.
- CFM no longer draws a wasted torch_randn_like on the standard path (RNG state + allocation churn).
- autocast now defaults to FALSE: the Python reference runs float32 everywhere (S3Gen fp16 is explicitly off upstream); fp16 is opt-in.
- conds.pt dropped from downloads: it is a nested torch pickle R torch cannot read, and the R API requires a reference voice (~105 MB saved).

Validation scripts: scripts/test_kaldi_resample.R + scripts/save_kaldi_resample_ref.py (container reference).

* Low-severity parity fixes and divergence docs

- drop_invalid_tokens() slices first-SOS..first-EOS before filtering (Python semantics); post-EOS garbage no longer survives token-cap runs.
- make_pad_mask() broadcasts correctly for batch > 1.
- chatterbox() falls back to CPU with a warning when CUDA/MPS is unavailable, like Python from_pretrained.
- GPT-2 pre-tokenizer regex uses \p{L}/\p{N} instead of POSIX classes.
- README documents deliberate divergences: no Perth watermark (with disclosure), no builtin default voice, reliability extras, backend token caps, unported modules.
- New NEWS.md summarizing the fidelity review.
- tests/tinytest.R redirects R_USER_{CACHE,DATA,CONFIG}_DIR during checks (CRAN home-filespace hygiene).
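
The drop_invalid_tokens() change listed above (slice from the first SOS to the first EOS, then filter) can be sketched in Python. This is a minimal illustration, not the package's R code: the token ids (`SOS = 6561`, `EOS = 6562`) and the vocabulary size of 6561 are assumptions made for the example, and the function name `drop_invalid_tokens_sketch` is hypothetical.

```python
# Illustrative sketch only; the ids below are assumed, not taken from this PR.
SPEECH_VOCAB_SIZE = 6561  # assumed: valid speech tokens are 0..6560
SOS = 6561                # assumed start-of-speech id
EOS = 6562                # assumed end-of-speech id


def drop_invalid_tokens_sketch(tokens):
    """Keep what lies between the first SOS and the first EOS, then filter."""
    # Start just after the first SOS, or at the beginning if there is none.
    start = tokens.index(SOS) + 1 if SOS in tokens else 0
    # Stop at the first EOS, or at the end if generation hit the token cap.
    end = tokens.index(EOS) if EOS in tokens else len(tokens)
    window = tokens[start:end]
    # Only after slicing, drop anything outside the speech vocabulary.
    return [t for t in window if 0 <= t < SPEECH_VOCAB_SIZE]
```

Slicing before filtering is what removes post-EOS tokens: if the filter ran first, the EOS marker itself would be dropped and everything sampled after it would survive.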
* rformat + document

* Raise runaway guard to 10x repeats; GPU validation scripts; Rd fix

GPU validation showed healthy generations (silence, laughter) tripping the 3x repeated-token guard; 10 consecutive identical FSQ codes (400 ms) only occur in degenerate loops. All validation cases now emit EOS and track the container reference. Also fixes the t3_inference Rd orphaned by the sampler helper insertion (stale top_p default) and ignores working drafts.

* Correct NEWS: runaway guard threshold is 10x, not 3x

* Bump version to 0.1.0.1

* Ignore local task notes

* Remove debug artifacts and stale port-era docs

- outputs/ safetensors debug dumps (13.9 MB) untracked; the dir was already gitignored and every dump is regenerable from scripts/
- test_output.wav, validation_status.csv: January debugging scratch, superseded by CLAUDE.md's validation table
- PseudoCode.md contained parameters CLAUDE.md explicitly corrected; JIT_TRACE_OPTIMIZATION.md superseded by vignettes/performance.md
- fyi.md was empty
- ignore files tidied to match

* Prune one-off debugging scripts; remove empty ref_audio dir entry

Deleted the January port-debugging one-offs (debug_*, trace_*, profile_*, explore_*, check_*; 80 files, all in git history). Kept the component validation suite (test_*), the container reference generators (save_*), comparison scripts, and the parity validation scripts.

* Restore useDynLib dropped in PR #3; guard with a test

tinyrox::document() does not support @useDynLib and silently drops the manual useDynLib(chatterbox, .registration = TRUE) line from NAMESPACE. PR #3's document() run did exactly that, so backend = 'cpp' has errored with 'cpp_t3_decode not available' ever since - unnoticed because no test exercised the C++ path. New test_cpp_backend.R fails loudly if the registration ever disappears again.
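
The runaway guard described in the commits above (stop when the same token is sampled 10 times in a row, report eos_found = FALSE, and warn) can be sketched in Python as follows. This is a hedged illustration rather than the package's R loop: `generate_with_guard` and the `sample_next` callback are hypothetical names, and keeping the repeated tokens in the returned sequence is an assumption, since the commit messages do not say what happens to them.

```python
import warnings


def generate_with_guard(sample_next, eos_id, max_tokens, max_repeats=10):
    """Sample tokens until EOS, the token cap, or a run of identical tokens.

    sample_next is a hypothetical callback: it takes the tokens so far and
    returns the next token id. Returns (tokens, eos_found).
    """
    tokens = []
    run_length = 0
    for _ in range(max_tokens):
        tok = sample_next(tokens)
        if tok == eos_id:
            return tokens, True
        # Extend the current run if the token repeats, otherwise restart it.
        run_length = run_length + 1 if tokens and tok == tokens[-1] else 1
        tokens.append(tok)
        if run_length >= max_repeats:
            warnings.warn("same token sampled %d times in a row; stopping" % max_repeats)
            return tokens, False
    # Token cap reached without EOS.
    return tokens, False
```

With 10 identical codes spanning 400 ms, each code covers 40 ms, so a threshold of 3 fired after only 120 ms of a held sound, which is consistent with healthy silence and laughter tripping the earlier guard.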
* Add backend benchmark, profiling, and GC-tuning scripts

Tooling behind the June 2026 performance investigation: per-backend ms/token benchmark, profvis+debrief profiles (which found 91% of pure-R wall time in R GC triggered by torch's allocator policy), and the GC-threshold tuning grid. Findings documented in the cornelius vault.

* Guard sampler tests against missing lantern

CI installs the torch package but not its libtorch backend; requireNamespace alone passes there and the first torch_tensor() call errors. Use torch_is_installed() like the other torch-dependent tests.
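
The sampler semantics described in the sampling-parity commit above (sign-dependent repetition penalty, top-p that keeps the threshold-crossing token, and min-p) can be sketched in plain Python. This is a simplified illustration on lists, not the package's R or C++ code, and the function names are hypothetical. The top-p and min-p helpers take probabilities that are assumed to be already normalized.

```python
def apply_repetition_penalty(logits, seen_ids, penalty):
    """Sign-dependent penalty: divide positive logits, multiply negative ones."""
    out = list(logits)
    for i in set(seen_ids):
        # Both branches push a seen token's logit down when penalty > 1.
        out[i] = out[i] * penalty if out[i] < 0 else out[i] / penalty
    return out


def top_p_keep(probs, top_p):
    """Indices kept by nucleus filtering, including the threshold-crossing token."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)              # keep the token first...
        cumulative += probs[i]
        if cumulative >= top_p:  # ...then stop once the mass is covered
            break
    return kept


def min_p_keep(probs, min_p):
    """Indices whose probability is at least min_p times the top probability."""
    threshold = min_p * max(probs)
    return {i for i, p in enumerate(probs) if p >= threshold}
```

For example, with seen logits 2.0 and -2.0 and a penalty of 2.0, the sign-dependent rule yields 1.0 and -4.0. A divide-only rule would turn -2.0 into -1.0, making that repeated token more likely, which is the bug the commit describes. With `top_p = 1.0` the loop keeps every token, matching the "disabled" default.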

TroyHernandez added 3 commits February 20, 2026 05:56

Fix torch_randn dtype: use float32 instead of Long from speech_tokens

ea2672f

TroyHernandez merged commit 7e923be into main Apr 7, 2026
4 checks passed

TroyHernandez deleted the turbo-support branch April 7, 2026 03:28

TroyHernandez mentioned this pull request Jun 11, 2026

Python fidelity: full parity review fixes (32 findings) #6

Merged

4 tasks

TroyHernandez merged 3 commits into main from turbo-support
