Add turbo support and prepare for CRAN #3
Merged
- GPT-2 backbone (24 layers, 350M params) as an alternative to Llama (30 layers, 520M)
- MeanFlow single-step decoder (2 steps) instead of 10-step CFM
- GPT-2 BPE tokenizer (50,276 vocab) in pure R
- Turbo download infrastructure from ResembleAI/chatterbox-turbo
- Turbo T3 inference: no CFG, top_k + repetition_penalty sampling
- HTTP serve endpoint with turbo parameter
- Fix DESCRIPTION authorship for CRAN (Resemble AI as copyright holder)
- Bump version to 0.1.0
- Fix DESCRIPTION: real email, ORCID, URL/BugReports, proper Description
- Drop serve_chatterbox() and gpu.ctl Suggests dep (gpu.ctl not on CRAN)
- Add tinytest smoke tests for chatterbox(), create_mel_filterbank(), and compute_mel_spectrogram()
- Add PACKAGE = "chatterbox" to .Call("cpp_t3_decode")
- Add simplermarkdown vignette engine header to performance.md
- Aggressive .Rbuildignore: the tarball shrinks from 1.7 GB to 98 KB
- Regenerate Rd files via tinyrox::document() (resolves all codoc and undocumented-argument WARNINGs)
R CMD check: 0 errors, 0 warnings, 1 harmless NOTE.
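The turbo bullet above says T3 sampling drops CFG and relies on top_k plus repetition_penalty. A minimal sketch of the top-k step, written in plain Python with the same tie semantics as HF's TopKLogitsWarper (everything strictly below the k-th largest logit is masked). The function names are hypothetical and this is not the package's R implementation.

```python
import math
import random


def top_k_filter(logits, k):
    """Mask every logit below the k-th largest to -inf (ties with the k-th value survive)."""
    if k <= 0 or k >= len(logits):
        return list(logits)
    kth = sorted(logits, reverse=True)[k - 1]
    return [x if x >= kth else -math.inf for x in logits]


def sample_token(logits, rng):
    """Draw one index from softmax(logits); -inf entries get zero weight."""
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]  # math.exp(-inf) == 0.0
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


filtered = top_k_filter([1.0, 3.0, 2.0, 0.5], k=2)
token = sample_token(filtered, random.Random(0))
```

Because no CFG is used, each step needs only one forward pass, and the filtered logits go straight to sampling after the repetition penalty is applied.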
TroyHernandez added a commit that referenced this pull request on Jun 11, 2026
* Add Python fidelity review punch list and lessons

  Full review of the R port against chatterbox-tts 0.1.4 (container reference), covering five subsystem comparisons: 32 findings, 6 of them high severity. See tasks/python-review-2026-06-11.md.

* Text front-end parity with Python chatterbox-tts 0.1.4

  - generate() now applies punc_norm() unconditionally before tokenizing, matching tts.py. The standard (non-turbo) path previously skipped it, losing whitespace collapse, first-letter capitalization, punctuation rewrites, and the trailing period (a strong EOS cue; relates to #1).
  - punc_norm() trims leading whitespace like Python's ' '.join(split()).
  - tokenize_text() gains an added-token extraction pass mirroring HF tokenizers: [SPACE], [laughter], [sigh], etc. now tokenize atomically instead of being spelled out letter by letter (relates to #5).
  - Fixed BPE corruption when a sequence fully merges to one token (e.g. 'th' produced [40, UNK, dup] instead of [40]).
  - Non-space whitespace is dropped but still separates words, so BPE can no longer merge across a removed tab/newline.
  - load_tokenizer() now stops (rather than warns) on missing [START]/[STOP], matching the Python assert.

  Token streams verified byte-identical against the chatterbox-tts 0.1.4 container on mixed sentences, added tokens, and whitespace edge cases.

* Sampling parity with Python chatterbox-tts 0.1.4

  - Repetition penalty is now sign-dependent (HF semantics): positive logits are divided, negative logits multiplied. The old divide-only code made repeated tokens with negative logits MORE likely, under-damping the degenerate loops behind issue #1. Fixed in the pure-R, traced, and C++ variants (turbo already had it right).
  - Extracted .sample_speech_token(), shared by the pure-R and traced loops so their sampling semantics cannot drift apart again.
  - top_p now defaults to 1.0 (disabled) everywhere, matching Python's generate(); previously it was 0.9 in generate() and 0.95 in t3_inference.
  - generate() gains min_p and now actually forwards min_p and repetition_penalty to the standard path (they were silently ignored).
  - Top-p keeps the threshold-crossing token (HF TopPLogitsWarper shifts the mask right) in the R and turbo samplers; C++ already did this.
  - The repetition-penalty set now includes the real BOS row in all variants; r/traced previously penalized valid speech token 6560 forever due to mixed 0/1-indexing, and cpp/turbo-loop penalized no BOS at all.
  - Runaway guard in all variants: the same token sampled 3x in a row stops generation with eos_found = FALSE and a warning (Python's alignment analyzer forces EOS at 2x; English-only Python 0.1.4 ships no guard and can emit 40 s of garbage; see issue #1).
  - The C++ decoder reads rms_eps from the llama config instead of hardcoding it.

  7 new sampler unit tests (sign-dependence, crossing token, min-p).

* Conditioning and numeric parity with Python chatterbox-tts 0.1.4

  Conditioning (reference audio path):

  - New windowed-sinc resampler (R/resample.R), a port of torchaudio's _get_sinc_resample_kernel/_apply_sinc_resample_kernel; replaces linear interpolation in resample_audio(), which aliased reference features. Validated vs torchaudio: max abs diff 2.2e-10.
  - New Kaldi fbank (R/kaldi_fbank.R), a port of torchaudio.compliance.kaldi.fbank (povey window, preemphasis 0.97, HTK mel scale, power spectrum, snip_edges). CAMPPlus now sees the features it was trained on instead of a librosa-style mel at half the log scale. Validated vs torchaudio: max abs diff 4.9e-09.
  - Reference truncation parity: the S3Gen prompt is capped at 10 s (DEC_COND_LEN) and the tokenizer conditioning prompt at 6 s (ENC_COND_LEN) in create_voice_embedding(), as in tts.py prepare_conditionals.
  - embed_ref() reconciles mel/token prompt lengths (trims tokens to mel_len %/% 2), and the flow uses the actual prompt mel length for the conditioning region, fixing an off-by-one-frame boundary on refs that are not multiples of 40 ms.
  - Voice encoder partials now match embeds_from_wavs defaults: frame_step 77 (rate = 1.3), min_coverage 0.8 (with the extra zero-padded partial it implies), and leading/trailing silence trim (librosa.effects.trim, top_db = 20, ported as trim_silence()).

  Numeric/small parity:

  - CFG unconditional row zeroed BEFORE positional embeddings (t3.py keeps text positions in the uncond branch).
  - Prefill ends in two BOS frames like t3.py; removed the dead bos_emb block that hinted at the lost intent.
  - CFM transformer feed-forward uses exact GELU (diffusers default), not the tanh approximation.
  - CFM no longer draws a wasted torch_randn_like on the standard path (RNG state + allocation churn).
  - autocast now defaults to FALSE: the Python reference runs float32 everywhere (S3Gen fp16 is explicitly off upstream); fp16 is opt-in.
  - conds.pt dropped from downloads: it is a nested torch pickle that R torch cannot read, and the R API requires a reference voice anyway (~105 MB saved).

  Validation scripts: scripts/test_kaldi_resample.R + scripts/save_kaldi_resample_ref.py (container reference).

* Low-severity parity fixes and divergence docs

  - drop_invalid_tokens() slices first-SOS..first-EOS before filtering (Python semantics); post-EOS garbage no longer survives token-cap runs.
  - make_pad_mask() broadcasts correctly for batch > 1.
  - chatterbox() falls back to CPU with a warning when CUDA/MPS is unavailable, like Python from_pretrained.
  - GPT-2 pre-tokenizer regex uses \p{L}/\p{N} instead of POSIX classes.
  - README documents deliberate divergences: no Perth watermark (with disclosure), no builtin default voice, reliability extras, backend token caps, unported modules.
  - New NEWS.md summarizing the fidelity review.
  - tests/tinytest.R redirects R_USER_{CACHE,DATA,CONFIG}_DIR during checks (CRAN home-filespace hygiene).
* rformat + document

* Raise runaway guard to 10x repeats; GPU validation scripts; Rd fix

  GPU validation showed healthy generations (silence, laughter) tripping the 3x repeated-token guard; 10 consecutive identical FSQ codes (400 ms) only occur in degenerate loops. All validation cases now emit EOS and track the container reference. Also fixes the t3_inference Rd orphaned by the sampler helper insertion (stale top_p default) and ignores working drafts.

* Correct NEWS: runaway guard threshold is 10x, not 3x

* Bump version to 0.1.0.1

* Ignore local task notes

* Remove debug artifacts and stale port-era docs

  - outputs/ safetensors debug dumps (13.9 MB) untracked; the dir was already gitignored and every dump is regenerable from scripts/
  - test_output.wav, validation_status.csv: January debugging scratch, superseded by CLAUDE.md's validation table
  - PseudoCode.md contained parameters CLAUDE.md explicitly corrected; JIT_TRACE_OPTIMIZATION.md superseded by vignettes/performance.md
  - fyi.md was empty
  - ignore files tidied to match

* Prune one-off debugging scripts; remove empty ref_audio dir entry

  Deleted the January port-debugging one-offs (debug_*, trace_*, profile_*, explore_*, check_*; 80 files, all in git history). Kept the component validation suite (test_*), the container reference generators (save_*), comparison scripts, and the parity validation scripts.

* Restore useDynLib dropped in PR #3; guard with a test

  tinyrox::document() does not support @useDynLib and silently drops the manual useDynLib(chatterbox, .registration = TRUE) line from NAMESPACE. PR #3's document() run did exactly that, so backend = 'cpp' has errored with 'cpp_t3_decode not available' ever since, unnoticed because no test exercised the C++ path. New test_cpp_backend.R fails loudly if the registration ever disappears again.
* Add backend benchmark, profiling, and GC-tuning scripts

  Tooling behind the June 2026 performance investigation: per-backend ms/token benchmark, profvis+debrief profiles (which found 91% of pure-R wall time in R GC triggered by torch's allocator policy), and the GC-threshold tuning grid. Findings documented in the cornelius vault.

* Guard sampler tests against missing lantern

  CI installs the torch package but not its libtorch backend; requireNamespace alone passes there and the first torch_tensor() call errors. Use torch_is_installed() like the other torch-dependent tests.
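The text front-end commit lists three punc_norm() effects that matter for EOS: whitespace collapse, first-letter capitalization, and a trailing period. A minimal sketch covering only those three steps. `normalize_text` is a hypothetical name; the real punc_norm() also rewrites punctuation and handles empty input differently, and neither is modeled here.

```python
def normalize_text(text: str) -> str:
    """Hypothetical subset of punc_norm(): collapse whitespace, capitalize, add a period."""
    # ' '.join(text.split()) collapses runs of any whitespace and trims both ends.
    text = " ".join(text.split())
    if not text:
        return text
    # Capitalize only the first character; leave the rest untouched.
    if text[0].islower():
        text = text[0].upper() + text[1:]
    # A sentence-ending mark gives the model a strong end-of-speech cue.
    if text[-1] not in ".!?":
        text += "."
    return text
```

For example, `normalize_text("  hello   world ")` yields `"Hello world."`, which is the kind of input the standard path used to tokenize without the trailing period.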
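The added-token pass in tokenize_text() splits special markers out of the text before BPE runs, so `[laughter]` becomes one token instead of being spelled out. A sketch of that pre-split using a regex alternation, longest token first. The token list is a hypothetical subset and `split_added_tokens` is an illustrative name, not the package's API.

```python
import re

# Hypothetical subset of the tokenizer's added tokens.
ADDED_TOKENS = ["[SPACE]", "[laughter]", "[sigh]"]


def split_added_tokens(text, added_tokens=ADDED_TOKENS):
    """Return (piece, is_added_token) pairs; only non-added pieces should go through BPE."""
    tokens = set(added_tokens)
    # Longest first so a token that is a prefix of another cannot shadow it.
    alternation = "|".join(re.escape(t) for t in sorted(tokens, key=len, reverse=True))
    # A capturing group makes re.split keep the matched separators.
    pieces = re.split("(" + alternation + ")", text)
    return [(p, p in tokens) for p in pieces if p]
```

`split_added_tokens("hi[laughter]there")` gives three pieces, with only the middle one flagged as an added token, which then maps straight to its id.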
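The BPE fix concerns a word that merges all the way down to a single symbol ('th' → [40]). A simplified greedy merge loop that terminates cleanly in that case is sketched below. It merges one lowest-rank pair per step rather than all occurrences at once, as GPT-2's reference code does. The rank table and the id 40 are illustrative values taken from the example in the commit.

```python
def bpe_merge(word, ranks):
    """Greedy BPE: repeatedly merge the adjacent pair with the lowest merge rank."""
    symbols = list(word)
    # Stop as soon as one symbol remains: a fully merged word is a valid result.
    while len(symbols) > 1:
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in ranks
        ]
        if not candidates:
            break
        _, i = min(candidates)  # lowest rank, then leftmost position
        symbols[i : i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


ranks = {("t", "h"): 0, ("th", "e"): 1}
vocab = {"th": 40}  # illustrative id from the commit's example
ids = [vocab[s] for s in bpe_merge("th", ranks)]
```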
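The sign-dependent repetition penalty follows HF's RepetitionPenaltyLogitsProcessor: a positive logit is divided by the penalty and a negative one is multiplied, so both move toward "less likely". A divide-only rule would turn -2.0 into -1.0 and make the repeated token more likely. A plain-Python sketch, with a hypothetical function name:

```python
def apply_repetition_penalty(logits, seen_ids, penalty):
    """HF-style penalty: shrink positive logits and push negative ones further down."""
    out = list(logits)
    for tok in set(seen_ids):  # each previously emitted token is penalized once
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out
```

The BOS fix in the same commit amounts to making sure the BOS id is part of `seen_ids` with the correct index base, rather than a neighboring valid token.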
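"Top-p keeps the threshold-crossing token" means the nucleus includes the first token whose addition pushes cumulative probability to `top_p` or beyond. Without that token the kept set would fall short of the threshold. A sketch that is equivalent to HF's right-shifted mask, with a hypothetical function name:

```python
import math


def top_p_filter(logits, top_p):
    """Keep the most probable tokens up to and including the one that crosses top_p."""
    if top_p >= 1.0:  # 1.0 disables the filter, the new default
        return list(logits)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    keep, cum = set(), 0.0
    for i in sorted(range(len(logits)), key=lambda j: probs[j], reverse=True):
        keep.add(i)  # added while the mass before it is still below top_p
        cum += probs[i]
        if cum >= top_p:
            break
    return [x if i in keep else -math.inf for i, x in enumerate(logits)]
```

With probabilities 0.5/0.3/0.2 and `top_p = 0.6`, the first token alone covers only 0.5, so the second (crossing) token is kept as well.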
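min_p, now forwarded by generate(), drops every token whose probability is below `min_p` times the top token's probability (HF MinPLogitsWarper semantics). Since p_i / p_max = exp(logit_i - logit_max), no full softmax is needed. A hypothetical helper:

```python
import math


def min_p_filter(logits, min_p):
    """Mask tokens with probability < min_p * max probability."""
    if min_p <= 0.0:
        return list(logits)
    m = max(logits)
    # exp(x - m) is the token's probability relative to the most likely token.
    return [x if math.exp(x - m) >= min_p else -math.inf for x in logits]
```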
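The runaway guard stops decoding once the same speech token has been sampled N times in a row. The final threshold is 10, which is about 400 ms of identical codes at the 40 ms-per-token rate implied above. A sketch of the check a decode loop could run after each step; the function name is hypothetical.

```python
def runaway_detected(tokens, max_repeats=10):
    """True when the last max_repeats sampled tokens are all identical."""
    if len(tokens) < max_repeats:
        return False
    tail = tokens[-max_repeats:]
    return all(t == tail[0] for t in tail)
```

In the loop, a `True` result would end generation with `eos_found = FALSE` and a warning, as the commit describes. With a threshold of 3, short healthy runs such as silence or laughter tripped the check.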
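Three of the Kaldi fbank ingredients named in the conditioning commit are small enough to show directly. The povey window is a symmetric Hann window raised to the power 0.85. Pre-emphasis subtracts 0.97 times the previous sample, with the first sample replicated. The HTK mel scale is 1127·ln(1 + f/700). The sketch below assumes these torchaudio/Kaldi conventions and omits framing, dithering, DC removal, and the FFT.

```python
import math


def povey_window(n):
    """Symmetric Hann window raised to the power 0.85 (Kaldi's 'povey' window)."""
    if n == 1:
        return [1.0]
    return [(0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1))) ** 0.85 for i in range(n)]


def preemphasize(frame, coeff=0.97):
    """x[i] - coeff * x[i-1], treating x[-1] as a copy of x[0]."""
    return [x - coeff * (frame[i - 1] if i > 0 else frame[0]) for i, x in enumerate(frame)]


def htk_mel(hz):
    """HTK mel scale, as used by Kaldi fbank."""
    return 1127.0 * math.log(1.0 + hz / 700.0)
```

A librosa-style (Slaney) mel filterbank uses a different scale, which is part of why CAMPPlus saw mismatched features before the port.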
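The reference-length bookkeeping reduces to two rules: cap each prompt at a fixed duration, and trim speech tokens to `mel_len // 2` when the prompt mel is not exactly twice the token count. A sketch of both rules follows. The 24 kHz (S3Gen) and 16 kHz (tokenizer) sample rates and the two-mel-frames-per-token ratio are assumptions stated in the code, and the helper names are hypothetical.

```python
S3GEN_SR = 24_000  # assumed S3Gen reference sample rate
S3_SR = 16_000     # assumed speech-tokenizer sample rate
DEC_COND_LEN = 10 * S3GEN_SR  # 10 s S3Gen prompt cap
ENC_COND_LEN = 6 * S3_SR      # 6 s tokenizer prompt cap


def truncate_prompt(wav, max_samples):
    """Keep at most max_samples samples of the reference audio."""
    return wav[:max_samples]


def reconcile_prompt_tokens(mel_len, tokens):
    """Assuming 2 mel frames per token, trim tokens so they never outrun the mel prompt."""
    if mel_len != 2 * len(tokens):
        tokens = tokens[: mel_len // 2]
    return tokens
```

A 101-frame mel with 51 tokens, which happens when the reference is not a multiple of 40 ms, is reconciled to 50 tokens.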
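The CFM feed-forward change swaps the tanh approximation of GELU for the exact form, x·Φ(x), where Φ is the standard normal CDF. The two differ only slightly per activation, but the difference compounds across layers and flow steps, so matching the reference's exact form matters for numeric parity. Both forms in plain Python:

```python
import math


def gelu_exact(x):
    """Exact GELU: x * Phi(x), with Phi written via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """The common tanh approximation of GELU."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```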
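drop_invalid_tokens() now slices from just after the first SOS up to the first EOS, and only then filters out-of-vocabulary ids. The order matters: filtering first would let valid-looking codes emitted after EOS survive a token-cap run. A sketch with hypothetical SOS/EOS ids and vocabulary size passed as parameters:

```python
def drop_invalid_tokens_sketch(tokens, sos, eos, vocab_size):
    """Slice first-SOS..first-EOS, then keep only ids below vocab_size."""
    start = tokens.index(sos) + 1 if sos in tokens else 0
    end = tokens.index(eos, start) if eos in tokens[start:] else len(tokens)
    return [t for t in tokens[start:end] if t < vocab_size]
```

In the first test case, the two `9`s after EOS are valid ids but are still discarded, because the slice stops at EOS.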
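For make_pad_mask() with batch > 1, each row must compare time indices against that row's own length, so `mask[b][t]` is True exactly where `t >= lengths[b]` (True marks padding). A list-based sketch of the intended broadcast; the optional `max_len` parameter is an assumption.

```python
def make_pad_mask(lengths, max_len=None):
    """Boolean mask of shape (batch, max_len); True marks padded positions."""
    if max_len is None:
        max_len = max(lengths)
    return [[t >= n for t in range(max_len)] for n in lengths]
```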
tts_to_file text mitigation strategies - mixed case Upper/lower texts #2 before CRAN submission