Add turbo support and prepare for CRAN #3

Merged
TroyHernandez merged 3 commits into main from turbo-support on Apr 7, 2026
@TroyHernandez

Summary

  • Adds turbo model support (GPT-2 backbone, MeanFlow decoder)
  • Prepares package for CRAN submission: 0 errors, 0 warnings, 1 harmless NOTE
  • Drops serve_chatterbox() since gpu.ctl is not on CRAN
  • Adds tinytest smoke tests
  • Slims tarball from 1.7GB to 98K via aggressive .Rbuildignore
  • Bumps version to 0.1.0

Test plan

- GPT-2 backbone (24 layers, 350M params) as alternative to Llama (30 layers, 520M)
- MeanFlow single-step decoder (2 steps) instead of 10-step CFM
- GPT-2 BPE tokenizer (50,276 vocab) in pure R
- Turbo download infrastructure from ResembleAI/chatterbox-turbo
- Turbo T3 inference: no CFG, top_k + repetition_penalty sampling
- HTTP serve endpoint with turbo parameter
- Fix DESCRIPTION authorship for CRAN (Resemble AI as copyright holder)
- Bump version to 0.1.0
- Fix DESCRIPTION: real email, ORCID, URL/BugReports, proper Description
- Drop serve_chatterbox() and gpu.ctl Suggests dep (gpu.ctl not on CRAN)
- Add tinytest smoke tests for chatterbox(), create_mel_filterbank(),
  compute_mel_spectrogram()
- Add PACKAGE = "chatterbox" to .Call("cpp_t3_decode")
- Add simplermarkdown vignette engine header to performance.md
- Aggressive .Rbuildignore: tarball goes from 1.7GB to 98K
- Regenerate Rd files via tinyrox::document() (resolves all codoc and
  undocumented arg WARNINGs)

R CMD check: 0 errors, 0 warnings, 1 harmless NOTE.
@TroyHernandez TroyHernandez merged commit 7e923be into main Apr 7, 2026
4 checks passed
@TroyHernandez TroyHernandez deleted the turbo-support branch April 7, 2026 03:28
TroyHernandez added a commit that referenced this pull request Jun 11, 2026
* Add Python fidelity review punch list and lessons

Full review of the R port against chatterbox-tts 0.1.4 (container
reference), five subsystem comparisons. 32 findings, 6 high severity.
See tasks/python-review-2026-06-11.md.

* Text front-end parity with Python chatterbox-tts 0.1.4

- generate() now applies punc_norm() unconditionally before tokenizing,
  matching tts.py. The standard (non-turbo) path previously skipped it,
  losing whitespace collapse, first-letter capitalization, punctuation
  rewrites, and the trailing period (a strong EOS cue; relates to #1).
- punc_norm() trims leading whitespace like Python's ' '.join(split()).
- tokenize_text() gains an added-token extraction pass mirroring HF
  tokenizers: [SPACE], [laughter], [sigh], etc. now tokenize atomically
  instead of being spelled out letter by letter (relates to #5).
- Fixed BPE corruption when a sequence fully merges to one token
  (e.g. 'th' produced [40, UNK, dup] instead of [40]).
- Non-space whitespace is dropped but still separates words, so BPE
  can no longer merge across a removed tab/newline.
- load_tokenizer() now stops (not warns) on missing [START]/[STOP],
  matching the Python assert.

Token streams verified byte-identical against the chatterbox-tts 0.1.4
container on mixed sentences, added tokens, and whitespace edge cases.
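
The added-token extraction pass described above can be sketched roughly as follows. This is an illustration only, not the package's R code: the token list and the `split_added_tokens` helper are hypothetical, and the real set of added tokens comes from the tokenizer file.

```python
import re

# Hypothetical added-token list; the real tokenizer defines its own set.
ADDED_TOKENS = ["[SPACE]", "[laughter]", "[sigh]"]


def split_added_tokens(text, added_tokens=ADDED_TOKENS):
    """Split text into (segment, is_added_token) pairs, preserving order."""
    # Longest first, so a token that is a prefix of another cannot shadow it.
    ordered = sorted(added_tokens, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(re.escape(t) for t in ordered) + ")")
    pieces = []
    for segment in pattern.split(text):
        if segment:  # re.split yields empty strings around adjacent matches
            pieces.append((segment, segment in added_tokens))
    return pieces
```

Segments flagged as added tokens would map straight to their single id, while the remaining segments go through the normal BPE path, so `[laughter]` is never spelled out letter by letter.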

* Sampling parity with Python chatterbox-tts 0.1.4

- Repetition penalty is now sign-dependent (HF semantics): positive
  logits divided, negative multiplied. The old divide-only code made
  repeated tokens with negative logits MORE likely, under-damping the
  degenerate loops behind issue #1. Fixed in pure-R, traced, and C++
  variants (turbo already had it right).
- Extracted .sample_speech_token(), shared by the pure-R and traced
  loops so their sampling semantics cannot drift apart again.
- top_p now defaults to 1.0 (disabled) everywhere, matching Python's
  generate(); previously 0.9 in generate() and 0.95 in t3_inference.
- generate() gains min_p and now actually forwards min_p and
  repetition_penalty to the standard path (they were silently ignored).
- Top-p keeps the threshold-crossing token (HF TopPLogitsWarper shifts
  the mask right) in the R and turbo samplers; C++ already did this.
- Repetition-penalty set now includes the real BOS row in all variants;
  r/traced previously penalized valid speech token 6560 forever due to
  mixed 0/1-indexing, and cpp/turbo-loop penalized no BOS at all.
- Runaway guard in all variants: the same token sampled 3x in a row
  stops generation with eos_found = FALSE and a warning (Python's
  alignment analyzer forces EOS at 2x; English-only Python 0.1.4 ships
  no guard and can emit 40s of garbage - issue #1).
- C++ decoder reads rms_eps from the llama config instead of hardcoding.
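
For readers unfamiliar with the sign-dependent repetition penalty, a minimal sketch on plain Python lists is given here. The function name is hypothetical and this is not the R, traced, or C++ code from the commit.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Return a copy of logits with previously seen tokens made less likely.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so both move toward "less likely" when penalty > 1.
    """
    out = list(logits)
    for tok in set(seen_token_ids):
        if out[tok] > 0:
            out[tok] = out[tok] / penalty
        else:
            out[tok] = out[tok] * penalty
    return out
```

With a divide-only rule, a seen token with logit -2.0 and penalty 2.0 would move to -1.0 and become more likely, which is the bug described above; the sign-dependent rule moves it to -4.0.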

7 new sampler unit tests (sign-dependence, crossing token, min-p).
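
The "crossing token" behavior these tests cover can be illustrated with a small standard-library sketch. The `top_p_keep` helper is hypothetical and works on a plain list of logits; it is meant to show the rule, not to reproduce the package's samplers.

```python
import math


def top_p_keep(logits, top_p):
    """Return the set of token ids that survive nucleus (top-p) filtering."""
    # Numerically stable softmax.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set()
    cumulative = 0.0
    for i in order:
        kept.add(i)  # keep first, then check: this retains the crossing token
        cumulative += probs[i]
        if cumulative > top_p:
            break
    return kept
```

Because the token is added before the threshold check, the token whose probability pushes the cumulative mass past `top_p` is kept, and at least one token always survives.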

* Conditioning and numeric parity with Python chatterbox-tts 0.1.4

Conditioning (reference audio path):
- New windowed-sinc resampler (R/resample.R), a port of torchaudio's
  _get_sinc_resample_kernel/_apply_sinc_resample_kernel; replaces linear
  interpolation in resample_audio(), which aliased reference features.
  Validated vs torchaudio: max abs diff 2.2e-10.
- New Kaldi fbank (R/kaldi_fbank.R), a port of
  torchaudio.compliance.kaldi.fbank (povey window, preemphasis 0.97,
  HTK mel scale, power spectrum, snip_edges). CAMPPlus now sees the
  features it was trained on instead of a librosa-style mel at half the
  log scale. Validated vs torchaudio: max abs diff 4.9e-09.
- Reference truncation parity: S3Gen prompt capped at 10 s
  (DEC_COND_LEN) and tokenizer conditioning prompt at 6 s (ENC_COND_LEN)
  in create_voice_embedding(), as in tts.py prepare_conditionals.
- embed_ref() reconciles mel/token prompt lengths (trim tokens to
  mel_len %/% 2) and the flow uses the actual prompt mel length for the
  conditioning region, fixing an off-by-one-frame boundary on refs that
  are not multiples of 40 ms.
- Voice encoder partials now match embeds_from_wavs defaults:
  frame_step 77 (rate = 1.3), min_coverage 0.8 extra zero-padded
  partial, and leading/trailing silence trim (librosa.effects.trim,
  top_db = 20, ported as trim_silence()).
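
The mel/token length reconciliation in embed_ref() can be pictured with the sketch below. It assumes two mel frames per speech token (implied by the `mel_len %/% 2` trim above); the function name and return shape are hypothetical.

```python
def reconcile_prompt(prompt_tokens, mel_len):
    """Trim prompt tokens to fit the prompt mel and report the conditioning length.

    Assumes two mel frames per speech token. The conditioning region is the
    actual mel length, not 2 * len(tokens), which differ when mel_len is odd.
    """
    max_tokens = mel_len // 2
    tokens = list(prompt_tokens[:max_tokens])
    cond_len = mel_len  # use the real prompt mel length for the conditioning region
    return tokens, cond_len
```

For a reference with 101 mel frames and 51 tokens, this keeps 50 tokens and a 101-frame conditioning region; deriving the region from the token count instead would give 100 frames, which is the one-frame boundary error mentioned above.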

Numeric/small parity:
- CFG unconditional row zeroed BEFORE positional embeddings (t3.py
  keeps text positions in the uncond branch).
- Prefill ends in two BOS frames like t3.py; removed the dead bos_emb
  block that hinted at the lost intent.
- CFM transformer feed-forward uses exact GELU (diffusers default),
  not the tanh approximation.
- CFM no longer draws a wasted torch_randn_like on the standard path
  (RNG state + allocation churn).
- autocast now defaults to FALSE: the Python reference runs float32
  everywhere (S3Gen fp16 is explicitly off upstream); fp16 is opt-in.
- conds.pt dropped from downloads: it is a nested torch pickle R torch
  cannot read, and the R API requires a reference voice (~105 MB saved).
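
The difference between exact GELU and the tanh approximation is small but nonzero, as this standard-library sketch shows. Both are the textbook formulas, written as scalar functions for illustration rather than as the tensor code used in the port.

```python
import math


def gelu_exact(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))
```

At x = 1 the two differ by roughly 1.5e-4 per activation, which is enough to break bit-level parity with a reference that uses the exact form.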

Validation scripts: scripts/test_kaldi_resample.R +
scripts/save_kaldi_resample_ref.py (container reference).

* Low-severity parity fixes and divergence docs

- drop_invalid_tokens() slices first-SOS..first-EOS before filtering
  (Python semantics); post-EOS garbage no longer survives token-cap runs.
- make_pad_mask() broadcasts correctly for batch > 1.
- chatterbox() falls back to CPU with a warning when CUDA/MPS is
  unavailable, like Python from_pretrained.
- GPT-2 pre-tokenizer regex uses \p{L}/\p{N} instead of POSIX classes.
- README documents deliberate divergences: no Perth watermark (with
  disclosure), no builtin default voice, reliability extras, backend
  token caps, unported modules.
- New NEWS.md summarizing the fidelity review.
- tests/tinytest.R redirects R_USER_{CACHE,DATA,CONFIG}_DIR during
  checks (CRAN home-filespace hygiene).
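
The slice-then-filter order in drop_invalid_tokens() can be sketched as below. The SOS/EOS ids are assumptions for illustration (the text above only establishes that 6560 is a valid speech token), and treating every id below SOS as valid is likewise assumed.

```python
SOS = 6561  # assumed start-of-speech id, for illustration only
EOS = 6562  # assumed end-of-speech id, for illustration only


def drop_invalid_tokens(tokens, sos=SOS, eos=EOS):
    """Keep only valid speech tokens between the first SOS and the first EOS."""
    # Slice first: start just after the first SOS, stop at the first EOS.
    start = tokens.index(sos) + 1 if sos in tokens else 0
    end = tokens.index(eos) if eos in tokens else len(tokens)
    window = tokens[start:end]
    # Then filter: ids at or above SOS are treated as special (assumption).
    return [t for t in window if t < sos]
```

Filtering without slicing would keep any valid-looking ids that appear after EOS, which is the post-EOS garbage mentioned above.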

* rformat + document

* Raise runaway guard to 10x repeats; GPU validation scripts; Rd fix

GPU validation showed healthy generations (silence, laughter) tripping
the 3x repeated-token guard; 10 consecutive identical FSQ codes (400 ms)
only occur in degenerate loops. All validation cases now emit EOS and
track the container reference. Also fixes the t3_inference Rd orphaned
by the sampler helper insertion (stale top_p default) and ignores
working drafts.
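
A minimal sketch of the runaway guard's control flow follows, assuming a hypothetical `sample_next` callback that returns the next token id. Whether the repeated run is kept in the output is also an assumption here; only the stop condition, the warning, and the `eos_found = FALSE` result come from the notes above.

```python
import warnings


def generate_with_guard(sample_next, eos_id, max_tokens, max_repeats=10):
    """Sample tokens until EOS, the token cap, or a run of identical tokens.

    Returns (tokens, eos_found). sample_next is a hypothetical callback that
    takes the tokens generated so far and returns the next token id.
    """
    tokens = []
    run = 0  # length of the current run of identical tokens
    for _ in range(max_tokens):
        tok = sample_next(tokens)
        if tok == eos_id:
            return tokens, True
        run = run + 1 if tokens and tok == tokens[-1] else 1
        tokens.append(tok)
        if run >= max_repeats:
            warnings.warn("runaway guard: same token repeated %d times" % run)
            return tokens, False
    return tokens, False
```

At 40 ms per code, the default of 10 repeats corresponds to the 400 ms threshold mentioned above, long enough that ordinary silence or laughter does not trip it.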

* Correct NEWS: runaway guard threshold is 10x, not 3x

* Bump version to 0.1.0.1

* Ignore local task notes

* Remove debug artifacts and stale port-era docs

- outputs/ safetensors debug dumps (13.9 MB) untracked; the dir was
  already gitignored and every dump is regenerable from scripts/
- test_output.wav, validation_status.csv: January debugging scratch,
  superseded by CLAUDE.md's validation table
- PseudoCode.md contained parameters CLAUDE.md explicitly corrected;
  JIT_TRACE_OPTIMIZATION.md superseded by vignettes/performance.md
- fyi.md was empty
- ignore files tidied to match

* Prune one-off debugging scripts; remove empty ref_audio dir entry

Deleted the January port-debugging one-offs (debug_*, trace_*,
profile_*, explore_*, check_*; 80 files, all in git history). Kept the
component validation suite (test_*), the container reference
generators (save_*), comparison scripts, and the parity validation
scripts.

* Restore useDynLib dropped in PR #3; guard with a test

tinyrox::document() does not support @useDynLib and silently drops the
manual useDynLib(chatterbox, .registration = TRUE) line from NAMESPACE.
PR #3's document() run did exactly that, so backend = 'cpp' has errored
with 'cpp_t3_decode not available' ever since - unnoticed because no
test exercised the C++ path. New test_cpp_backend.R fails loudly if
the registration ever disappears again.

* Add backend benchmark, profiling, and GC-tuning scripts

Tooling behind the June 2026 performance investigation: per-backend
ms/token benchmark, profvis+debrief profiles (which found 91% of pure-R
wall time in R GC triggered by torch's allocator policy), and the
GC-threshold tuning grid. Findings documented in the cornelius vault.

* Guard sampler tests against missing lantern

CI installs the torch package but not its libtorch backend;
requireNamespace alone passes there and the first torch_tensor() call
errors. Use torch_is_installed() like the other torch-dependent tests.