Add LODR support to online and offline recognizers #2026

csukuangfj merged 30 commits into k2-fsa:master
Conversation
Can you show how it improves the decoding result and also how it affects the RTF?

You can check the LODR paper. In our experiments with private data we saw relative improvements of 3-7%. Some performance numbers as reported by sherpa-onnx (non-optimized debug build on CPU):

LM rescore, no LODR:
LM shallow fusion, no LODR:

Can you test with a release build?

rescore:
rescore+LODR:
SF:
SF+LODR:
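For intuition about what is being benchmarked here, LODR follows a density-ratio idea: the neural LM score is added while a low-order (bigram) source-domain LM score is subtracted via a negative scale. The sketch below is illustrative only — `combined_score`, `am_score`, `lm_score`, and `lodr_score` are placeholder names, not the sherpa-onnx API — and it folds in the detail from the review notes that this PR multiplies the LODR term by the LM scale as well:

```python
def combined_score(am_score: float, lm_score: float, lodr_score: float,
                   lm_scale: float = 0.5, lodr_scale: float = -0.5) -> float:
    """Density-ratio style combination (illustrative sketch).

    lodr_scale is negative, so the bigram (source-domain) LM score is
    subtracted; the LODR term is additionally multiplied by the LM
    scale, matching the Icefall behavior described in the review.
    """
    return am_score + lm_scale * lm_score + lodr_scale * lm_scale * lodr_score
```

For example, with the defaults above, a hypothesis with `am_score=1.0`, `lm_score=2.0`, `lodr_score=3.0` scores 1.0 + 1.0 - 0.75 = 1.25.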
@csukuangfj Just wanted to kindly check in to see if there's anything else you'd like me to update on this PR. By the way, I appreciate your time and all the work you do on the project. Is there any plan to have more maintainers/reviewers?
Thank you for sharing the test results.
Yes, sherpa-onnx is an open-source project. Contributions of any form, e.g., pull requests and code reviews, are always welcome.
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
Hi again! The requested changes have been integrated into the PR.
vsd-vector
left a comment
Is `backoff_id` always 0? I think it is the id of `#0`, right?

Yes, usually it's the id of `#0`, so one of the last tokens in the vocabulary. So probably 0 is not a good default.
I have two ideas:
- I could set the default value to -1 and later deduce it automatically from `tokens.txt`
- or make `backoff_id` a required parameter if `lodr_fst` is set
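The first idea can be sketched as follows. This is a hypothetical helper, not code from this PR, and it assumes the usual icefall `tokens.txt` format of one `<token> <id>` pair per line:

```python
def deduce_backoff_id(tokens_file: str) -> int:
    """Return the id of the '#0' backoff token, or -1 if it is absent."""
    with open(tokens_file, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            # tokens.txt lines look like: "<token> <id>"
            if len(fields) == 2 and fields[0] == "#0":
                return int(fields[1])
    return -1
```

Returning -1 when `#0` is missing keeps the "default -1" convention and lets validation fail loudly instead of silently using token id 0.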
@csukuangfj
Can you add a CI test for it? It would be great if a Python example test and an example using a pre-built binary are available, so that users can learn how to use the new feature through examples.
Yes, I think I can add something like this. I will need to download models and the LODR FST during the CI test. I can probably use some public models, but what about the FST? Also, what audio should I use in the test, and where is the best place to host it?
Can you upload the files to huggingface and download them from CI?
By the way, if you don't want to make your model and fst public, can you use the test model and fst files from icefall?
@csukuangfj I added some CI tests using Zipformer2 EN models, via both the CLI and Python.
Thanks! Will review it this week. |
@csukuangfj is there anything you'd like me to update on this PR?
csukuangfj
left a comment
Thanks! Left some minor comments. Otherwise, it looks good to me.
Pull Request Overview

This PR integrates LODR (Low-Order Density Ratio) support from Icefall into both online and offline recognizers, enabling LODR for LM shallow fusion and LM rescore.
- Extended `OnlineLMConfig` and `OfflineLMConfig` to include `lodr_fst`, `lodr_scale`, and `lodr_backoff_id`.
- Implemented `LodrFst` and `LodrStateCost` classes and wired them into RNN LM scoring in both online and offline code paths.
- Updated Python bindings, CLI entry points, examples, and CI test scripts to accept and exercise the new LODR options.
Reviewed Changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| python/sherpa_onnx/online_recognizer.py | Added lodr_fst and lodr_scale parameters to factory method |
| python/sherpa_onnx/offline_recognizer.py | Same additions for offline recognizer factory |
| python/csrc/online-lm-config.cc | Extended pybind init signature and read/write fields |
| python/csrc/offline-lm-config.cc | Extended pybind init signature and read/write fields |
| csrc/online-lm-config.h/.cc | Added LODR members, Register, Validate, ToString |
| csrc/offline-lm-config.h/.cc | Same for offline LM config |
| csrc/lodr-fst.h / csrc/lodr-fst.cc | New LODR FST implementation |
| csrc/online-rnn-lm.cc / csrc/offline-rnn-lm.cc | Integrated LODR into RNN LM scoring |
| csrc/offline-lm.h/.cc | Integrated LODR into generic offline LM |
| python-api-examples/online-decode-files.py | Added LODR options to demo script |
| python-api-examples/offline-decode-files.py | Same for offline example |
| .github/scripts/*.sh | Download and test LODR FST in CI |
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
Walkthrough

This change introduces LODR (Low-Order Density Ratio) support across both offline and online speech recognition pipelines. It adds new configuration options, command-line arguments, and an implementation of LODR FST-based rescoring in the C++ and Python APIs. Test scripts and example usage are updated to validate and demonstrate the new functionality, and supporting classes for FST-based rescoring are implemented.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant PythonScript
    participant Recognizer
    participant LM as LM (RNN/NN)
    participant LODR as LODR FST
    User->>PythonScript: Run decode with --lm, --lodr-fst, --lodr-scale
    PythonScript->>Recognizer: Construct with LM and LODR config
    Recognizer->>LM: Score hypothesis
    Recognizer->>LODR: Rescore hypothesis with FST and scale
    LODR-->>Recognizer: Return LODR score
    LM-->>Recognizer: Return LM score
    Recognizer-->>PythonScript: Final rescored hypothesis
    PythonScript-->>User: Output results
```
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
Actionable comments posted: 8
♻️ Duplicate comments (3)
sherpa-onnx/csrc/online-lm-config.h (2)

21-23: Use a consistent integer type for `lodr_backoff_id`. The member `lodr_backoff_id` uses `int` while the codebase convention is `int32_t`, for consistency with other similar members in the struct: `- int lodr_backoff_id = -1;` → `+ int32_t lodr_backoff_id = -1;`

29-32: Update the constructor parameter type for consistency. The constructor parameter should use `int32_t` to match the member variable type: `- int lodr_backoff_id)` → `+ int32_t lodr_backoff_id)`

sherpa-onnx/csrc/lodr-fst.h (1)

52-52: Add a comment documenting `fst_` ownership. Please add a comment clarifying whether `fst_` is owned by this class, similar to the documentation provided for `fst_` in the `LodrStateCost` class.
🧹 Nitpick comments (3)

sherpa-onnx/python/sherpa_onnx/offline_recognizer.py (1)

72-73: Add documentation for the new LODR parameters. The new `lodr_fst` and `lodr_scale` parameters are not documented in the method's docstring; add entries for them (around line 138) describing the path to the LODR FST file in binary format and the scale factor used when it is provided.

sherpa-onnx/python/sherpa_onnx/online_recognizer.py (1)

92-93: Add documentation for the new LODR parameters. Same as above: document `lodr_fst` and `lodr_scale` in the method's docstring (around line 220).

sherpa-onnx/csrc/lodr-fst.cc (1)

119-124: Optimize memory allocation in the loop. Creating a new `unique_ptr` in each iteration is inefficient; modify the existing object in place instead, e.g. replace the `std::make_unique<LodrStateCost>` call in the loop body with `*hyp->lodr_state = hyp->lodr_state->ForwardOneStep(hyp->ys[i]);`.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
- .github/scripts/test-offline-transducer.sh (1 hunks)
- .github/scripts/test-online-transducer.sh (1 hunks)
- .github/scripts/test-python.sh (1 hunks)
- python-api-examples/offline-decode-files.py (3 hunks)
- python-api-examples/online-decode-files.py (3 hunks)
- sherpa-onnx/csrc/CMakeLists.txt (1 hunks)
- sherpa-onnx/csrc/hypothesis.h (2 hunks)
- sherpa-onnx/csrc/lodr-fst.cc (1 hunks)
- sherpa-onnx/csrc/lodr-fst.h (1 hunks)
- sherpa-onnx/csrc/offline-lm-config.cc (3 hunks)
- sherpa-onnx/csrc/offline-lm-config.h (1 hunks)
- sherpa-onnx/csrc/offline-lm.cc (2 hunks)
- sherpa-onnx/csrc/offline-lm.h (2 hunks)
- sherpa-onnx/csrc/offline-rnn-lm.cc (1 hunks)
- sherpa-onnx/csrc/online-lm-config.cc (3 hunks)
- sherpa-onnx/csrc/online-lm-config.h (1 hunks)
- sherpa-onnx/csrc/online-rnn-lm.cc (5 hunks)
- sherpa-onnx/python/csrc/offline-lm-config.cc (1 hunks)
- sherpa-onnx/python/csrc/online-lm-config.cc (1 hunks)
- sherpa-onnx/python/sherpa_onnx/offline_recognizer.py (2 hunks)
- sherpa-onnx/python/sherpa_onnx/online_recognizer.py (2 hunks)
🪛 Shellcheck (0.10.0)

.github/scripts/test-online-transducer.sh

- [warning] SC2206 (lines 196-198): Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
- [error] SC2068 (lines 201, 215): Double quote array expansions to avoid re-splitting elements.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (50)

- GitHub Check: ubuntu-24.04 3.9
- GitHub Check: windows-2022 3.10
- GitHub Check: windows-2022 3.8
- GitHub Check: windows-2022 3.11
- GitHub Check: rknn shared ON
🔇 Additional comments (39)
sherpa-onnx/csrc/CMakeLists.txt (1)

28-28: LGTM! Clean addition of the LODR FST source file. The new `lodr-fst.cc` source file is correctly added to the build system in alphabetical order.

sherpa-onnx/python/sherpa_onnx/online_recognizer.py (1)

303-304: Ensure consistent default scale values. The default `lodr_scale=0.1` should be consistent with the C++ implementation and the offline recognizer; please verify the default values match across all implementations.

sherpa-onnx/csrc/offline-lm.h (1)

13-13: LGTM! Correct include for LODR FST functionality. The include of `lodr-fst.h` is properly added to support the new LODR functionality.

.github/scripts/test-python.sh (1)
565-597: LGTM! Well-structured LODR test integration.The new test section follows the established pattern in the file and properly exercises the LODR functionality. The use of Git LFS for downloading large model files is appropriate, and the cleanup is thorough.
sherpa-onnx/csrc/offline-lm-config.h (2)
22-24: LGTM! LODR configuration members properly added. The new LODR members (`lodr_fst` and `lodr_scale`) are correctly defined with appropriate default values and follow the existing code patterns.

28-36: LGTM! Constructor properly updated for LODR parameters. The constructor signature and initialization list are correctly updated to include the new LODR parameters. The initialization order matches the member definition order.
sherpa-onnx/csrc/offline-rnn-lm.cc (2)
85-86: LGTM! Proper base class initialization. The addition of `OfflineLM(config)` to the member initializer list ensures the base class is properly initialized with the configuration that now includes LODR parameters.

88-90: LGTM! Template constructor properly updated. The template constructor also correctly calls the base class constructor with the configuration parameter.
sherpa-onnx/csrc/online-lm-config.h (1)
34-40: LGTM! Constructor initialization properly structured. The constructor initialization list correctly initializes all members in the proper order, matching the member definition order.
sherpa-onnx/csrc/hypothesis.h (3)

15-15: LGTM! Appropriate header inclusion. The `<memory>` header is correctly added to support the new `std::shared_ptr` member.

19-19: LGTM! Necessary header inclusion for LODR support. The inclusion of `lodr-fst.h` is required for the `LodrStateCost` type used in the new member.

66-67: LGTM! LODR state member properly added. The new `lodr_state` member is correctly defined as a `std::shared_ptr<LodrStateCost>` and properly default-initialized without an explicit nullptr assignment, following the established pattern mentioned in past reviews.

sherpa-onnx/python/csrc/offline-lm-config.cc (2)
16-20: LGTM - Constructor signature correctly extended for LODR support. The new `lodr_fst` and `lodr_scale` parameters are properly added to the constructor with appropriate default values.

25-26: LGTM - LODR parameters properly exposed as read/write attributes. The new attributes are correctly exposed to Python with appropriate access patterns.
sherpa-onnx/csrc/offline-lm.cc (2)
20-20: LGTM - Proper header inclusion for LODR functionality. The `lodr-fst.h` header is correctly included to enable LODR FST operations.

78-89: LGTM - LODR integration follows established patterns. The implementation correctly:

- Scales LODR by multiplying with the LM scale to replicate Icefall behavior
- Uses a conditional check to prevent crashes when LODR is disabled
- Calls `ComputeScore` with appropriate parameters, matching the pattern in `online-rnn-lm.cc`

sherpa-onnx/csrc/offline-lm-config.cc (3)
21-22: LGTM - LODR options properly registered. The new command-line options are correctly registered with appropriate descriptions.

31-34: LGTM - File existence validation addresses previous feedback. The validation correctly checks that the LODR FST file exists when provided, addressing the past review comment requesting this validation.

44-46: LGTM - ToString() method updated consistently. The string representation properly includes the new LODR parameters in a consistent format.
sherpa-onnx/python/csrc/online-lm-config.cc (2)
16-22: LGTM - Constructor signature correctly extended for LODR support. The new parameters are properly added to the constructor with appropriate defaults.

28-30: LGTM - LODR parameters properly exposed as read/write attributes. The new attributes are correctly exposed to Python with appropriate access patterns.
.github/scripts/test-offline-transducer.sh (3)
284-298: LGTM - Proper model downloading with Git LFS. The implementation correctly:

- Uses Git LFS to handle large model files efficiently
- Selectively pulls only needed files to save bandwidth
- Downloads both RNN LM and bigram FST models from appropriate repositories

302-314: LGTM - Comprehensive LODR testing with proper parameters. The test execution correctly:

- Uses the `modified_beam_search` decoding method appropriate for LM rescoring
- Includes all necessary LODR parameters (`--lm`, `--lodr-fst`, `--lodr-scale`)
- Tests with multiple audio files to ensure robustness
- Uses a negative scale value (-0.5), which is typical for LODR rescoring

316-316: LGTM - Proper cleanup of downloaded resources. The cleanup correctly removes all downloaded repositories to prevent CI storage issues.
sherpa-onnx/csrc/online-lm-config.cc (3)
23-27: LGTM: LODR configuration options properly registered. The new LODR configuration options are correctly integrated into the existing configuration system with appropriate parameter names and descriptions.

35-39: LGTM: File existence validation for LODR FST. The validation logic properly checks if the LODR FST file exists when specified, following the same pattern as the existing LM model validation.

49-52: LGTM: ToString() method updated correctly. The new LODR fields are properly formatted in the string representation. The use of double quotes for the FST path (a string) and no quotes for numeric values is consistent with existing code style.
.github/scripts/test-online-transducer.sh (1)
177-192: LGTM: Good test coverage for LODR functionality. The addition of LODR test coverage is valuable, testing both RNN LM and bigram FST integration. The use of Git LFS for selective model file download is appropriate.
python-api-examples/online-decode-files.py (3)
24-39: LGTM: Clear documentation example for LODR usage. The new usage example effectively demonstrates how to use LODR with RNN LM rescoring, providing users with a concrete example of the command-line parameters.

205-219: LGTM: LODR arguments properly defined. The new command-line arguments are well-documented with appropriate help text and default values. The constraint that the LODR FST is only used when an LM is given is clearly stated.

355-356: LGTM: LODR parameters correctly passed to recognizer. The new LODR parameters are properly integrated into the transducer recognizer creation, following the established pattern for other optional parameters.
sherpa-onnx/csrc/online-rnn-lm.cc (5)
15-15: LGTM: Appropriate header inclusion. The inclusion of the `lodr-fst.h` header is necessary for the LODR functionality integration.

39-58: LGTM: Well-structured LODR integration in shallow fusion. The LODR state initialization and score calculation in shallow fusion is well implemented:

- Proper conditional checks for LODR availability
- Correct state management with `unique_ptr`
- Appropriate score scaling and application
- Clear separation of concerns

108-112: LGTM: Consistent LODR integration in rescoring. The LODR score application in the rescoring method correctly:

- Uses conditional checks for LODR availability
- Applies proper scaling (LODR scale * LM scale)
- Maintains consistency with the Icefall implementation

180-184: LGTM: Proper LODR FST initialization. The LODR FST is correctly initialized only when the configuration specifies a non-empty FST path, using appropriate constructor parameters.

234-234: LGTM: Clean member variable addition. The LODR FST member variable is appropriately declared as a `unique_ptr`, following modern C++ practices.
python-api-examples/offline-decode-files.py (3)
38-56: LGTM: Comprehensive LODR usage example. The new documentation example clearly demonstrates how to use LODR with RNN LM rescoring in offline decoding, providing users with practical guidance.

292-322: LGTM: Consistent LODR argument definitions. The LODR command-line arguments are properly defined with appropriate help text and default values, maintaining consistency with the online version.

419-422: LGTM: Proper LODR parameter integration. The LODR parameters are correctly passed to the offline transducer recognizer, following the established pattern for optional parameters.
csukuangfj
left a comment
Thank you for your contribution!
This PR adds LODR support from Icefall to offline and online recognizers for both LM shallow fusion and LM rescore.
(see https://k2-fsa.github.io/icefall/decoding-with-langugage-models/LODR.html)
Usage example:
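The concrete command did not survive the page export, so here is a hypothetical sketch assembled from the flags this PR adds (`--lm`, `--lodr-fst`, `--lodr-scale`) and the example script it updates; every file path below is a placeholder:

```shell
# Hypothetical invocation; model, LM, FST, and audio paths are placeholders.
# The LODR scale is negative, as in the CI tests added by this PR.
python3 python-api-examples/online-decode-files.py \
  --tokens ./model/tokens.txt \
  --encoder ./model/encoder.onnx \
  --decoder ./model/decoder.onnx \
  --joiner ./model/joiner.onnx \
  --decoding-method modified_beam_search \
  --lm ./lm/rnnlm.onnx \
  --lodr-fst ./lm/2gram.fst \
  --lodr-scale -0.5 \
  ./audio/test.wav
```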