Adding CER/WER metrics #3418
base: main
Conversation
… cer.rs and mod.rs were changed in the process. All tests pass.
…ementation works on words now as tokens, rather than chars.
Codecov Report
❌ Patch coverage is
❌ Your project check has failed because the head coverage (63.50%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## main #3418 +/- ##
==========================================
+ Coverage 63.43% 63.50% +0.07%
==========================================
Files 981 983 +2
Lines 109705 109925 +220
==========================================
+ Hits 69589 69807 +218
- Misses 40116 40118 +2
Thanks for contributing these additional metrics!
I only have a couple of comments regarding the implementation.
/edit: ignore the failed test on CUDA CI, totally unrelated.
self.state.update(
    value,
    batch_size,
    FormatOptions::new(self.name()).unit("%").precision(2),
)
I think the state might need to keep track of the errors and total characters (or words for WER)? Otherwise aggregation might be incorrect 🤔 This would require a new state type, though.
Hmm, that sounds correct. However, the value here already includes the errors relative to the total characters, since value = total_edit_distance / total_characters * 100, so why would we need to keep the total characters?
For the current batch that's accurate, but when aggregated for an epoch it might be incorrect since this is a numeric state (not all batches have the same composition). Probably out of scope for this PR so no worries 👍
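To make the aggregation concern concrete, here is a small standalone sketch (the `BatchStats` helper and both function names are hypothetical, not Burn's API): averaging the per-batch CER values weights every batch equally, while tracking raw edit and character counts and dividing once gives the true epoch-level rate. The two disagree whenever batches differ in size.

```rust
/// Per-batch edit statistics (hypothetical helper, not part of the PR).
struct BatchStats {
    edit_distance: usize,
    total_chars: usize,
}

/// Averaging per-batch CER values weights each batch equally,
/// regardless of how many characters it contains.
fn mean_of_batch_cers(batches: &[BatchStats]) -> f64 {
    let sum: f64 = batches
        .iter()
        .map(|b| b.edit_distance as f64 / b.total_chars as f64)
        .sum();
    100.0 * sum / batches.len() as f64
}

/// Tracking raw counts and dividing once yields the true epoch-level CER.
fn pooled_cer(batches: &[BatchStats]) -> f64 {
    let edits: usize = batches.iter().map(|b| b.edit_distance).sum();
    let chars: usize = batches.iter().map(|b| b.total_chars).sum();
    100.0 * edits as f64 / chars as f64
}

fn main() {
    // One small batch with many errors, one large batch with few.
    let batches = [
        BatchStats { edit_distance: 5, total_chars: 10 },   // 50% CER
        BatchStats { edit_distance: 10, total_chars: 990 }, // ~1% CER
    ];
    println!("mean of batch CERs: {:.2}%", mean_of_batch_cers(&batches)); // ~25.51%
    println!("pooled CER:         {:.2}%", pooled_cer(&batches));         // 1.50%
}
```

This is why a dedicated state holding the running error and token counts (rather than a numeric average of ratios) would be needed for exact epoch aggregation.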
Sorry for the late follow-up! Didn't see this was updated. Please explicitly re-request a review so I know when changes have been applied 🙂
LGTM, just minor formatting issues to fix.
/// deletions, or substitutions) required to change one sequence into the other. This
/// implementation is optimized for space, using only two rows of the dynamic programming table.
///
pub fn edit_distance(reference: &[i32], prediction: &[i32]) -> usize {
We can mark it as pub(crate) only.
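For readers following along, a two-row Levenshtein distance matching the signature above might look like the following sketch (this is an illustration consistent with the doc comment, not the PR's exact code):

```rust
/// Levenshtein edit distance between two token sequences. Only two rows
/// of the DP table are kept, so memory is O(prediction length).
pub(crate) fn edit_distance(reference: &[i32], prediction: &[i32]) -> usize {
    // prev[j] = distance between the empty prefix of `reference`
    // and prediction[..j]; i.e. j insertions.
    let mut prev: Vec<usize> = (0..=prediction.len()).collect();
    let mut curr = vec![0usize; prediction.len() + 1];

    for (i, &r) in reference.iter().enumerate() {
        curr[0] = i + 1; // deleting the first i+1 reference tokens
        for (j, &p) in prediction.iter().enumerate() {
            let cost = if r == p { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost)   // substitution (or match)
                .min(prev[j + 1] + 1)        // deletion
                .min(curr[j] + 1);           // insertion
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[prediction.len()]
}

fn main() {
    // Classic example: "kitten" -> "sitting" needs 3 edits.
    let a: Vec<i32> = "kitten".bytes().map(|b| b as i32).collect();
    let b: Vec<i32> = "sitting".bytes().map(|b| b as i32).collect();
    assert_eq!(edit_distance(&a, &b), 3);
}
```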
Pull Request Template
Checklist
- The cargo run-checks command has been executed.
Related Issues/PRs
The PR addresses Issue #2649 and implements two new metrics for sequence evaluation in NLP and related fields: the character error rate (CER) and word error rate (WER). These are the same error metric applied at the character and word levels, respectively.
Changes
The cer.rs file provides an implementation of the Character Error Rate metric. CER measures the percentage of characters that are incorrect (insertions, deletions, or substitutions) in the predicted sequence compared to the reference sequence; the edit distance is computed with the Levenshtein algorithm via dynamic programming.
The wer.rs file implements the Word Error Rate metric. WER is similar to CER but operates at the word level, measuring the percentage of words that are incorrect in the predicted sequence.
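As a concrete illustration of the character-level metric (helper names here are hypothetical, not the PR's API), CER divides the Levenshtein distance between the character sequences by the reference length:

```rust
/// Minimal Levenshtein distance, used here at the character level.
fn edit_distance(reference: &[char], prediction: &[char]) -> usize {
    let mut prev: Vec<usize> = (0..=prediction.len()).collect();
    for (i, &r) in reference.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, &p) in prediction.iter().enumerate() {
            let cost = if r == p { 0 } else { 1 };
            curr.push((prev[j] + cost).min(prev[j + 1] + 1).min(curr[j] + 1));
        }
        prev = curr;
    }
    prev[prediction.len()]
}

/// CER as a percentage of the reference length (hypothetical helper).
fn cer_percent(reference: &str, prediction: &str) -> f64 {
    let r: Vec<char> = reference.chars().collect();
    let p: Vec<char> = prediction.chars().collect();
    100.0 * edit_distance(&r, &p) as f64 / r.len() as f64
}

fn main() {
    // "kitten" -> "sitting": 2 substitutions + 1 insertion = 3 edits
    // over 6 reference characters -> 50% CER.
    println!("{}", cer_percent("kitten", "sitting")); // 50
}
```

WER follows the same recipe with words as tokens instead of characters.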
Testing
Testing was done with unit tests, where each function (for example, test_wer_without_padding and test_wer_with_padding) is a unit test for the WerMetric implementation; the same applies to CER.
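To show what the padding case checks, here is a self-contained sketch of that kind of test (the `PAD` constant, `strip_padding`, and `wer_percent` helpers are hypothetical; the PR's actual tests target WerMetric): padding tokens appended to a sequence must not change its score.

```rust
/// Hypothetical padding token id used only in this sketch.
const PAD: i32 = 0;

/// Drop padding tokens before scoring.
fn strip_padding(tokens: &[i32]) -> Vec<i32> {
    tokens.iter().copied().filter(|&t| t != PAD).collect()
}

/// Minimal Levenshtein distance over tokens.
fn edit_distance(reference: &[i32], prediction: &[i32]) -> usize {
    let mut prev: Vec<usize> = (0..=prediction.len()).collect();
    for (i, &r) in reference.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, &p) in prediction.iter().enumerate() {
            let cost = if r == p { 0 } else { 1 };
            curr.push((prev[j] + cost).min(prev[j + 1] + 1).min(curr[j] + 1));
        }
        prev = curr;
    }
    prev[prediction.len()]
}

/// WER as a percentage of the (unpadded) reference length.
fn wer_percent(reference: &[i32], prediction: &[i32]) -> f64 {
    let r = strip_padding(reference);
    let p = strip_padding(prediction);
    100.0 * edit_distance(&r, &p) as f64 / r.len() as f64
}

fn main() {
    // Without padding: one substitution among four words -> 25% WER.
    assert_eq!(wer_percent(&[1, 2, 3, 4], &[1, 9, 3, 4]), 25.0);
    // With padding: trailing PAD tokens must not count as errors,
    // so the padded prediction scores the same as the unpadded one.
    assert_eq!(wer_percent(&[1, 2, 3, 4], &[1, 9, 3, 4, PAD, PAD]), 25.0);
    println!("padding tests passed");
}
```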