Profanity filtering for ITN - EN #86

Closed
wants to merge 20 commits into from

Commits on Jul 3, 2023

  1. Zh itn (NVIDIA#74)

    * Add ZH ITN
    
    Signed-off-by: Anand Joseph <[email protected]>
    
    * Fix copyrights and code cleanup
    
    Signed-off-by: Anand Joseph <[email protected]>
    
    * Remove invalid tests
    
    Signed-off-by: Anand Joseph <[email protected]>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Resolve CodeQL issues
    
    Signed-off-by: Anand Joseph <[email protected]>
    
    * Cleanup
    
    Signed-off-by: Anand Joseph <[email protected]>
    
    * Fix missing 'zh' option for ITN and correct comment
    
    Signed-off-by: Anand Joseph <[email protected]>
    
    * Update __init__.py
    
    Change to zh instead of en for the imports.
    
    Signed-off-by: Buyuan(Alex) Cui <[email protected]>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * update for decimal test data
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * update for language import
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * update for Chinese punctuations
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * a new class for whitelist
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * PYNINI_AVAILABLE = False
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * recreated due to file import format issue
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * recreated due to format issue
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * caught duplicates, removed
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * removed duplicates, arranged for Chinese Yuan updates
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updates according to the comments from the last PR. Recreated some of the files due to format issues
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them for format issues
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * re-added this file to avoid data file import error
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updated grammar according to last PR. Removed the acceptance of 千
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updates
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updated according to last PR. Removed comma after decimal points
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * grammar for Fraction
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * grammar for money and updated according to last PR. Plus process of 元
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * ordinal grammar. updates due to the updates in cardinal grammar
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updated according to last PR comments. removing am and pm and allowing simple mandarin expression
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * arrangements
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * added whitelist grammar
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * word grammar for non-classified items
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updated cardinal, decimal, time, itn data
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updates according to last PR
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updates according to the updates for cardinal grammar
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updates for more Mandarin punctuations
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updated according to last PR. removing am pm
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * adjustment on the weight
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updated according to the tagger updates
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updated according to the time tagger
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updates according to changes in tagger on am and pm
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * verbalizer for fraction
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * added for mandarin grammar
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * kept this file because using English utils results in data naming error
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * merge conflict
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * removed unused imports
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * deleted unused import os
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * deleted unused variables
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * removed unused imports
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * updates and edits based on pr checks
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updates and edits based on pr checks
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * format issue, recreated
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * format issue recreated
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fixed coding style/format
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * fixed coding style and format
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * removed duplicated graph for 毛
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * removed the comment
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * removed the comment
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * removing unnecessary comments
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * unnecessary comment removed
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * test file updated for more cases
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * updated with a comment explaining why this file is kept
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updated the file explaining why this file is kept
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * added Mandarin as zh
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * removing for duplication
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * removed unused NEMO objects
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * removed duplicates
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * removing unused imports
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updates to fix test file failures
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updates to fix file failures
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updates to resolve test case failure
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updates to resolve test case failure
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updates to resolve test case failure
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updates to resolve test case failure
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updates to adapt to cardinal grammar changes
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updates to adapt to grammar changes
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * updates to adapt to cardinal grammar changes
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix style
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * fix style
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * fix style
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * fix style
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * fixing pr checks
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * removed // for zhtn/itn cache
    
    Signed-off-by: BuyuanCui <[email protected]>
    
    * Update inverse_normalize.py
    
    Added zh as a selection to pass Jenkins checks.
    
    Signed-off-by: Buyuan(Alex) Cui <[email protected]>
    
    ---------
    
    Signed-off-by: Anand Joseph <[email protected]>
    Signed-off-by: Buyuan(Alex) Cui <[email protected]>
    Signed-off-by: BuyuanCui <[email protected]>
    Co-authored-by: Alex Cui <[email protected]>
    Co-authored-by: Anand Joseph <[email protected]>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Signed-off-by: gayu-thri <[email protected]>
    4 people authored and gayu-thri committed Jul 3, 2023
    a8078de
  2. Add profanity filtering for English ITN

    Signed-off-by: Gayathri Ethiraj <[email protected]>
    Signed-off-by: gayu-thri <[email protected]>
    gayu-thri committed Jul 3, 2023
    29a6272
  3. Add copyrights

    Signed-off-by: Gayathri Ethiraj <[email protected]>
    Signed-off-by: gayu-thri <[email protected]>
    gayu-thri committed Jul 3, 2023
    d65ff7d
  4. [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci
    
    Signed-off-by: gayu-thri <[email protected]>
    pre-commit-ci[bot] authored and gayu-thri committed Jul 3, 2023
    d70a4ec
  5. Add filter_profanity attr to InverseNormalizer

    Signed-off-by: gayu-thri <[email protected]>
    gayu-thri committed Jul 3, 2023
    252bb6d
  6. [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci
    
    Signed-off-by: gayu-thri <[email protected]>
    pre-commit-ci[bot] authored and gayu-thri committed Jul 3, 2023
    2e71ebb
  7. Use different FST names with and without profanity filtering (pf)

    Signed-off-by: gayu-thri <[email protected]>
    gayu-thri committed Jul 3, 2023
    a8a7826
  8. [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci
    
    Signed-off-by: gayu-thri <[email protected]>
    pre-commit-ci[bot] authored and gayu-thri committed Jul 3, 2023
    0cfb3a8
  9. Remove the written form from the TSV and derive it with FST operations

    Signed-off-by: gayu-thri <[email protected]>
    gayu-thri committed Jul 3, 2023
    62efdd6
  10. [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci
    
    Signed-off-by: gayu-thri <[email protected]>
    pre-commit-ci[bot] authored and gayu-thri committed Jul 3, 2023
    1d5a362
  11. User-configurable input file for profane words

    Signed-off-by: gayu-thri <[email protected]>
    gayu-thri committed Jul 3, 2023
    f9e5bde
  12. [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci
    
    Signed-off-by: gayu-thri <[email protected]>
    pre-commit-ci[bot] authored and gayu-thri committed Jul 3, 2023
    b0a6a98
  13. Merge branch 'main' into add-profanity-filtering

    Signed-off-by: Gayathri Ethiraj <[email protected]>
    gayu-thri committed Jul 3, 2023
    b3375c2
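
Two of the commits above drop the written form from the profane-word TSV in favor of deriving it with FST operations, and let the user supply their own word file. The PR page does not show the grammar itself, so the following is only a minimal sketch of that idea in plain Python rather than pynini FSTs. The masking rule (keep the first letter, replace the rest with `*`), the placeholder word list, and every function name here are hypothetical assumptions, not the PR's code.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, TextIO

# Hypothetical placeholder list; the PR's actual data file is not shown here.
DEFAULT_PROFANE_WORDS = ["darn", "heck"]


def mask_word(word: str) -> str:
    """Derive the written (masked) form from the spoken form.

    Assumed rule for illustration only: keep the first character and
    replace every remaining character with '*'.
    """
    if len(word) <= 1:
        return word
    return word[0] + "*" * (len(word) - 1)


def read_profane_words(source: Optional[TextIO] = None) -> List[str]:
    """Read one word per row from a single-column TSV.

    `source` stands in for a user-supplied file; when it is None the
    built-in default list is used instead.
    """
    if source is None:
        return list(DEFAULT_PROFANE_WORDS)
    words = []
    for row in csv.reader(source, delimiter="\t"):
        if row and row[0].strip():
            words.append(row[0].strip().lower())
    return words


def build_mask_map(words: Iterable[str]) -> Dict[str, str]:
    """Map each spoken form to a masked form computed on the fly,
    so the TSV needs no second 'written form' column."""
    return {w: mask_word(w) for w in words}


def filter_profanity(text: str, mask_map: Dict[str, str]) -> str:
    """Mask whole whitespace-separated tokens found in `mask_map`.

    Lookup is case-insensitive; runs of whitespace collapse to one space.
    """
    return " ".join(mask_map.get(tok.lower(), tok) for tok in text.split())


if __name__ == "__main__":
    # Simulate a user-supplied word list with an in-memory file.
    user_file = io.StringIO("Darn\nshoot\n")
    masks = build_mask_map(read_profane_words(user_file))
    print(filter_profanity("oh darn it", masks))  # oh d*** it
```

Computing the masked form from the spoken form keeps the data file to a single column, so a user-supplied list cannot drift out of sync with its written forms.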

Commits on Jul 25, 2023

  1. Fix error in CodeQL

    Signed-off-by: gayu-thri <[email protected]>
    gayu-thri committed Jul 25, 2023
    e6548dd
  2. a46ea2d

Commits on Aug 7, 2023

  1. Resolve PR comments

    Signed-off-by: gayu-thri <[email protected]>
    gayu-thri committed Aug 7, 2023
    bf4e9a1
  2. Disable profanity filtering by default

    Signed-off-by: gayu-thri <[email protected]>
    gayu-thri committed Aug 7, 2023
    3c79f42
  3. 1ce954c
  4. Set filter_profanity to True in profane test

    Signed-off-by: gayu-thri <[email protected]>
    gayu-thri committed Aug 7, 2023
    9c710c4
  5. 9880c05
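
The Aug 7 commits make profanity filtering opt-in (off by default) and switch it on explicitly in the profane test, while an earlier commit gives the grammars different FST names with and without profanity filtering. A minimal sketch of how those pieces might fit together, assuming a boolean constructor flag and a `_pf` cache-file suffix. `ToyInverseNormalizer`, `far_file_name`, and the file-name pattern are hypothetical and are not taken from the PR or from NeMo's `InverseNormalizer`.

```python
import os
from typing import Dict, Optional


def far_file_name(lang: str, input_case: str, filter_profanity: bool) -> str:
    """Hypothetical cache-file naming: grammars built with and without
    profanity filtering get different names, so a cached FST built in one
    mode is never reused for the other."""
    suffix = "_pf" if filter_profanity else ""
    return f"{lang}_itn_{input_case}{suffix}.far"


class ToyInverseNormalizer:
    """Illustrative stand-in, not NeMo's InverseNormalizer."""

    def __init__(
        self,
        lang: str = "en",
        input_case: str = "lower_cased",
        cache_dir: Optional[str] = None,
        filter_profanity: bool = False,  # opt-in: disabled unless requested
        mask_map: Optional[Dict[str, str]] = None,
    ):
        self.filter_profanity = filter_profanity
        self.far_path = os.path.join(
            cache_dir or ".", far_file_name(lang, input_case, filter_profanity)
        )
        self.mask_map = mask_map or {}

    def inverse_normalize(self, text: str) -> str:
        # With the flag off, text passes through unchanged.
        if not self.filter_profanity:
            return text
        # Mask whole whitespace-separated tokens (case-insensitive lookup).
        return " ".join(self.mask_map.get(tok.lower(), tok) for tok in text.split())


if __name__ == "__main__":
    masks = {"darn": "d***"}
    print(ToyInverseNormalizer(mask_map=masks).inverse_normalize("oh darn"))
    # A test exercising the filter has to enable it explicitly.
    print(ToyInverseNormalizer(filter_profanity=True, mask_map=masks).inverse_normalize("oh darn"))
```

Keeping the default off means existing callers see unchanged output, which is why a profanity test needs to set the flag to True itself.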