Jp tn 20241017 #240

Merged
merged 36 commits into from
Oct 18, 2024

Conversation

BuyuanCui
Collaborator

@BuyuanCui BuyuanCui commented Oct 17, 2024

What does this PR do ?

Japanese TN grammar and test file

Before your PR is "Ready for review"

Pre checks:

  • Have you signed your commits? Use git commit -s to sign.
  • Do all unittests finish successfully before sending PR?
    1. pytest or (if your machine does not have GPU) pytest --cpu from the root folder (given you marked your test cases accordingly @pytest.mark.run_only_on('CPU')).
    2. Sparrowhawk tests bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
  • If you are adding a new feature: Have you added test cases for both pytest and Sparrowhawk here?
  • Have you added __init__.py for every folder and subfolder, including data folder which has .TSV files?
  • Have you followed codeQL results and removed unused variables and imports (report is at the bottom of the PR in github review box) ?
  • Have you added the correct license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
  • If you copied nemo_text_processing/text_normalization/en/graph_utils.py your header's second line should be Copyright 2015 and onwards Google, Inc.. See an example here.
  • Remove import guards (try import: ... except: ...) if not already done.
  • If you added a new language or a new feature please update the NeMo documentation (lives in different repo).
  • Have you added your language support to tools/text_processing_deployment/pynini_export.py?

PR Type:

  • New Feature
  • Bugfix
  • Documentation
  • Test

If you haven't finished some of the above items you can still open "Draft" PR.

BuyuanCui and others added 5 commits October 17, 2024 08:30
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
@zoobereq zoobereq self-requested a review October 17, 2024 18:21
@@ -0,0 +1,40 @@
Dr. ドクター
dr. ドクター

Here I would handle capitalization as part of the grammar instead of listing both spellings in the data file. Again, doing so helps reduce the size of the graph.

SINGULAR_TO_PLURAL = graph_plural
PLURAL_TO_SINGULAR = pynini.invert(graph_plural)
TO_LOWER = pynini.union(*[pynini.cross(x, y) for x, y in zip(string.ascii_uppercase, string.ascii_lowercase)])
TO_UPPER = pynini.invert(TO_LOWER)

You can use this to capitalize chars directly in the verbalizer/tagger grammars.
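
The idea behind the two comments above can be pictured with a plain-Python analogy. This is only a sketch, not the pynini grammar from the PR: `TO_LOWER` is rebuilt here as a dict using the same letter pairing as in the diff, while `WHITELIST` and `normalize_abbreviation` are hypothetical names introduced for illustration. The point is that the data keeps a single lowercase entry and a separate case mapping is applied first, so "Dr." and "dr." do not both need their own row.

```python
import string

# Same letter pairing as TO_LOWER in the diff, but as a dict rather than an FST.
TO_LOWER = dict(zip(string.ascii_uppercase, string.ascii_lowercase))

# Hypothetical whitelist that stores only the lowercase spelling of each key.
WHITELIST = {"dr.": "ドクター"}


def normalize_abbreviation(token: str) -> str:
    """Lowercase the first character, then look the token up in the whitelist."""
    if not token:
        return token
    folded = TO_LOWER.get(token[0], token[0]) + token[1:]
    # Tokens that are not in the whitelist pass through unchanged.
    return WHITELIST.get(folded, token)
```

In a real grammar the same effect would presumably come from composing a case-mapping transducer with the whitelist graph; how much that shrinks the compiled graph depends on the grammar and is not measured here.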


@zoobereq zoobereq left a comment

Looks fine to me.

Just a few general comments:

  • After having both Spanish and French grammars consistently blow up my memory, I try to make the graphs as small as possible. A number of my suggestions point to places where the graph can be made smaller.
  • If you have time to make the above changes -- excellent. Otherwise, if the checks are passing, we can leave them as is.
  • Are there certain classes that this TN system won't be supporting? If so, it would be helpful to mention that in the PR description and/or add it to the description of the tagger and verbalizer.
  • Any commented-out line that is not itself a comment is generally understood as an optional component that can be included if needed. If that's not the case, it's safe to remove it.

Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Signed-off-by: Alex Cui <[email protected]>
Comment on lines +31 to +43
# def load_labels(abs_path):
# """
# loads relative path file as dictionary

# Args:
# abs_path: absolute path

# Returns dictionary of mappings
# """
# #label_tsv = open(abs_path, encoding="utf-8")
# label_tsv = open(abs_path, "r")
# labels = list(csv.reader(label_tsv, delimiter="\t"))
# return labels

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
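
If the helper flagged above is meant to be kept rather than deleted, a live version might look like the sketch below. It is an assumption about intent, not the code that was merged: it uses a context manager so the file is closed, and picks the explicit UTF-8 encoding from the first commented-out `open` call, which matters for TSV files containing Japanese text. Note that the function returns a list of rows, not the dictionary the original docstring mentions.

```python
import csv


def load_labels(abs_path):
    """
    Load a tab-separated file as a list of rows.

    Args:
        abs_path: absolute path to the TSV file

    Returns a list of rows, each row being a list of column strings
    """
    # newline="" is what the csv module recommends for files passed to csv.reader.
    with open(abs_path, encoding="utf-8", newline="") as label_tsv:
        return list(csv.reader(label_tsv, delimiter="\t"))
```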

@zoobereq zoobereq left a comment

Looks good here. Let's wait until CI checks pass.


@zoobereq zoobereq left a comment

Looks ready to merge.

@zoobereq zoobereq merged commit a3fc6f5 into main Oct 18, 2024
5 checks passed
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 24, 2024
* ja tn

Signed-off-by: Alex Cui <[email protected]>

* adding ja

Signed-off-by: Alex Cui <[email protected]>

* removing

Signed-off-by: Alex Cui <[email protected]>

* updated tests

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* addressing comment

Signed-off-by: Alex Cui <[email protected]>

* addressing ci

Signed-off-by: Alex Cui <[email protected]>

* addressing ci

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* addresing comment

Signed-off-by: Alex Cui <[email protected]>

* removing

Signed-off-by: Alex Cui <[email protected]>

* adresing comment

Signed-off-by: Alex Cui <[email protected]>

* removing unused import

Signed-off-by: Alex Cui <[email protected]>

* addressing comment

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* addressing comment;

Signed-off-by: Alex Cui <[email protected]>

* addressing comment

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date for ja

Signed-off-by: Alex Cui <[email protected]>

* addresing comment

Signed-off-by: Alex Cui <[email protected]>

* addressing comment

Signed-off-by: Alex Cui <[email protected]>

* jenkins

Signed-off-by: Alex Cui <[email protected]>

* addresing comment

Signed-off-by: Alex Cui <[email protected]>

* addressing comment

Signed-off-by: Alex Cui <[email protected]>

* typo

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adressing comment

Signed-off-by: Alex Cui <[email protected]>

* addressing comment

Signed-off-by: Alex Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci

Signed-off-by: Alex Cui <[email protected]>

---------

Signed-off-by: Alex Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024