Jp tn 20241017 #240
nemo_text_processing/text_normalization/ja/taggers/tokenize_and_classify.py (fixed)
nemo_text_processing/text_normalization/ja/verbalizers/verbalize_final.py (fixed)
nemo_text_processing/text_normalization/ja/data/numbers/digit_alt.tsv (outdated, resolved)
nemo_text_processing/text_normalization/ja/data/time/minute.tsv (outdated, resolved)
@@ -0,0 +1,40 @@
Dr. ドクター
dr. ドクター
Here you may want to handle capitalization as part of the grammar instead of listing both Dr. and dr. as separate entries. Again, doing so helps reduce the size of the graph.
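import string
import pynini

SINGULAR_TO_PLURAL = graph_plural  # graph_plural: defined elsewhere (not shown in this excerpt)
PLURAL_TO_SINGULAR = pynini.invert(graph_plural)
# Map each ASCII uppercase letter to its lowercase counterpart, and the inverse for uppercasing
TO_LOWER = pynini.union(*[pynini.cross(x, y) for x, y in zip(string.ascii_uppercase, string.ascii_lowercase)])
TO_UPPER = pynini.invert(TO_LOWER)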
You can use this to capitalize characters directly in the verbalizer/tagger grammars.
nemo_text_processing/text_normalization/ja/verbalizers/verbalize.py (outdated, resolved)
tests/nemo_text_processing/ja/test_sparrowhawk_normalization.sh (outdated, resolved)
Looks fine to me.
Just a few general comments:
- After having both the Spanish and French grammars consistently blow up my memory, I try to make graphs as small as possible. A number of my suggestions point to places where the graph can be made smaller.
- If you have time to make the above changes, excellent. Otherwise, if the checks are passing, we can leave them as is.
- Are there certain classes that this TN system won't support? If so, it would be helpful to mention that in the PR description and/or in the descriptions of the tagger and verbalizer.
- Any commented-out line that is not itself a comment is generally understood as an optional component that can be included if needed. If that's not the case, it's safe to remove it.
# def load_labels(abs_path):
#     """
#     loads relative path file as dictionary
#
#     Args:
#         abs_path: absolute path
#
#     Returns dictionary of mappings
#     """
#     #label_tsv = open(abs_path, encoding="utf-8")
#     label_tsv = open(abs_path, "r")
#     labels = list(csv.reader(label_tsv, delimiter="\t"))
#     return labels
CodeQL code scanning notice: Commented-out code.
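If the helper flagged by CodeQL is ever needed again, the following is a minimal sketch of what it does using the standard csv module. Unlike the commented-out version, it opens the file in a with block so the handle is closed. Like the original, it returns a list of rows, not the dictionary the docstring mentions. The type hints and the newline="" argument are additions and do not come from the PR.

```python
import csv
from typing import List


def load_labels(abs_path: str) -> List[List[str]]:
    """Load a tab-separated file as a list of rows, each row a list of fields."""
    # The with block closes the file handle even if reading fails
    with open(abs_path, encoding="utf-8", newline="") as label_tsv:
        return list(csv.reader(label_tsv, delimiter="\t"))
```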
Looks good here. Let's wait until CI checks pass.
Looks ready to merge.
* ja tn Signed-off-by: Alex Cui <[email protected]>
* adding ja Signed-off-by: Alex Cui <[email protected]>
* removing Signed-off-by: Alex Cui <[email protected]>
* updated tests Signed-off-by: Alex Cui <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* addressing comment Signed-off-by: Alex Cui <[email protected]>
* addressing ci Signed-off-by: Alex Cui <[email protected]>
* addressing ci Signed-off-by: Alex Cui <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* addresing comment Signed-off-by: Alex Cui <[email protected]>
* removing Signed-off-by: Alex Cui <[email protected]>
* adresing comment Signed-off-by: Alex Cui <[email protected]>
* removing unused import Signed-off-by: Alex Cui <[email protected]>
* addressing comment Signed-off-by: Alex Cui <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* addressing comment; Signed-off-by: Alex Cui <[email protected]>
* addressing comment Signed-off-by: Alex Cui <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* date for ja Signed-off-by: Alex Cui <[email protected]>
* addresing comment Signed-off-by: Alex Cui <[email protected]>
* addressing comment Signed-off-by: Alex Cui <[email protected]>
* jenkins Signed-off-by: Alex Cui <[email protected]>
* addresing comment Signed-off-by: Alex Cui <[email protected]>
* addressing comment Signed-off-by: Alex Cui <[email protected]>
* typo Signed-off-by: Alex Cui <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* adressing comment Signed-off-by: Alex Cui <[email protected]>
* addressing comment Signed-off-by: Alex Cui <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* ci Signed-off-by: Alex Cui <[email protected]>
---------
Signed-off-by: Alex Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
What does this PR do?
Japanese TN grammar and test file
Before your PR is "Ready for review"
Pre checks:
- Sign your commits with git commit -s.
- Run pytest, or (if your machine does not have a GPU) pytest --cpu, from the root folder (given you marked your test cases accordingly with @pytest.mark.run_only_on('CPU')).
- Run bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
- Add test cases for both pytest and Sparrowhawk here.
- Add __init__.py to every folder and subfolder, including the data folder, which has .TSV files.
- Add the header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files.
- Copyright 2015 and onwards Google, Inc. See an example here.
- (try import: ... except: ...) if not already done.
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.