OPENNLP-1531: Add Portuguese abbreviation dictionary #581

Merged
kinow merged 1 commit into main from OPENNLP-1531-Add-Portuguese-abbreviation-dictionary
Jan 3, 2024

kinow
Member

@kinow kinow commented Dec 31, 2023

Thank you for contributing to Apache OpenNLP.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically main)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.

@kinow
Member Author

kinow commented Dec 31, 2023

I'm using @mawiesne's French pull request #580 as a reference, while also looking at the Spanish abbreviation code. The current commit will not compile. I had to stop ('tis new year!), but I will finish it later, edit the commit, update the title, etc. 👍

@kinow kinow force-pushed the OPENNLP-1531-Add-Portuguese-abbreviation-dictionary branch from 6d6ac28 to afa8d96 January 1, 2024 16:12
@kinow kinow requested review from mawiesne, jzonthemtn and rzo1 January 1, 2024 17:09
@kinow kinow self-assigned this Jan 1, 2024
@kinow kinow marked this pull request as ready for review January 1, 2024 17:09
@kinow kinow changed the title from "OPENNLP-1531: Add Portuguese abbreviation dictionary (WIP)" to "OPENNLP-1531: Add Portuguese abbreviation dictionary" Jan 1, 2024
@mawiesne
Contributor

mawiesne commented Jan 1, 2024

Thx @kinow - could you rebase so that PT fits on top of FR changes?

@mawiesne
Contributor

mawiesne commented Jan 1, 2024

Nice PT text sample! Code-wise everything looks fine. abb_PT.xml is our largest abb dict so far, wow 🚀!

@kinow kinow force-pushed the OPENNLP-1531-Add-Portuguese-abbreviation-dictionary branch from afa8d96 to b54eff5 January 1, 2024 21:16
- String[] sentSplit = sentence.replaceAll("'", " '").split(" ");
+ String[] sentSplit = sentence
+     .replaceAll("'", " '")
+     .replaceAll(",", " ,")
Member Author

One of my examples failed because it returned the token "word," (with the comma still attached), so I added this extra replace in this test 👍
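
A minimal sketch of the effect described above, not the actual OpenNLP test code: the class name, method name, and sample sentence are made up for illustration. It assumes the chained call ends with the same split(" ") as the line it replaced, which the diff excerpt does not show. Padding each comma with a leading space before splitting on spaces makes the comma its own element instead of leaving it glued to the preceding word.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the OpenNLP code base.
final class PaddedSplitSketch {

    // Inserts a space before every apostrophe and comma, then splits on single spaces.
    // The trailing split(" ") is an assumption carried over from the original one-liner.
    static String[] split(String sentence) {
        return sentence
            .replaceAll("'", " '")
            .replaceAll(",", " ,")
            .split(" ");
    }

    public static void main(String[] args) {
        // Made-up Portuguese sample; "sr." stands in for a dictionary abbreviation.
        String sentence = "O sr. Silva chegou, mas saiu cedo.";
        // The comma is printed as a separate element rather than as part of "chegou,".
        System.out.println(Arrays.toString(split(sentence)));
    }
}
```

Without the second replaceAll, the same sentence would yield the element "chegou," with the comma attached, which matches the failure mentioned in the comment.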

@kinow
Member Author

kinow commented Jan 1, 2024

> Thx @kinow - could you rebase so that PT fits on top of FR changes?

Done! Tests passed locally for me.

@kinow
Member Author

kinow commented Jan 1, 2024

> Nice PT text sample! Code-wise everything looks fine.

It was easier now that the code base has been cleaned up a few times, and especially with recent examples like the FR one (I had also reviewed the previous PRs you added for other languages). Thanks!!!

> abb_PT.xml is our largest abb dict so far, wow 🚀!

🎉 we have everything from the Brazilian Academia de Letras, plus n.° that was missing, I believe 🎉

@kinow kinow merged commit 49dd0ed into main Jan 3, 2024
10 checks passed
@kinow kinow deleted the OPENNLP-1531-Add-Portuguese-abbreviation-dictionary branch January 3, 2024 14:09
4 participants