Optimize the normal form detection #123
Merged
GjjvdBurg merged 2 commits into alan-turing-institute:master on Mar 21, 2024
even_rows was always called together with every_row_has_delim, which could mean two full scans of all the rows. By joining those two functions, we can save one of the scans. Also, the logic previously implemented by even_rows now exits early whenever possible (the previous implementation scanned the whole file no matter what).
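The idea of merging the two checks into a single pass with an early exit can be sketched as follows. This is only an illustration, not the code from the PR: the function name `every_row_has_delim_and_is_even` is hypothetical, rows are split with a plain `str.split` (ignoring quoting and escaping), and returning `False` for empty input is an arbitrary choice made for this sketch.

```python
from typing import Iterable, Optional


def every_row_has_delim_and_is_even(rows: Iterable[str], delimiter: str) -> bool:
    """Single-pass check (hypothetical helper, not the library's function).

    Returns True only if every row contains the delimiter and all rows
    split into the same number of cells. Stops at the first violation.
    """
    expected: Optional[int] = None
    for row in rows:
        # Former "every row has the delimiter" check.
        if delimiter not in row:
            return False
        # Former "all rows have the same number of cells" check.
        # Assumption: naive split, no quote or escape handling.
        n_cells = len(row.split(delimiter))
        if expected is None:
            expected = n_cells
        elif n_cells != expected:
            return False  # early exit, remaining rows are never read
    # Assumption: an empty input is treated as "not even".
    return expected is not None
```

Because the loop returns on the first failing row, a file whose second row already breaks the pattern costs two row inspections rather than two full scans.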
Contributor
Author
Sorry for the failed build, I amended the formatting issues.
GjjvdBurg
reviewed
Mar 18, 2024
      form_and_dialect: List[
    -     Tuple[int, Callable[[str, SimpleDialect], bool], SimpleDialect]
    +     Tuple[int, Callable[[list[str], SimpleDialect], bool], SimpleDialect]
Collaborator
I think the build is failing because you need `from typing import List` here for Python 3.8 (also in a few places below).
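For context, built-in generics such as `list[str]` only became subscriptable at runtime in Python 3.9, so code that must also run on 3.8 typically uses `typing.List`. A minimal sketch of an annotation in that style follows; the `SimpleDialect` class here is a stand-in defined only for this example, and `FormCheck` and `is_form_example` are hypothetical names, not the project's.

```python
from typing import Callable, List, Tuple


class SimpleDialect:
    """Stand-in for illustration only; not the library's class."""

    def __init__(self, delimiter: str, quotechar: str, escapechar: str) -> None:
        self.delimiter = delimiter
        self.quotechar = quotechar
        self.escapechar = escapechar


# typing.List works on Python 3.8, whereas list[str] needs 3.9+.
FormCheck = Callable[[List[str], SimpleDialect], bool]


def is_form_example(rows: List[str], dialect: SimpleDialect) -> bool:
    # Hypothetical check: every row contains the dialect's delimiter.
    return all(dialect.delimiter in row for row in rows)


form_and_dialect: List[Tuple[int, FormCheck, SimpleDialect]] = [
    (1, is_form_example, SimpleDialect(",", '"', "")),
]
```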
Contributor
Author
Oh, that is probably it, I am so used to the lowercase versions I did not think of it :) Thank you for the patience, I will fix it as soon as I can.
Contributor
Author
I changed the types, hopefully it will pass now :)
Collaborator
Thanks for opening this PR @no23reason! Looks like there are just a few build failures to iron out, but other than that it looks good.
Avoid unnecessary splitting and joining of the rows. The current implementation splits the file into rows separately in each of the is_form_x functions, always in the same way. Instead, we can split the file once and pass the lines to the is_form_x functions directly. This also avoids re-joining the lines in is_form_5 when it calls is_form_2. The test_normal_forms test inputs were changed accordingly: they are split on `\n` and the trailing newlines were manually removed (the actual code always strips the trailing newlines before calling the is_form_x functions).
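The "split once, pass the rows" pattern described in this commit might look roughly like the sketch below. All names here (`split_rows`, `check_a`, `check_b`, `first_matching_check`) are hypothetical and the checks are simplified placeholders, not the actual is_form_x logic. The exact stripping and splitting behavior is also an assumption.

```python
from typing import Callable, List, Optional

RowCheck = Callable[[List[str], str], bool]


def split_rows(data: str) -> List[str]:
    # Assumption: strip trailing newlines, then split on "\n" exactly once.
    return data.rstrip("\n").split("\n")


def check_a(rows: List[str], delimiter: str) -> bool:
    # Placeholder check: every row contains the delimiter.
    return all(delimiter in row for row in rows)


def check_b(rows: List[str], delimiter: str) -> bool:
    # Placeholder check that reuses check_a on the same list of rows,
    # so nothing has to be re-joined into a single string first.
    no_leading_delim = all(not row.startswith(delimiter) for row in rows)
    return no_leading_delim and check_a(rows, delimiter)


def first_matching_check(
    data: str, delimiter: str, checks: List[RowCheck]
) -> Optional[int]:
    rows = split_rows(data)  # the only split of the input
    for index, check in enumerate(checks):
        if check(rows, delimiter):
            return index
    return None
```

Each check receives the same pre-split list, so adding further checks does not add further passes over the raw string.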
Collaborator
Thanks again @no23reason!
Contributor
Author
Thank you, especially for the patience with the mistakes I should have caught faster :)
Aimed at avoiding as many full file scans as possible, this PR should improve the performance of the normal form detection.
Steps taken (there are more details in the individual commits):
- `even_rows` logic so that it exits as soon as possible instead of going through the whole file
- `is_form_x` functions