Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved performance of comma and AND/OR removal regular expressions. #317

Conversation

HidekiSugimoto189
Copy link
Contributor

@HidekiSugimoto189 HidekiSugimoto189 commented Nov 4, 2023

Revised regular expressions to remove commas and AND/OR to avoid unnecessary backtracking.

  • Comparison of time for 1,000,000 regular expression executions for the same SQL
testcase old new diff(old-new)
removeFirstCommaWhenSelectClause 00:00:02.844 00:00:02.661 0.183
removeFirstCommaWhenOrderByClause 00:00:01.347 00:00:01.402 -0.055
removeFirstCommaWhenGroupByClause 00:00:01.454 00:00:01.455 -0.001
removeFirstCommaWhenSetClause 00:00:03.042 00:00:02.296 0.746
removeFirstCommaWhenStartBracket 00:00:02.801 00:00:02.266 0.535
removeFirstAndKeyWordWhenWhereClause 00:00:04.214 00:00:01.981 2.233

@HidekiSugimoto189
Copy link
Contributor Author

Changing the SqlContextImpl implementation to a method reference has reduced coverage, but has no impact on the code.

@shout-star shout-star merged commit f265e6d into release/v0.x Nov 5, 2023
1 check passed
@shout-star shout-star deleted the feature/Improved_performance_of_comma_and_and_removal_regexp branch November 5, 2023 13:45
@@ -75,13 +75,12 @@ public boolean contains(final Object o) {
}

/** where句の直後にくるANDやORを除外するための正規表現 */
protected static final Pattern WHERE_CLAUSE_PATTERN = Pattern
.compile("(?i)(?<clause>(^|\\s+)(WHERE\\s+(--.*|/\\*[^(/\\*|\\*/)]+?\\*/\\s*)*\\s*))(AND\\s+|OR\\s+)");
protected static final Pattern WHERE_CLAUSE_PATTERN = Pattern.compile(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[^(/\\*|\\*/)]

I think this is probably not what was intended. A character class accepts only one character. Also, grouping using parentheses is not possible.

This means it is equivalent to [^()|*/\\].

Copy link
Member

@ota-meshi ota-meshi Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the next + quantifier is non-greedy (?), we can probably replace it with [\\s\\S] (match all), but I haven't tried it so I don't know 😅

@HidekiSugimoto189 HidekiSugimoto189 added this to the v0.26.7 milestone Aug 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants