Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: test_generate_general_rule_16 #48

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 13 additions & 4 deletions core/rule_generator.py
Original file line number Diff line number Diff line change
Expand Up @@ -839,17 +839,17 @@ def tables(pattern_json: str, rewrite_json: str) -> list:
rewriteSet = defaultdict(list)

for table in patternTables:
if type(table['value']) is str and type(table['name']) is str:
if type(table['value']) is str and type(table['name']) is str and table['name'] not in patternSet[table['value']]:
patternSet[table['value']].append(table['name'])

for table in rewriteTables:
if type(table['value']) is str and type(table['name']) is str:
rewriteSet[table['value']].append(table['name'])
if type(table['value']) is str and type(table['name']) is str and table['name'] not in rewriteSet[table['value']]:
rewriteSet[table['value']].append(table['name'])

superSet = []
for patternValue, patternNames in patternSet.items():
rewriteNames = rewriteSet.get(patternValue, [])
# special case:
# special case 1:
# if the patternTable ONLY has {'value': 'employee', 'name': 'employee'}
# and the rewriteTable ONLY has {'value': 'employee', 'name': 'e1'},
# we replace 'employee' with 'e1' as table alias
Expand All @@ -861,6 +861,15 @@ def tables(pattern_json: str, rewrite_json: str) -> list:
#
if len(patternNames) == 1 and len(rewriteNames) == 1 and patternNames[0] == patternValue:
patternNames = rewriteNames
# special case 2:
# if the patternTable ONLY has {'value': 'employee', 'name': 'employee'}
# and the rewriteTable has multiple alias to the same table, e.g., {'value': 'employee', 'name': 'e1'}, {'value': 'employee', 'name': 'e2'}
# we directly append 'e1' and 'e2' as table alias to the superSet as IN MOST CASES patternTable's name should be included in rewriteTable's alias name
# the purpose is for the next step when we replace tables with variables
#
elif len(patternNames) == 1 and len(rewriteNames) != 0 and patternNames[0] == patternValue:
superSet += [{'value': patternValue, 'name': name} for name in rewriteNames]
continue
else:
patternNames += [name for name in rewriteNames if name not in patternNames]
superSet += [{'value': patternValue, 'name': name} for name in patternNames]
Expand Down
2 changes: 1 addition & 1 deletion tests/test_rule_generator.py
Original file line number Diff line number Diff line change
Expand Up @@ -2162,7 +2162,7 @@ def test_generate_general_rule_16():

q0_rule, q1_rule = unify_variable_names(rule['pattern'], rule['rewrite'])
assert q0_rule== "SELECT <x1>, <x2>, <x3>, <x4>, <x5>, <x6> FROM <x7> WHERE <x2> IN (SELECT <x2> FROM <x7> WHERE <x6> = <x8> AND <x3> = <x9>) ORDER BY <x2>, <x3>"
assert q1_rule == "SELECT <x10>.<x1>, <x10>.<x2>, <x10>.<x3>, <x10>.<x4>, <x10>.<x5>, <x10>.<x6> FROM <x10> JOIN <x11> ON <x11>.<x2> = <x10>.<x2> WHERE <x11>.<x6> = <x8> AND <x11>.<x3> = <x9> ORDER BY <x10>.<x2>, <x10>.<x3>"
assert q1_rule == "SELECT <x7>.<x1>, <x7>.<x2>, <x7>.<x3>, <x7>.<x4>, <x7>.<x5>, <x7>.<x6> FROM <x7> JOIN <x10> ON <x10>.<x2> = <x7>.<x2> WHERE <x10>.<x6> = <x8> AND <x10>.<x3> = <x9> ORDER BY <x7>.<x2>, <x7>.<x3>"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we still have <x10>?

Copy link
Collaborator Author

@HazelYuAhiru HazelYuAhiru Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the example query, we have FROM historicoestatusrequisicion hist1 JOIN historicoestatusrequisicion hist2. In our current logic, hist2 will be represented as <x10> to indicate that a JOIN is being performed on another table (even though hist1 and hist2 are the same table). As we discussed in July, it's still difficult for QueryBooster to determine if two tables with different alias are the same without refactoring the code. This is only a temporary fix before refactoring the code and the user still needs to manually replace <x10> to make the rule meaningful in a pairwise comparison 😢



def test_generate_general_rule_17():
Expand Down