Support Duplicate Column Names #238

@GabeFernandez310 commented Mar 6, 2023

Description

This PoC adds support for duplicate column names. Previously, an error was thrown for queries that produced them.
See discussions here and here.

Examples:

opensearchsql> select 1 AS `a`, 2 AS `a`, 3, 3;
fetched rows / total rows = 1/1
+-----+-----+-----+-----+
| a   | a   | 3   | 3   |
|-----+-----+-----+-----|
| 1   | 2   | 3   | 3   |
+-----+-----+-----+-----+
opensearchsql> select 1+1, 1+1;
fetched rows / total rows = 1/1
+-------+-------+
| 1+1   | 1+1   |
|-------+-------|
| 2     | 2     |
+-------+-------+
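
One way to picture why duplicate labels can be awkward is a row keyed by column label. The sketch below is hypothetical and is not the plugin's ProjectOperator: it assumes a label-keyed row built with `Collectors.toMap`, which rejects repeated keys, and contrasts it with a positional row that keeps every cell. The class and method names (`DuplicateColumnSketch`, `cell`, `toMapRow`, `toPositionalRow`) are invented for illustration, and the PR does not state that the earlier error came from this mechanism.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; this is not the plugin's ProjectOperator.
class DuplicateColumnSketch {

  /** Builds one projected cell: a column label paired with its value. */
  static Map.Entry<String, Object> cell(String label, Object value) {
    return new SimpleImmutableEntry<>(label, value);
  }

  /** Label-keyed row. Collectors.toMap throws IllegalStateException on a repeated key. */
  static Map<String, Object> toMapRow(List<Map.Entry<String, Object>> cells) {
    return cells.stream().collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }

  /** Positional row: cells are kept in SELECT order, so repeated labels survive. */
  static List<Map.Entry<String, Object>> toPositionalRow(List<Map.Entry<String, Object>> cells) {
    return new ArrayList<>(cells);
  }

  public static void main(String[] args) {
    // Mirrors: select 1 AS `a`, 2 AS `a`, 3, 3;
    List<Map.Entry<String, Object>> cells =
        List.of(cell("a", 1), cell("a", 2), cell("3", 3), cell("3", 3));

    try {
      toMapRow(cells);
    } catch (IllegalStateException e) {
      System.out.println("label-keyed row rejected: " + e.getMessage());
    }
    // All four cells are retained, including both `a` columns.
    System.out.println(toPositionalRow(cells));
  }
}
```

The positional form trades lookup by label for fidelity to the SELECT list, which is what the example output above requires.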

Issues Resolved

opensearch-project#785
opensearch-project#1382

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: GabeFernandez310 <[email protected]>

codecov bot commented Mar 6, 2023

Codecov Report

Merging #238 (3b39de2) into integ-support-duplicate-column-names (0ccb20a) will decrease coverage by 0.06%.
The diff coverage is 61.53%.

@@                            Coverage Diff                             @@
##             integ-support-duplicate-column-names     #238      +/-   ##
==========================================================================
- Coverage                                   98.38%   98.33%   -0.06%     
- Complexity                                   3694     3695       +1     
==========================================================================
  Files                                         343      343              
  Lines                                        9113     9122       +9     
  Branches                                      586      587       +1     
==========================================================================
+ Hits                                         8966     8970       +4     
- Misses                                        142      146       +4     
- Partials                                        5        6       +1     
Flag          Coverage Δ
sql-engine    98.33% <61.53%> (-0.06%) ⬇️

Impacted Files                                           Coverage Δ
...ensearch/sql/planner/physical/ProjectOperator.java    86.84% <61.53%> (-13.16%) ⬇️

@GabeFernandez310 (Author)

MySQL currently appears to return an error if you use GROUP BY with duplicate aliases. Our plugin instead returns results, grouping by the last column that was assigned the duplicate alias.

Examples:

MySQL:
You can test the following query on the w3schools MySQL instance, which runs version 5.7.38:

SELECT SUM(CustomerID), Country AS alias, City AS alias
FROM Customers
GROUP BY alias;

Returns:

Error in SQL:
Column 'alias' in group statement is ambiguous

Our plugin:
Tested using our CLI

opensearchsql> SELECT SUM(num1), str0 as a, str1 as a FROM calcs GROUP BY a;

Returns:

fetched rows / total rows = 17/17
+-------------+------+---------------------+
| SUM(num1)   | a    | a                   |
|-------------+------+---------------------|
| 9.78        | null | AIR PURIFIERS       |
| 9.47        | null | ANSWERING MACHINES  |
| 7.43        | null | BINDER ACCESSORIES  |
| 9.05        | null | BINDER CLIPS        |
| 9.38        | null | BINDING MACHINES    |
| 16.42       | null | BINDING SUPPLIES    |
| 12.4        | null | BUSINESS COPIERS    |
| 11.38       | null | BUSINESS ENVELOPES  |
| 10.32       | null | CD-R MEDIA          |
| 8.42        | null | CLAMP ON LAMPS      |
| 6.71        | null | CLOCKS              |
| 2.47        | null | CONFERENCE PHONES   |
| 12.05       | null | CORDED KEYBOARDS    |
| 10.37       | null | CORDLESS KEYBOARDS  |
| 7.1         | null | DOT MATRIX PRINTERS |
| 16.81       | null | DVD                 |
| 7.12        | null | ERICSSON            |
+-------------+------+---------------------+
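
The two behaviours described above can be contrasted in a small sketch. It is hypothetical and is neither MySQL's nor the plugin's actual resolver: `GroupByAliasSketch`, `Item`, `resolveStrict`, and `resolveLastWins` are names invented here, and the strict variant only borrows the wording of the MySQL error message quoted earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; neither MySQL's nor the plugin's actual resolver.
class GroupByAliasSketch {

  /** A select item such as `str0 AS a`: the source expression and its output alias. */
  static final class Item {
    final String expr;
    final String alias;

    Item(String expr, String alias) {
      this.expr = expr;
      this.alias = alias;
    }
  }

  /** Collects the expressions of every select item carrying the alias, in SELECT order. */
  static List<String> candidates(List<Item> items, String alias) {
    List<String> found = new ArrayList<>();
    for (Item item : items) {
      if (item.alias.equals(alias)) {
        found.add(item.expr);
      }
    }
    return found;
  }

  /** Strict policy: an alias that matches more than one select item is an error. */
  static String resolveStrict(List<Item> items, String alias) {
    List<String> found = candidates(items, alias);
    if (found.isEmpty()) {
      throw new IllegalArgumentException("Unknown column '" + alias + "'");
    }
    if (found.size() > 1) {
      throw new IllegalStateException("Column '" + alias + "' in group statement is ambiguous");
    }
    return found.get(0);
  }

  /** Last-wins policy: the most recent select item with the alias is used. */
  static String resolveLastWins(List<Item> items, String alias) {
    List<String> found = candidates(items, alias);
    if (found.isEmpty()) {
      throw new IllegalArgumentException("Unknown column '" + alias + "'");
    }
    return found.get(found.size() - 1);
  }

  public static void main(String[] args) {
    // Mirrors: SELECT SUM(num1), str0 as a, str1 as a FROM calcs GROUP BY a;
    List<Item> items = List.of(
        new Item("SUM(num1)", "SUM(num1)"), new Item("str0", "a"), new Item("str1", "a"));

    System.out.println("last-wins groups by: " + resolveLastWins(items, "a"));
    try {
      resolveStrict(items, "a");
    } catch (IllegalStateException e) {
      System.out.println("strict: " + e.getMessage());
    }
  }
}
```

Under the last-wins policy the grouping key for `a` is `str1`, which matches the distinct `str1` values in the second `a` column of the output above.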

@GabeFernandez310 (Author)

Queries like the following now also work with the V2 engine:

SELECT *, str0 FROM calcs

Returns:

-[ RECORD 1 ]-------------------------
bool3     | True
int0      | 1
time1     | 1970-01-01 19:36:22
bool2     | False
int2      | 5
int1      | -3
str3      | e
int3      | 8
str1      | CLAMP ON LAMPS
str2      | one
time0     | 1899-12-30 21:07:32
datetime0 | 2004-07-09 10:17:35
num1      | 8.42
num0      | 12.3
datetime1 | null
num4      | null
bool1     | True
key       | key00
num3      | -11.52
bool0     | True
num2      | 17.86
str0      | FURNITURE
date3     | 1986-03-20 00:00:00
date2     | 1977-04-20 00:00:00
date1     | 2004-04-01 00:00:00
date0     | 2004-04-15 00:00:00
zzz       | a
str0      | FURNITURE
-[ RECORD 2 ]-------------------------
bool3     | null
int0      | null
time1     | 1970-01-01 02:05:25
bool2     | False
int2      | -4
int1      | -6
str3      | e
int3      | 13
str1      | CLOCKS
str2      | two
time0     | 1900-01-01 13:48:48
datetime0 | 2004-07-26 12:30:34
num1      | 6.71
num0      | -12.3
datetime1 | null
num4      | 10.85
bool1     | True
key       | key01
num3      | -9.31
bool0     | False
num2      | 16.73
str0      | FURNITURE
date3     | null
date2     | 1995-09-03 00:00:00
date1     | 2004-04-02 00:00:00
date0     | 1972-07-04 00:00:00
zzz       | b
str0      | FURNITURE
-[ RECORD 3 ]-------------------------
...
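
The repeated `str0` label in each record follows from expanding `*` and then appending the explicitly selected column. A minimal sketch of that expansion, assuming a flat list of index field names (only three of the fields shown above are used), might look like the following. `StarExpansionSketch` and `expand` are hypothetical names and do not come from the plugin.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the plugin's analyzer or planner.
class StarExpansionSketch {

  /** Replaces each `*` with all index fields and keeps other items as written. */
  static List<String> expand(List<String> selectItems, List<String> indexFields) {
    List<String> labels = new ArrayList<>();
    for (String item : selectItems) {
      if (item.equals("*")) {
        labels.addAll(indexFields);
      } else {
        labels.add(item);
      }
    }
    return labels;
  }

  public static void main(String[] args) {
    // Mirrors: SELECT *, str0 FROM calcs (with a shortened field list).
    List<String> labels = expand(List.of("*", "str0"), List.of("key", "str0", "zzz"));
    System.out.println(labels); // str0 appears twice: once from `*`, once explicitly
  }
}
```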

@dai-chen

I'm trying to understand what ProjectOperator outputs for the example given above. Could you explain a bit or provide a unit test?

@acarbonetto

Raising PoC in upstream for consideration.
opensearch-project#1982
