Description
The C++ (Velox) and Java TPCH implementations generate different random strings for varchar columns, making exact data comparison impossible.
Your Environment
- Presto version used: Latest
- Storage (HDFS/S3/GCS..): N/A
- Data source and connector used: TPCH
- Deployment (Cloud or On-prem): N/A
- Pastebin link to the complete debug logs: N/A
Expected Behavior
Both connectors should generate identical data for the same table and scale factor.
Current Behavior
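The varchar values differ between the two connectors for the same keys. The address column of the customer table in the steps below is one example.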
Possible Solution
Fix the TPCH Velox connector to match the Java implementation.
Steps to Reproduce
-- Java
SELECT custkey, address FROM tpch.tiny.customer WHERE custkey < 10 ORDER BY custkey;
 custkey | address
---------+---------------------------------------
       1 | IVhzIApeRb ot,c,E
       2 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak
       3 | MG9kdTD2WBHm
       4 | XxVSJsLAGtn
       5 | KvpyuHCplrB84WgAiGV6sYpZq7Tj
       6 | sKZz0CsnMD7mp4Xd0YrBvx,LREYKUWAh yVn
       7 | TcGe5gaZNgVePxU5kRrvXBfkasDTea
       8 | I0B10bB0AymmC, 0PrRYBCP1yGJ8xcBPmWhl5
       9 | xKiAFTjUsCuxfeleNqefumTrjS
-- C++
SELECT c_custkey, c_address FROM tpchstandard.tiny.customer WHERE c_custkey < 10 ORDER BY c_custkey;
 c_custkey | c_address
-----------+---------------------------------------
         1 | j5JsirBM9PsCy0O1m
         2 | 487LW1dovn6Q4dMVymKwwLE9OKf3QG
         3 | fkRGN8nY4pkE
         4 | 4u58h fqkyE
         5 | hwBtxkoBF qSW4KrIk5U 2B1AU7H
         6 | g1s,pzDenUEBW3O,2 pxu0f9n2g64rJrt5E
         7 | 8OkMVLQ1dK6Mbu6WG9 w4pLGQ n7MQ
         8 | j,pZ,Qp,qtFEo0r0c 92qobZtlhSuOqbE4JGV
         9 | vgIql8H6zoyuLMFNdAMLyE7 H9
Context
I tried to write some tests that compare the results between the two connectors and encountered this issue.
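For illustration, a comparison of this kind might look roughly like the Python sketch below. It is not the actual test code; the helper name `diff_by_key` and the dictionary layout are hypothetical, and the rows are hard-coded from the first three lines of each query output above rather than fetched from a cluster.

```python
# Sample rows copied from the query output in this issue (custkey -> address).
java_rows = {
    1: "IVhzIApeRb ot,c,E",
    2: "XSTf4,NCwDVaWNe6tEgvwfmRchLXak",
    3: "MG9kdTD2WBHm",
}
cpp_rows = {
    1: "j5JsirBM9PsCy0O1m",
    2: "487LW1dovn6Q4dMVymKwwLE9OKf3QG",
    3: "fkRGN8nY4pkE",
}


def diff_by_key(left, right):
    """Return (key, left_value, right_value) for every key whose values differ.

    A key present on only one side is reported with None for the other side.
    (Hypothetical helper, written only for this sketch.)
    """
    mismatches = []
    for key in sorted(left.keys() | right.keys()):
        left_value = left.get(key)
        right_value = right.get(key)
        if left_value != right_value:
            mismatches.append((key, left_value, right_value))
    return mismatches


mismatches = diff_by_key(java_rows, cpp_rows)
for key, java_value, cpp_value in mismatches:
    print(f"custkey={key}: java={java_value!r} cpp={cpp_value!r}")
```

With the sample rows above, every key is reported as a mismatch, which is why an exact-match test between the two connectors currently fails on varchar columns.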