C++ and Java TPCH Implementations Generate Different Random Data #26051

@tdcmeehan

Description

The C++ (Velox) and Java TPCH implementations generate different random strings for varchar columns, making exact data comparison impossible.

Your Environment

  • Presto version used: Latest
  • Storage (HDFS/S3/GCS..): N/A
  • Data source and connector used: TPCH
  • Deployment (Cloud or On-prem): N/A
  • Pastebin link to the complete debug logs: N/A

Expected Behavior

Both connectors should generate identical data for the same table and scale factor, including the random strings in varchar columns.

Current Behavior

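The varchar values differ. In the sample under Steps to Reproduce, every address value for custkey 1 through 9 differs between the two connectors, although each pair of corresponding strings has the same length.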

Possible Solution

Fix the TPCH Velox connector to match the Java implementation.

Steps to Reproduce

  -- Java
  SELECT custkey, address FROM tpch.tiny.customer WHERE custkey < 10 ORDER BY custkey;
   custkey |                address
  ---------+---------------------------------------
         1 | IVhzIApeRb ot,c,E
         2 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak
         3 | MG9kdTD2WBHm
         4 | XxVSJsLAGtn
         5 | KvpyuHCplrB84WgAiGV6sYpZq7Tj
         6 | sKZz0CsnMD7mp4Xd0YrBvx,LREYKUWAh yVn
         7 | TcGe5gaZNgVePxU5kRrvXBfkasDTea
         8 | I0B10bB0AymmC, 0PrRYBCP1yGJ8xcBPmWhl5
         9 | xKiAFTjUsCuxfeleNqefumTrjS
  -- C++
  SELECT c_custkey, c_address FROM tpchstandard.tiny.customer WHERE c_custkey < 10 ORDER BY c_custkey;
   c_custkey |               c_address
  -----------+---------------------------------------
           1 | j5JsirBM9PsCy0O1m
           2 | 487LW1dovn6Q4dMVymKwwLE9OKf3QG
           3 | fkRGN8nY4pkE
           4 | 4u58h fqkyE
           5 | hwBtxkoBF qSW4KrIk5U 2B1AU7H
           6 |  g1s,pzDenUEBW3O,2 pxu0f9n2g64rJrt5E
           7 | 8OkMVLQ1dK6Mbu6WG9 w4pLGQ n7MQ
           8 | j,pZ,Qp,qtFEo0r0c 92qobZtlhSuOqbE4JGV
           9 | vgIql8H6zoyuLMFNdAMLyE7 H9

Context

I tried to write some tests that compare the results between the two connectors and encountered this issue.
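
For illustration, here is a minimal sketch of that kind of comparison, applied to the nine rows shown under Steps to Reproduce. The rows are hard-coded from the query output above, and `compare_rows` is a hypothetical helper written for this example, not part of Presto or Velox. It reports, per key, whether the two strings have the same length and whether they are identical.

```python
# Rows copied from the two query outputs in "Steps to Reproduce".
# Keys are custkey / c_custkey, values are address / c_address.
JAVA_ROWS = {
    1: "IVhzIApeRb ot,c,E",
    2: "XSTf4,NCwDVaWNe6tEgvwfmRchLXak",
    3: "MG9kdTD2WBHm",
    4: "XxVSJsLAGtn",
    5: "KvpyuHCplrB84WgAiGV6sYpZq7Tj",
    6: "sKZz0CsnMD7mp4Xd0YrBvx,LREYKUWAh yVn",
    7: "TcGe5gaZNgVePxU5kRrvXBfkasDTea",
    8: "I0B10bB0AymmC, 0PrRYBCP1yGJ8xcBPmWhl5",
    9: "xKiAFTjUsCuxfeleNqefumTrjS",
}
CPP_ROWS = {
    1: "j5JsirBM9PsCy0O1m",
    2: "487LW1dovn6Q4dMVymKwwLE9OKf3QG",
    3: "fkRGN8nY4pkE",
    4: "4u58h fqkyE",
    5: "hwBtxkoBF qSW4KrIk5U 2B1AU7H",
    6: " g1s,pzDenUEBW3O,2 pxu0f9n2g64rJrt5E",  # value starts with a space
    7: "8OkMVLQ1dK6Mbu6WG9 w4pLGQ n7MQ",
    8: "j,pZ,Qp,qtFEo0r0c 92qobZtlhSuOqbE4JGV",
    9: "vgIql8H6zoyuLMFNdAMLyE7 H9",
}


def compare_rows(expected, actual):
    """Return (key, same_length, same_value) for every key present in both inputs."""
    report = []
    for key in sorted(expected.keys() & actual.keys()):
        e, a = expected[key], actual[key]
        report.append((key, len(e) == len(a), e == a))
    return report


report = compare_rows(JAVA_ROWS, CPP_ROWS)
mismatched = [key for key, _, same_value in report if not same_value]
print("keys with different values:", mismatched)
print("all lengths equal:", all(same_length for _, same_length, _ in report))
```

On this sample the values differ for all nine keys while every pair of lengths matches. That suggests the two implementations agree on string length but not on the characters chosen, though nine rows are not enough to confirm the cause.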
