Join operator lost data due to decimal join key. #11980

@beliefer

Description

Backend

VL (Velox)

Bug description

There are two tables: A and B.

-- The main schema of A
CREATE TABLE A (
  invoiceid DECIMAL(20,0),
  time TIMESTAMP,
  update_time TIMESTAMP,
  dt STRING,
  hour STRING)
USING orc
PARTITIONED BY (dt, hour)
LOCATION 'hdfs://test/A'
TBLPROPERTIES (
  'transient_lastDdlTime'='1718612895')
-- The main schema of B.
CREATE TABLE B (
  `invoiceid` DECIMAL(20,0),
  `status` STRING,
  `update_time` TIMESTAMP,
  `dt` STRING,
  `hour` STRING)
USING orc
PARTITIONED BY (dt, hour)
LOCATION 'hdfs://test/B'
TBLPROPERTIES (
  'transient_lastDdlTime'='1718612913')

-- The ORC file schema in hdfs://test/A

Type: struct<invoiceid:decimal(38,18), time:timestamp, update_time:timestamp>

-- The ORC file schema in hdfs://test/B

Type: struct<invoiceid:decimal(38,18), status:string, update_time:timestamp>

-- The query.
SELECT count(*)
FROM (
  SELECT c.invoiceid AS invoice_id,
         c.orderid AS order_id,
         c.time AS order_time,
         c.update_time AS update_time
  FROM
      (SELECT *
       FROM
           (SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY invoiceid
                                      ORDER BY update_time DESC) AS rn
            FROM A
            WHERE dt >= '20260421') b
       WHERE rn = 1) c
  JOIN
      (SELECT *
       FROM B
       WHERE dt >= '20260421'
           AND status = 7) d ON c.invoiceid = d.invoiceid
);

The root cause

The scan operator for A falls back to vanilla Spark's Scan hive because time and update_time are TIMESTAMP columns.
The scan operator for B is Gluten's NativeScan hive.

Left side: Scan hive (vanilla) → DECIMAL(20,0) rows → RowToVeloxColumnar → DECIMAL(20,0) columnar
Right side: NativeScan hive (Velox) → may be widened internally while reading the ORC file → DECIMAL(38,18) columnar

→ Inconsistent join key types on the two sides → key matching fails in the Velox hash join → 0 rows output
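The mismatch can be sketched with a toy columnar model (the names below are illustrative, not real Velox APIs): each decimal vector carries a (precision, scale) tag plus unscaled integer values, and a join that hashes the raw unscaled keys finds no matches when the two sides were encoded at different scales.

```python
from dataclasses import dataclass

@dataclass
class DecimalVector:
    # Toy stand-in for a columnar decimal vector; not a real Velox type.
    precision: int
    scale: int
    unscaled: list[int]  # logical value = unscaled / 10**scale

def from_values(values, precision, scale):
    # Encode logical values at the given scale, as each scan side would.
    return DecimalVector(precision, scale, [int(v * 10**scale) for v in values])

def naive_hash_join(left: DecimalVector, right: DecimalVector):
    # Joins on the raw unscaled key, ignoring the scale tag -- the failure mode.
    build = set(right.unscaled)
    return [k for k in left.unscaled if k in build]

ids = [101, 202, 303]
left = from_values(ids, precision=20, scale=0)    # RowToVeloxColumnar side: DECIMAL(20,0)
right = from_values(ids, precision=38, scale=18)  # NativeScan side: the file's decimal(38,18)

print(len(naive_hash_join(left, right)))  # 0 rows: same logical ids, different unscaled keys
print(len(naive_hash_join(left, left)))   # 3 rows when both sides share one type
```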
                                                                                                                                                                      
The problem lies in RowToVeloxColumnarExec itself: when converting rows to columnar, the Velox vector is constructed strictly according to the DECIMAL(20,0) declared in the Spark schema, without widening the precision. When NativeScan reads the Hive ORC file, Velox normalizes the decoded type internally; for example, DECIMAL(20,0) may be mapped to the file's 128-bit DECIMAL(38,18) representation, i.e. Velox's ShortDecimal/LongDecimal encoding.

The result is that the plan-level types are consistent (both sides are DECIMAL(20,0)), but the type tags of the actual columnar vectors are inconsistent, so the key-comparison logic of the Velox HashJoin takes the type-mismatch branch and the join outputs 0 rows.
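For contrast, SQL equality on decimals is value-based, so a correct join must normalize the two representations before comparing keys. Python's stdlib decimal module illustrates the expected semantics: the same invoice id at scale 0 and scale 18 compares equal as a value even though the underlying representations differ, which is exactly the normalization the vector-level key comparison is missing here.

```python
from decimal import Decimal

# The same logical invoice id, as each scan side would materialize it.
a = Decimal("12345")                     # table-declared DECIMAL(20,0)
b = Decimal("12345.000000000000000000")  # ORC file's decimal(38,18)

print(a == b)                        # True: SQL equality compares values
print(a.as_tuple() == b.as_tuple())  # False: the representations differ
```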

Gluten version

Gluten-1.5

Spark version

Spark-3.5.x

Spark configurations

No response

System information

No response

Relevant logs
