You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Add Decimal32 and Decimal64 support to arrow-avro Reader (#8255)
# Which issue does this PR close?
- Part of #4886
# Rationale for this change
Apache Avro’s `decimal` logical type annotates either `bytes` or `fixed`
and carries `precision` and `scale`. Implementations should reject
invalid combinations such as `scale > precision`, and the underlying
bytes are the two’s‑complement big‑endian representation of the unscaled
integer. On the Arrow side, Rust now exposes first‑class `Decimal32`,
`Decimal64`, `Decimal128`, and `Decimal256` data types with documented
maximum precisions (9, 18, 38, 76 respectively). Until now, `arrow-avro`
decoded all Avro decimals to 128/256‑bit Arrow decimals, even when a
narrower type would suffice.
# What changes are included in this PR?
**`arrow-avro/src/codec.rs`**
* Map `Codec::Decimal(precision, scale, _size)` to Arrow’s
`Decimal32`/`64`/`128`/`256` **by precision**, preferring the narrowest
type (≤9→32, ≤18→64, ≤38→128, otherwise 256).
* Strengthen decimal attribute parsing:
* Error if `scale > precision`.
* Error if `precision` exceeds Arrow’s maximum (Decimal256).
* If Avro uses `fixed`, check that declared `precision` fits the byte
width (≤4→max 9, ≤8→18, ≤16→38, ≤32→76).
* Update docstring of `Codec::Decimal` to mention `Decimal32`/`64`.
**`arrow-avro/src/reader/record.rs`**
* Add `Decoder::Decimal32` and `Decoder::Decimal64` variants with
corresponding builders (`Decimal32Builder`, `Decimal64Builder`).
* Builder selection:
* If Avro uses **fixed**: choose by size (≤4→Decimal32, ≤8→Decimal64,
≤16→Decimal128, ≤32→Decimal256).
* If Avro uses **bytes**: choose by declared precision (≤9/≤18/≤38/≤76).
* Implement decode paths that sign‑extend Avro’s two’s‑complement
payload to 4/8 bytes and append values to the new builders; update
`append_null`/`flush` for 32/64‑bit decimals.
**`arrow-avro/src/reader/mod.rs` (tests)**
* Expand `test_decimal` to assert that:
* bytes‑backed decimals with precision 4 map to `Decimal32`; precision
10 map to `Decimal64`;
* legacy fixed\[8] decimals map to `Decimal64`;
* fixed\[16] decimals map to `Decimal128`.
* Add a nulls path test for bytes‑backed `Decimal32`.
# Are these changes tested?
Yes. Unit tests under `arrow-avro/src/reader/mod.rs` construct expected
`Decimal32Array`/`Decimal64Array`/`Decimal128Array` with
`with_precision_and_scale`, and compare against batches decoded from
Avro files (including legacy fixed and bytes‑backed cases). The tests
also exercise small batch sizes to cover buffering paths; a new Avro
data file is added for higher‑width decimals.
New Avro test file details:
- test/data/int256_decimal.avro # bytes logicalType:
decimal(precision=76, scale=10)
- test/data/fixed256_decimal.avro # fixed[32] logicalType:
decimal(precision=76, scale=10)
- test/data/fixed_length_decimal_legacy_32.avro # fixed[4] logicalType:
decimal(precision=9, scale=2)
- test/data/int128_decimal.avro # bytes logicalType:
decimal(precision=38, scale=2)
These new Avro test files were created using this script:
https://gist.github.com/jecsand838/3890349bdb33082a3e8fdcae3257eef7
There is also an arrow-testing PR for these new files:
apache/arrow-testing#112
# Are there any user-facing changes?
N/A due to `arrow-avro` not being public.
0 commit comments