Commit 1aecd9c

richox and zhangli20 authored
update blaze version 2.0.8-SNAPSHOT (#386)
Co-authored-by: zhangli20 <[email protected]>
1 parent 591b8d7 commit 1aecd9c

10 files changed: +184 -172 lines changed

.github/workflows/build-ce7-releases.yml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ jobs:
 strategy:
 matrix:
 sparkver: [spark303, spark333]
-blazever: [2.0.7]
+blazever: [2.0.8]

 steps:
 - uses: actions/checkout@v4

README.md

Lines changed: 3 additions & 3 deletions
@@ -108,16 +108,16 @@ spark-sql -f tpcds/q01.sql

 ## Performance

-Check [Benchmark Results](./benchmark-results/20231108.md) with the latest date for the performance
+Check [Benchmark Results](./benchmark-results/20240202.md) with the latest date for the performance
 comparison with vanilla Spark on TPC-DS 1TB dataset. The benchmark result shows that Blaze saved
 ~40% query time and ~45% cluster resources in average. ~5x performance achieved for the best case (q06).
 Stay tuned and join us for more upcoming thrilling numbers.

 Query time:
-![20231108-query-time](./benchmark-results/blaze-query-time-comparison-20231108.png)
+![20240202-query-time](./benchmark-results/blaze-query-time-comparison-20240202.png)

 Cluster resources:
-![20231108-resources](./benchmark-results/blaze-cluster-resources-cost-comparison-20231108.png)
+![20240202-resources](./benchmark-results/blaze-cluster-resources-cost-comparison-20240202.png)

 We also encourage you to benchmark Blaze and share the results with us. 🤗

RELEASES.md

Lines changed: 27 additions & 15 deletions
@@ -1,20 +1,32 @@
-# blaze-v2.0.7
+# blaze-v2.0.8

 ## Features
-* Supports native BroadcastNestedLoopJoinExec.
-* Supports multithread UDF evaluation.
-* Supports spark.files.ignoreCorruptFiles.
-* Supports input batch statistics.
-
+* Enables nested complex data types by default.
+* Supports writing parquet table with dynamic partitions.
+* Supports partial aggregate skipping.
+* Enable first() aggregate function converting.
+* Add spill metrics.
+*
 ## Performance
-* Improves get_json_object() performance by reducing duplicated json parsing.
-* Improves parquet reading performance by skipping utf-8 validation.
-* Supports cached expression evaluator in native AggExec.
-* Supports column pruning during native evaluation.
-* Prefer native sort even if child is non-native.
+* Implement batch updating/merging in aggregates.
+* Use slim box for storing bytes.
+* get_json_object use Cow to avoid copying.
+* Reduce the probability of unexpected off-heap memory overflows.
+* Introduce multiway merge sort to SortExec and SortRepartitioner.
+* SortExec removes redundant columns from batch.
+* Implement loser tree with inlined comparable traits.
+* Use unchecked index in LoserTree to get slightly performance improvement.
+* Remove BucketRepartitioner.
+* Reduce number of awaits in sort-merge join.
+* Pre-merge records in sorting mode if cardinality is low.
+* Use gxhash as default hasher in AggExec.
+* Optimize collect_set/collect_list function with SmallVec.
+* Implement async ipc reader.

 ## Bugfix
-* Fix missing outputPartitioning in NativeParquetExec.
-* Fix missing native converting checks in parquet scan.
-* Fix inconsistency: implement spark-compatible float to int casting.
-* Avoid closing hadoop fs for reusing in cache.
+* Fix buggy GetArrayItem/GetMapValue native converter pattern matching.
+* Fix parquet pruning with NaN values.
+* Fix map type conversion with incorrect nullable value.
+* Fix ffi-export error in some cases.
+* Fix incorrect behavior of get_index_field with incorrect number of rows.
+* Fix task hanging in some cases with ffi-export.

benchmark-results/20231108.md

Lines changed: 0 additions & 152 deletions
This file was deleted.
