Releases: apache/amoro
v0.7.1-incubating
Improvements:
- Support graceful shutdown for AMS #3057
- Prevent the table name from being truncated prematurely #3077
- Load log4j configuration file from AMORO_CONF_DIR #3098
- Make yarn application name for flink optimizer more meaningful #3129
- Filter out the CodecPool log in the optimizer log #3145
- Optimize plan log print #3212
- Do not throw exception when retry task in optimizer keeper #3110
- Add missing optimizingStatus key and format code #3152
- Fix wrong javadoc for TableRuntimeRefreshExecutor #3051
- Fix typo in managing-optimizers.md #3059
- Equality field id are different in a RewriteFilesInput #2870
- Make plugin package optional #3302
- Optimize the final tarball to keep it within the specified size for 0.7 #3303
- Change amoro shade version to 0.7.0-incubating #3083
BugFixes:
- Fix AMS crash when Javalin stop times out #3052
- Fix the function that converts ms #3069
- Tag dangling delete snapshot commits #3058
- High-availability service should not close the Curator client before the LeaderLatch is closed #3132
- When a master/backup switch occurs, close the planExecutor thread pool #3141
- Thrift port automatically closed #3153
- Optimization state blocking in planning #3111
- Optimization state blocking in running #3110
- Extract error message in optimizers #3130
- Fix scheduling hungry issue #3196
- OptimizerKeeper throws NPE if optimizer timeout exactly when ams restarted #3242
- Cannot set self-optimizing.min-plan-interval table property dynamically #3224
- Iceberg table ran repeated minor optimizing #3228
- Error in docker-compose.yaml of quick start #3086
- Planning table failed, java.lang.StackOverflowError: null #3239
- Got java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper error when executing SQL in terminal #3323
v0.7.0-incubating
Highlight:
- Support automatically generating and managing Tags and Branches for tables #1354
- Support killing the running optimizing process #1862
- Support detail page for optimizing process #2114
- Support Spark-based optimizer #1812
- Add metric collection for AMS #1528
- Support managing tables with multiple formats under one Catalog #1061
Features:
- Support automatically generating and managing Tags and Branches for Table #1354
- Support killing the running optimizing process #1862
- Support detail page for Optimizing Process #2114
- Support Spark-based optimizer #1812
- Add metric collection for ams #1528
- Support managing tables with multiple formats under one Catalog #1061
- Make local terminal cores configurable #2492
- Support reading encrypted iceberg data files #2401
- Add basic authentication to REST API #2667
Improvements:
- Improve major optimizing: compact segment files to target size #2330
- Put the database username and password into Kubernetes Secrets #2291
- Avoid releasing external optimizer instances through ams #2315
- Mixed-format Table stats for Trino engine need to be calculated individually #2246
- Add planning optimizing status for table #2290
- Support computed columns for mixed-format tables in Flink DDL #1457
- The Files page supports filtering by partition name and sorting by dictionary value. #2316
- Support log out of the ams dashboard #2359
- Support parallelized planning in one optimizer group #1951
- Skip cleaning up dangling delete files for Iceberg V1 table #2222
- Configure iceberg.worker.num-threads in the configuration file #2386
- Support sorting the external Iceberg table list #1716
- Add table filter for catalog properties #2310
- Show the format version of Iceberg Table #2260
- Make the maximum input file size for each optimizing thread configurable #2385
- Use partition filter to speed up optimizing plan #2417
- Add more configurable properties to the ams server #2333
- Dispose of the current tasks after task deserialization error #2521
- Paimon snapshot display on the page should be arranged in reverse order according to the snapshot time of the table. #2606
- Add support for setting/altering mixed-format table properties with Flink SQL #2
- Helm adds Kyuubi (as a terminal backend) and PostgreSQL-related conf #2306
- Validate hive-site.xml when uploading a file for Hive-catalog #2520
- Optimize the speed of searching for tables in the tables navigation bar #2215
- Use CachingCatalog to reduce the time cost of IcebergCatalogWrapper#loadTable #1794
- Enable return URL parameter when redirecting to login page #2637
- Display more details on the page of Optimizing->Optimizers #2698
- Improve the version info about the project #2813
- AMS should use G1 GC by default #2830
- Support different simple authorization user names for different catalogs #2918
- Move admin username and password to Kubernetes secrets #2948
- Add Tests for helm chart templates #2778
- Support S3 storage for Paimon format catalog #2972
- Support changing the language to Chinese for Dashboard #2958
BugFixes:
- After disabling self-optimizing, the table status is not updated to idle #2250
- Optimizing commit failed: org.apache.iceberg.exceptions.ValidationException: Missing required files to delete #2454
- Periodic snapshots expiring tasks will be scheduled redundantly #2453
- When using Flink API to write data into a mixed-iceberg table, it will update the table metadata to incorrect value #2525
- Optimizing not triggered when partition value is null #2542
- When expiring historical data, the latest snapshot generated by amoro should not be used #2555
- The process of deleting expired snapshots was interrupted by exception #2579
- Loading Mixed-Iceberg format tables of internal catalog failed in Flink #2587
- Failed to remove orphan files after dropping a partition field #2617
- The Last Commit Time for the Paimon partitioned table in the File pages has the wrong zone #2607
- If an exception occurs in externalCatalog.listTables during the exploreExternalCatalog process, exploreExternalCatalog will fail this time. #2638
- Failed to view snapshot details #2614
- Hive table name may be too long for AMS table identifiers #2649
- Skip the deleted directories when cleaning orphan files #2619
- Optimizing process stuck after a few failures #2623
- Lack of database type mandatory checking, leading to NPE in existing MySQL config check #2733
- Iceberg temporary snapshot tables should be excluded from data cleanups #2488
- File format ORC does not support rolling new files for mixed_format #2560
Amoro-0.6.1
Highlight:
- Improve the performance of the optimizer in cases where there are a large number of delete files
Improvements:
- Improve optimizer's compaction performance in large delete scenarios #2266
- Store max transaction id on the snapshot level for Mixed Format Table #1720
- Reduce triggering of optimizing evaluation for native iceberg format #2350
- The mixed hive tables should be allowed to be dropped when the hive table has already been dropped #2286
- Print GC date stamps #2416
- Exclude kryo dependency from flink-optimizer #2437
BugFixes:
- Fix data lost issue after optimizing for Mixed format tables #2253
- After disabling self-optimizing, the table status is not updated to idle #2250
- Optimizing status blocked at minor/major when restarting AMS #2279
- Exception happened when using Flink DataStream API to write data into Mixed format tables #2271
- Failure when creating a table and immediately retrieving Blocker #2289
- Fix deleting Puffin files by mistake when cleaning orphan files #2320
- A non-daemon thread still exists after the Spark job has finished #2336
- SimpleDateFormat throws parsing errors when using multi-threading #2324
- Dashboard displays wrong partition path when the table had multiple partition spec #1863
- Fix the NullPointerException issue when modifying the self-optimizing.enabled configuration #2388
- TableController set partition string value to null when getting file list for a non-partitioned table #2407
- Fix loading the optimizing snapshot id of ChangeStore for Mixed Format tables #2430
- Batch deletion of ChangeStore files did not take effect for the Mixed Format tables #2440
- Fix File format is required when scanning file entries if no file format is specified in the file name #2441
- Mixed Hive tables mistakenly delete hive files during expiring snapshots #2404
- Using the wrong database name when switching to another Catalog #2413
- Fix Mixed Format tables expiring all the snapshots with optimized sequence #2394
- Add a default value of table-filters in spark scan builder to avoid NPE when comparing the table scan #2313
- Failed to get table properties when BaseStore contains property watermark.base #2295
- Show create table statement return incorrect SQL with Spark #2272
- AMS will repeat trigger the same minor optimizing jobs when the bucket > self-optimizing.minor.trigger.file-count #2464
- Periodic snapshots expiring tasks will be scheduled redundantly #2453
Amoro-0.6.0
Highlight:
- Kubernetes Integration
- Support running AMS in a Kubernetes environment
- Support running Optimizer in a Kubernetes environment
- Partition expiration
- Paimon integration
- Support viewing metadata information such as Schema, Properties, Files, Snapshots, Compactions, Operations of the Paimon table
- Support executing Spark SQL supported by Paimon in the Terminal interface
- Mixed format support ORC file format
- Mixed format support flink-1.16 and flink-1.17
Features:
- Integration with Kubernetes #1917
- Support for expiring partitions in Iceberg tables #1758
- Support integration with Apache Paimon #1269
- Support ORC file format for Mixed format tables #740
- Upgrading Flink version based on Mixed-format table #1983
- Enrich the storage type registry when creating a catalog. #2007
Improvements:
- Support s3:// and s3a:// protocols in the default distribution. #2009
- Make health status checks available without logging in #2081
- Separate local-optimizer and flink-optimizer into different modules and package them independently. #2008
- Support list partitions via flink catalog #1940
- Add conf/env.sh for jvm options configurations #2047
- Change position delete index from set to bitmap of optimizing #2078
- Critical vulnerabilities in the project dependencies need to be fixed #2040
- Make AMS table sync with external catalogs multithreaded #2109
- Change the default value of self-optimizing.major.trigger.duplicate-ratio #2225
- Log4j 1.2 includes a SocketServer class that is vulnerable to deserialization of untrusted data #1548
- Fix the format of thread name in DefaultTableService #2163
BugFixes:
- Failed to optimize Mixed Format KeyedTable with Timestamp(without zone) column as Primary key #2091
- An infinite loop occurs in merge optimization, and the number of input files is equal to the number of output files. #2090
- Get incorrect partition field name when refreshing tables #2165
- Catalog type always null in TerminalSession #2146
- AMS server cannot be initialized in HA mode #2083
- Delete sync tables when drop external catalogs #2235
- Data lost after optimizing in Mixed format #2253
Amoro-0.5.1
Changelog for v0.5.1
Features:
- [Feature]: Show summary details of snapshots in the table transaction tab #1827
- [Subtask]: Implement displaying metrics on the front end of AMS #1771
- [Feature]: Support table filters to make AMS ignore tables that are not needed #1870
- [Feature]: Support MOR with Flink Engine in SQL/Table API #1422
- [Feature]: Add PostgreSQL RDS as system database #1966
- [Subtask]: Support creating mixed-iceberg format tables in any catalog which support iceberg. #1336
Improvements:
- [Improvement]: modify the parameter transfer method for flink optimize #1789
- [Subtask]: Remove IcebergContentFile and related codes #1839
- [Improvement]: Extract Spark Common Module to reuse code between different spark versions. #1449
- [Improvement]: Show correct resource occupation for external optimizer #1845
- [Improvement] Polish optimizing executor code #1895
- [Improvement]: Remove AmsClient in UnkeyedTable and KeyedTable #1904
- [Improvement]: add mysql schema init #1899
- [Improvement]: Perform file filtering as early as possible during the optimizing plan process #1883
- [Improvement] Add bounds of timestamp type for mixed-hive table #1920
- [Improvement]: Flink Logstore supports 'group-offsets' and 'specific-offsets' Startup mode. #1931
- [Improvement]: Using the parameters in TableConfiguration instead of from Table properties #1943
- [Improvement]: Improve the triggering conditions for segment file rewriting #1953
- [Improvement]: Helm Chart Support Overwrite JVM Optional #1962
- [Subtask]: Do not cache DataFile in memory when evaluating tables. #1840
- [Improvement]: Introduce a table property to set the task splitting size of self-optimizing #1935
- doc: update dashboard readme #1984
- [Improvement]: Flink optimizer configure managed memory explicitly in both the doc and the startup script #1907
- [Improvement]: CI run check style first #1293
- [Improvement]: Add basic configuration of spotless #1926
- [Improvement]: The version of the Amoro in the Docker startup script should be the latest by default #2010
- support running ams in jdk17 environment. #1896
BugFixes:
- [Bug]: the historical metadata file has not been deleted #1864
- [Bug]: Segment file first binpack strategy leads to inability to optimize SplitTasks #1866
- [Bug]: The water mark is not displayed properly for tables without primary keys #1603
- [Bug]: Fix unit conversion for parameters -msz to spillmap configuration #1893
- [Bug]: Snapshot file not found after Flink ingestion job has been running for a while #1882
- [Bug]: BasicUnkeyedTable does not Override the name method #1914
- [Bug]: Can not set configuration when setting UGI #1912
- [Bug]: If the optimizing of a Mixed Hive table fails to alter the Hive location, the files will be moved to the old Hive location. #1898
- [Bug]: MAJOR Optimizing is running repeatedly #1924
- [Bug]: The optimizing task has been in a suspended state all along. #1939
- [Bug]: Failed to persist optimizing process records #1869
- [Bug]: Mix-iceberg format with hive metastore not working on spark #1963
- [Bug]: Mix-iceberg format with hive metastore not working on spark session catalog #1964
- [Bug]: When using an HDFS cluster protected by Kerberos, the fs.hdfs.impl.disable.cache=false configuration is invalid, FileSystem is unable to hit the cache #1857
- [Bug]: When using s3a schema, orphan files could not be deleted. #1981
- [Bug]: Using mix iceberg in spark has some docs problems #1908
- [Bug]: Avoid NPE during disposing ArcticServiceContainer #1985
- [Bug]: Orphan files clean mistakenly deleted the Iceberg statistic file #1959
- [Bug]: If the iceberg table has a primary key of date type, in upsert mode, after Amoro merge, the primary key duplicate data can be found #1979
- [Bug]: If the projected schema of the Lookup Table Source doesn't include all primary keys and requires automatic addition #1891
- [Bug]: ams-optimizer FLINK job expired oom due to frequent requests for Kerberos. #1980
- [Bug]: Terminal is unable to create the iceberg table, when Kerberos authentication is enabled #1996
- [Bug] Trino failed to query Amoro Mixed Format Table after restarting AMS #1977
- [Bug]: The startup script has a judgment error #2006
- [Bug]: Spark SQL using amoro spark runtime error class cannot find ResolveProcedures #2032
- [Bug]: Dashboard can't show files page with kerberos and mixed-iceberg format in external catalog #2029
- [Bug]: When a time value that is too large or too small is written, it will encounter a 'long overflow' exception. #2044
- [Bug]: Ams Server always executes initSqlScript when using mysql8 #2049
Amoro-0.5.0
Changelog for v0.5.0
Features:
- [Feature]: Support Spark 3.3 for Mixed format #261
- [Feature]: Support Spark 3.2 for Mixed format #1383
- [Feature]: Support MOR with Flink Engine in runtime batch mode for Mixed format #5
- [Feature]: Support Iceberg on S3 or other non-hadoop storage system. #1476
- [Feature]: Add self-optimizing detail in optimized tab of table #1535
- [Feature]: Support to manage optimizer groups in dashboard #1537
- [Feature]: Remove the Independent DeleteFiles for the Iceberg Format Table #1628
- [Feature]: Support lookup join on Mixed format tables with Flink #1322
- [Subtask]: Make AMS as an Iceberg Catalog provider via implement RestCatalogAPI #1339
Improvements:
- [Improvement]: AMS module refactoring in 0.5 #1372
- [Improvement]: Add new task allocation rules to make the number of concurrency and read cost more balanced #1311
- [Improvement]: Improve query speed for Mixed tables in Spark #1349
- [Spark][Improvement]: Add scala code style check for spark module #1059
- [Improvement][Spark]: Speed up commit process for Mixed Hive format tables #1350
- [Subtask]: Introduce the concept of the InternalCatalog and ExternalCatalog #1544
- [Improvement]: Optimize the description of Spark CreateTableLike #1340
- [Improvement]: Improve the project documentation #1510
BugFixes:
- [Bug]: The position-delete written by iceberg directly through the base table cannot take effect when read by Arctic #1347
- [Bug]: When AMS is deployed HA, use Kerberos to access non-Kerberos zookeeper will throw an exception #1346
- [Bug]: High CPU occupancy of AMS #1332
- [Bug]: jacoco-maven-plugin report EOFException in Flink Module CI #1330
- [Bug]:[Spark] select failed after update on keyed table for kerberos auth failed. #488
- [Bug]: The columns '_transaction_id' and '_change_action' will be null when querying change table with spark #1385
- [Bug]: Hive function can't be used when using ArcticSparkSessionCatalog under spark 3.3 #1397
- [Bug]: Find duplicate records when enable rocksdb map #1751
- [Bug]: when partition value contains spaces, optimize can't process correctly #1653
- [Bug]: Build failed when change hadoop version to 3.2.1 #1646
Arctic-0.4.1
Changelog for v0.4.1
Features:
- [Feature]: Self-Optimizing scan files from metadata instead of from file info cache #1093
- [Subtask][Flink]: support pulsar read and write without consistency in Flink 1.12 #1007
- [Feature]: A new design of resolving data conflicts without relying on AMS to generate TransactionId #994
- [Feature]: Introduce a mechanism for concurrency control between Writing and Optimizing #985
- [Feature][Spark]: Support drop partition #918
- [Feature][SPARK]: Support truncate table #540
- [Feature][Spark]: Support Merge Into for Spark3.x #395
Improvements:
- [Improvement]: Self-Optimizing for Mixed Format tables should limit the file cnt for each Optimizing #1213
- [Improvement][AMS]: Automatic retry optimizing if commit failed #1103
- [Improvement]: Replace the keyword base to express the meaning of the lowest part of sth #1082
- [Improvement][AMS]: Support set login user and login password in config yaml file #1081
- [Improvement][AMS]:Optimize settings page and terminal page #1005
- [Improvement][Flink]: Refactor Log-store Source to FLINK FLIP-27 API #969
- [Improvement][Flink]: Support all logstore configuration items to be configured in table properties #933
- [Improvement]: More elegant display of error messages in terminal #913
BugFixes:
- [ARCTIC-1025][FLINK] Fix data duplication even if there is a primary key table with upsert enabled #1180
- [Bug][Spark]: The orphan files cleanup of insert overwrite doesn't take effect for un-partitioned table. #1174
- [Bug]: When reading partial fields from Logstore, the number of fields does not match #1171
- [Bug]: Address the deserializing exception of the array type in the logstore #1111
- [Bug][Core]: The PartitionPropertiesUpdate can't remove partition properties key #1107
- [BUG]: The expire snapshot and the orphan file clean scan related files from metadata #1105
- [BUG][AMS]: Automatic retry optimizing if commit failed #1103
- [Bug]: Browser tab does not display Arctic's icon #1091
- [Bug]: When the Metastore type is Hadoop, Terminal executes the spark sql without loading the configuration file #1090
- [Bug][Spark]: The read and write authentication user are different in Spark when using mixed Iceberg format #1069
- [Bug][Spark]: Insert overwrite select from view will throw exception #1066
- [Bug]: When the Flink job reads the arctic table for a while, the Flink job fails #1063
- [Bug]: Spark reads the timestamp field eight hours longer than the actual Flink engine writes timestamp #1062
- [Bug]: KeyedTableScanTask confuse files from BaseStore and ChangeStore #1045
- [Bug][Spark]: Create Table As Select should write to the base store. #1026
- [Bug]: When using spark query type timestamp column failed #978
- [Bug]: Flink sets watermark on Arctic table fields, but ArrayIndexOutOfBoundsException occurs when reading data #957
- [Bug][Spark]: The data are written repeatedly after the Spark Executor failover #917
- [Bug]: Spark refresh table error #620
- [Bug]: Spark batch write failed w/ Already closed files for partition #613
- [Bug][Flink]: reverse message order when retract message from message queue. #482
Arctic-0.4.0
Changelog for Arctic-0.4.0
New Features:
- [Feature]: Support managing iceberg tables already existed #260
- [Feature][Spark]: Standard auth utils method for Kyuubi #582
- [Feature][AMS]: Show table info for native iceberg in AMS dashboard #531
- [Feature][Spark]: Support duplicate-key-check when insert into and insert overwrite #484
- [Feature][AMS]: Support add/modify by AMS Dashboard #412
- [Feature][AMS]: Add setting page in AMS dashboard #407
- [Feature]: Support table watermark to determine table freshness #394
- [Feature]: AMS terminal uses Kyuubi as its backend SQL engine #262
Improvement:
- [Improvement]: Hidden username and password in setting page #630
- [Improvement]: Fix file upload when using derby as storage #622
- [Improvement]: Expose arctic benchmark code and docker environment file #265
- [Improvement]: Do some initialization when adding catalog #617
- [Improvement]: Identify Hadoop and Hive version Arctic had supported #356
- [Improvement]: Upgrade iceberg version to 0.13 or 0.14 #397
- [Improvement]: Update the documentation in the Flink DML section #598
- [Improvement][AMS]: Improve the high concurrency processing capability of the allocateTransactionId interface #393
- [ARCTIC-557][FLINK] Doc catalog authorization configs #570
- [Optimize] Risk of AMS memory usage and task execute timeout when processing a large number of small files #128
Bugfix:
- [Bug]: Some bugs in the local test for version 0.4.0 bug #626
- [Bug]: Including a line-through in the arctic HA property causes the catalog to fail to load #515
- [Bug][Flink]: Read duplicated data from a keyed table, even though the upsert mode is enabled. #592
- [Bug]: Flink 1.15 quickstart failed for java.lang.ClassNotFoundException #563
- [ARCTIC-561] When the flink job fails, the flink job will commit a wrong transaction id #568
- [Bug]: For the Arctic keyed table, the out-of-order TransactionId leads to inconsistent data #479
- [Bug][Flink]: Watermark may be lost if split read too fast. #486
Arctic-0.3.2-rc1
Changelog for Arctic-0.3.2-rc1
New Features:
Improvement:
- Unify the scan.startup.mode config values for both Logstore and Filestore. #441
- Add initialization and upgrade SQL scripts for every version. #362
- Check data lost after optimize. #336
- Check if Hive table can be upgraded before upgrading. #312
- Support MySQL 8.0 as system database for AMS. #48
Bugfix:
- Changestore's schema may be incorrect after adding new columns for Hive table. #497
- Reading from Filestore may fail for empty table. #475
- Optimize may be blocked after a commit fails. #464
- Table list may be incorrect with multiple Hive catalogs. #402
- Cannot create Hive tables with uppercase field names. #380
- Cannot upgrade Hive tables with multiple partition columns. #352
- Inserting overwrite into Hive table may fail in Spark. #334
Arctic-0.3.1-rc1
Changelog for Arctic-0.3.1-rc1
New Features:
- Support hive table. #38
- Arctic table support real-time dimension table join with Flink. #94
- Support Flink version 1.15. #166
- AMS supports high-availability mode. #117
Improvement:
- Support create table like with Spark. #123
- AMS supports listing table DDL operations. #116
- AMS supports filtering the table and database lists. #115
- AMS lists only transactions committed by users. #86
Bugfix: