-
Notifications
You must be signed in to change notification settings - Fork 37
v12 merge commit message
Merge with PostgreSQL version 12 (up to a point between beta2 and beta3).
This is the point where PostgreSQL REL_12_STABLE was branched off the master branch and v13 development started.
In addition to the changes below, there's a ton of little things marked with GPDB_12_MERGE_FIXME comments.
-
Big changes to partition planning and pruning, as we merged upstream partititioning code. Most GPDB partitioning code has been replaced with corresponding upstream. See section "Partitioning" for details.
-
PostgreSQL v12 planner learned to inline side-effect free CTEs. That is, if you "WITH foo (SELECT ...) SELECT * FROM foo, ...", the 'foo' subquery is inlined into the main query and planned as one. Previously, the CTE was always planned separately, and materialized in full. GPDB has always supported inlining CTEs, so this isn't new to GPDB, but we adopted the code changes related to that as much as possible. In PostgreSQL, the feature can be controlled by specifying "WITH foo AS MATERIALIZED (...)" or NOT MATERIALIZED, that now works in GPDB too.
In GPDB, the inlining could previously be controlled with the "gp_cte_sharing" GUC. It still works, but if MATERIALIZED or NOT MATERIALIZED was specified in the query it takes precedence. Like in previous releases, Even if gp_cte_sharing is set to 'on', the subquery is inlined if there is only one reference to it in the query.
When gp_cte_sharing = 'off' (the default), GPDB inlines the subqueries even if it's not strictly legal to do so. See https://github.com/greenplum-db/gpdb/issues/1349. This merge doesn't change that.
-
Partitionwise joins were and aggregates were introduced in PostgreSQL v11. They work in GPDB too. The main benefit from partition aggregates is that if a partition hierarchy contains a foreign table, the aggregation can be pushed down to the foreign server.
-
GROUPING SETS can now be implemented by hashing, or a mix of hashing and sorting. It shows up as a MixedAggregate node in EXPLAIN. However, the GPDB-specific multi-stage aggregate planning code (in cdbgroupingpaths.c) has not been updated to take advantage of that, so you only get such plans with one-stage aggregates, when the grouping keys happen to coincide with the table's distribution keys. This is a TODO.
-
The default logging directory, 'pg_log', was renamed to just 'log' in PostgreSQL v10. That might affect tools that would try to look into the logs. (The 'pg_xlog' and 'pg_clog' directories were also renamed to 'pg_wal' and 'pg_xact', respectively, but external tools shouldn't be peeking into those directories directly)
-
Is log_line_prefix resurrected? (a3dd474470f35f87a69c11d8e292923389813cf8)
-
JIT compilation, introduced in PostgreSQL v11, works. We haven't done anything extra for JIT compilation of GPDB-specific code, but everything that evaluates expressions benefits from the upstream JIT support, also in GPDB-specific node types.
-
Hash indexes are enabled now. We had disabled them completely in GPDB, because they were not WAL-logged. Since PostgreSQL v10 they are, so we can enable them.
-
"MK sort", in tuplesort_mk.c, is gone. The "gp_enable_mk_sort" GUC is gone too. We always use the upstream sort code now. There have been performance improvements to it in the upstream releases, but this is still a performance regression in cases where the MK-sort was particularly good. That is, if you have multiple ORDER BY columns, with lots of duplicates. We might still reintroduce MK sort later in some form, as a separate PR. But that work should be made on pgsql-hackers for PostgreSQL first.
-
The GPDB-specific "Hybrid Hash Agg" code is gone. Its purpose was to support spilling large hash aggregates to disk, instead of running out of memory. That functionality has been replaced by backporting the functionally similar hash agg spilling feature from PostgreSQL v13. That consisted of a number of upstream commits:
a10701c0e3 Allow simplehash to use already-calculated hash values. 81080df32b Add transparent block-level memory accounting 87b6b57191 Fix edge case leading to agg transitions skipping ExecAggTransReparent() calls. f76e8f8507 Refactor hash_agg_entry_size(). 4ac4bd07fa Introduce TupleHashTableHash() and LookupTupleHashEntryHash(). 5dddc85d78 Change signature of TupleHashTableHash(). d238458f6d Minor refactor of nodeAgg.c. a9e9e9bc8e Fixup for nodeAgg.c refactor. b27eec5ea6 Extend ExecBuildAggTrans() to support a NULL pointer check. 0a390a8e8b Introduce LogicalTapeSetExtend(). 5d9fd3ae3d Disk-based Hash Aggregation. 5471822147 Fixes for Disk-based Hash Aggregation. 8b6151eae7 Avoid allocating unnecessary zero-sized array. 04fced3688 Fix costing for disk-based hash aggregation. e4ee460db3 Include chunk overhead in hash table entry size estimate. 255871265e Create memory context for HashAgg with a reasonable maxBlockSize. 02f3d5670a Fix HashAgg regression from choosing too many initial buckets. f12207ca4e Logical Tape Set: use min heap for freelist.
As a result, the nodeAgg.c code is more similar to the v13 version than v12.
TODO: backpatch followup commits from REL_13_STABLE
-
TODO: createlang and droplang binaries were removed in PostgreSQL v10. Update GPDB docs.
-
Autoconf changes from upstream for building C++ code?? FIXME
-
Parallelism is still disabled in GPDB. That includes the new parallel CREATE INDEX code.
-
Fix behavior with UPDATE WHERE CURRENT OF. As you can see in the changes to expected output of the 'portals_updatable' test, some of the test results changed. As far as I can see, the old behaviour memorized in the expected output was incorrect. PostgreSQL has always returned the result we're now seeing on the merge branch, which is different from the result you get on master. The test had not been modified since 2012, when the test was added along with support for WHERE CURRENT OF. I could not find any explanation in git history or in test comments. I believe it was simply a bug when WHERE CURRENT OF support was added, and no one noticed that the test result was wrong.
-
SRFs in target lists. We adoped the new upstream node type, ProjectSet, to deal with SRFs in target lists. As a performance optimization, there were some GPDB changes in the planner earlier, to always use a Result node if there were SRFs in the target list, and the SRF-in-targetlist support was removed from other executor nodes. That was pretty much the same changes that were made in upstream, except that upstream uses ProjectSet instead of Result, and the behavior was changed to be more sane when you had multiple SRFs. Replace all the GPDB changes with upstream. Due to the behavior changes with multiple SRFs, some INSERT statements in GPDB tests had to modified so that they generate the same test data as before.
-
The GPDB implementation of "RANGE offset PRECEDING/FOLLOWING" in window functions was replaced with the upstream implementation that was introduced in version 11. The upstream implementation is more strict about the datatypes than the code in GPDB master. However, looking at old JIRA ticket MPP-5246 that was mentioned in one of the tests, it had a test case like this attached to it:
explain select cn, count(*) over (order by dt range between '2 day'::interval preceding and 2 preceding) from mpp5246_sale; ERROR: can't coerce leading frame bound to type of trailing frame bound HINT: specify the leading and trailing frame bounds as the same type
So apparently we were strict about the types back in 2009, and you got that error up to 5X_STABLE, but it got relaxed in 6X_STABLE. And now it becomes more strict with the upstream code again. I think that's fine, although we probably shouldn't have relaxed it in 6X_STABLE. Too late to change that.
-
Remove the undocumented 'gp_indexcheck_vacuum' GUC. A new contrib module, contrib/amcheck, was added in PostgreSQL v10, with similar functionality.
-
Logical replication is still disabled in GPDB.
-
The default of 'hot_standby' was changed to 'on' in upstream, but hot standby is still not supported GPDB, and the default in GPDB remains 'off'. It is now possible to turn it 'on' manually in mirrors, though. That allows tools like pg_basebackup to work, and it allows running the upstream replication tests in src/test/recovery. However, there's no user-visible, supported, new functionality here.
-
pg_basebackup got a feature to verify checksums, but that has been disabled in the wrapper python scripts. So checksums are verified by the management utilities that use pg_basebackup under the hood (gpexpand, gpinitstandby, gpconfigurenewsegment. Likewise, tests that use pg_basebackup were modified with the --no-verify-checkums flag. The reason for this is that the checksum verification is broken for AO/AOCO tables, which don't have page checksum like other relations do, and pg_basebackup cannot distinguish them. We ought to fix this somehow, but for now it's a FIXME.
-
pg_ctl -w now waits for distributed transaction recovery.
When starting master in distributed mode, in GPDB pg_ctl -w should wait for distributed transaction recovery to complete. As only after that connections are allowed.
Adding new state "dtmready" to be recorded in PID file to mark distributed transaction recovery completion. When dtx background worker exits successfully, postmaster writes this state to file.
pg_ctl waits for this "dtmready" state instead of "ready" state is starting master in distribted mode which is conveyed using "-E" postmaster options.
This patch should meet the previous expectations, that after successfull completion of gpstart, distributed transaction can be made.
-
If a client tries to connect while DTM recovery in-progress, you now get an error. Client connections used to wait for dtx recovery process to complete distributed transaction recovery. Depending on situation this waiting could be infinite. To avoid hung connections though in certain situation where dtx recovery can't complete the distributed transactions, better to error out and let client retry in this situation similar to local crash recovery situation.
(Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/dgU9il8l24A/NTAz2cl5BQAJ)
-
PostgreSQL v11 replaced the way the initial catalog contents are stored in the source code. Instead of the "DATA(insert ..." lines in the .h files, the initial contents are stored in JSON format, in separate .dat files. The JSON format is much more convenient to edit, and makes it more convenient to carry diffs against them, so we don't need the GPDB-specific perl scripts to provide defaults for GPDB-added catalog columns anymore, nor the "catullus.pl" script that was used to generate pg_proc.h contents for GPDB-added functions. That's all gone, and all GPDB-specific changes are made to the .dat files directly.
-
PostgreSQL v12 removed support for OIDs in user tables, and changed all catalog tables with OIDs so that the OID is a regular column. A lot of GPDB specific code and tests to deal with OIDs was removed with that.
-
A new Kerberos test suite was added to PostgreSQL v11. We had one in GPDB already, under src/test/gpkerberos. It's not clear how much overlap there is between the two, we could possibly get rid of the GPDB test suite now, but for now, keep them both.
-
Support for passing around MemTuples in TupleTableSlots is gone. MemTuple is still the native format of AO (row) tables, but will be converted to virtual tuple slots. MemTuple is no longer used as the on-wire format in Motions, nor as the on-disk format for temp files when sorting; we now use upstream MinimalTuple for those.
-
HeapTuple->t_tableOid has been resurrected. We had removed it in previous GPDB versions for performance reasons, I don't believe that the overhead is really noticeable, at least not in recent PostgreSQL versions.
-
The pg_rewind regression tests were previously implemented in GPDB as a shell script. It awas done that way when we backported the upstream tests but didn't have the TAP test framework yet. Now we do, so replace the shell implementation with the upstream TAP tests.
-
The way ALTER TABLE commands are dispatched was refactored. Some ALTER TABLE subcommands need to dispatch some extra information from QD to QEs. Previously, the ATExec* functions modified the sub-commands in the QD, and the modified subcommands were dispatched to the QEs. The problem with this approach is that the QEs try to run the ATPrep* phase on the already- prepped commands, so we had to hack the ATPrep* functions to cope with that. That was causing merge conflicts, which could be fixed, but I was not happy with the diff footprint of that approach. Instead, dispatch the AlteredTableInfo structs that were constructed in the QD's prep phase. That way, the execute phase can modifiy the sub-commands and those modifications are dispatched.
-
Upstream isolation tests (src/test/isolation) are now run in utility mode.
Instead of disabling or heavily modifying isolation upstream test better to run them in utility mode (kind of single node postgres) and get the code coverage. Still doesn't test GPDB clustered behavior. Need to enhance the framework for it to understand locking across segments and also work with global deadlock detector.
Getting back the expected file for drop-index-concurrently-1 from upstream. Enabling some of recently disabled test. Didn't look into which other previously disabled test can be enabled. Will do as follow action in some other commit.
This at least gets isolation test green for merge branch now.
All the GPDB-specific code to deal with partitioning has been ripped out and replaced with the new PostgreSQL partitioning code. Compatibility for the old GPDB partitioning syntax was reimplemented on top of the upstream code, in the parse analysis phase.
-
Multi-column list partitioning is no longer supported. As a work-around, you can create a composite type and use that as the partitioning key, like this:
create type foo_partkey as (b int, c int); create table foo (a int, b int, c int) partition by list ((row(b, c)::foo_partkey));
Partition pruning might not work in all the cases that it used to with that work-around though.
-
The gp_enable_exchange_default_partition GUC was removed. With upstream partition code exchanging (essentially attaching new) default partition comes with no risk of wrong data. Attaching a default partition code validates that the data does not violate the partition constraints, so blocking the exchange of default partition under GUC is no longer required. It's same as exchanging any other partition.
-
pg_dump will use the upstream syntax now.
-
PARTITION BY is now accepted on both sides of DISTRIBUTED BY
-
START/END/EVERY is converted in the parser to upstream primitives
-
partition names vs table names. Might differ
-
pg_partition_def() is gone. old catalogs are done. What to use in queries to replace them? pg_partitions no more exists, use pg_partition_tree() instead.
-
Partitioned tables have no access method. They used to be heap or AO or AOCO too, even though they contain no data.
-
Triggers? changed behavior I think. See 'triggers_gp' test.
-
how do templates work now?
-
Partition conditions are no longer represented as CHECK constraints.
-
Moving row to another partition with UPDATE is now supported.
-
error messages are different in many cases
-
Inserting into an intermediate partition is now allowed. The row will be routed to the correct partition, like in PostgreSQL. It used to be forbidden in GPDB.
-
Removed a lot of NOTICEs when partitions are created or dropped.
commit 190a695d929f538299a1459a3f12d70af9e8fbb7 Author: Ashwin Agrawal [email protected] Date: Mon Jul 13 22:17:18 2020 -0700
GRANT and REVOKE recurse to child partitions
GPDB recurses to child partitions for GRANT and REVOKE. Upstream
doesn't but keeping behavior same as previous releases of GPDB here
seems better.
Ideally, ONLY syntax need to be added to GRANT and REVOKE to provide
best of both worlds to users.
Co-authored-by: Alexandra Wang <[email protected]>
commit b7e4d4a710f456b377c4010dee08adf9c0298f09 Author: Ashwin Agrawal [email protected] Date: Mon Jul 13 22:15:27 2020 -0700
Child partition inherits parent permissions
GPDB add partition inherited parent's permission. So, keeping the same
semantic now child tables always inherit parents permission in
partition hierarchy.
Co-authored-by: Alexandra Wang <[email protected]>
commit 9f89426cd939af14cfa038cbe6d711c0b29fbba9 Author: Ashwin Agrawal [email protected] Date: Mon Jul 13 23:55:14 2020 -0700
Partition test EVERY clause related accept behavior
Rounding of EVERY clause argument similar to START/FROM and END/TO
seems fine behavior. So, modify the test and answer file to reflect
this change is behavior and okay to be lenient is this respect.
Co-authored-by: Alexandra Wang <[email protected]>
commit 09c05c05c7b8a8d6a3b9517b704bca974391ce4d Author: Ashwin Agrawal [email protected] Date: Fri May 15 11:11:18 2020 -0700
Unlogged tables are allowed with regular tables in partition hierarchy
Upstream allows to mix unlogged and logged tables in partition
table. Temperory and permanent tables are not allowed to be mixed. So,
lets follow the same restrictions in GPDB.
commit 43883a1bd3026867351fd0651c4c3f8bca45b42d Author: Heikki Linnakangas [email protected] Date: Tue Jun 9 00:52:38 2020 +0300
Fix a few issues with COPY TO/FROM partitioned tables.
1. Allow COPY TO on a partitioned table. It throws an error in upstream,
but GPDB has allowed it. Re-implement it by automatically transforming
the "COPY <table> TO ..." command to "COPY (SELECT * FROM <table>)
TO ...". The hint in the upstream error message suggests doing that,
but we now do it automatically. There is already code to do that
transformation, for COPY TO when Row-Level Security is enabled, so
this is a one line change.
commit b23036066df7d4851855402c0dbed245c1e63f3b Author: Alexandra Wang [email protected] Date: Fri Jun 5 21:03:32 2020 -0700
Support inclusive END range partition boundary
END..INCLUSIVE can be specified for CREATE TABLE, ADD PARTITION and
SPLIT PARTITION commands for range partitions.
For now, this patch only supports partitions keys that are compatible
with the "+1" operator, where "1" is of type Integer. For example, int
and date data types work, timestamp does not work.
During parsing, if the END value is INCLUSIVE, we canonicalize it into
an EXCLUSIVE value. We need to do this because the upperdatums stored in
the catalog are always EXCLUSIVE. This implementation leverages the same
code that supports "EVERY", the only difference is that EVERY operates
"+interval" with the user specified interval, whereas the
canonicalization operates "+1". The "1" is currently hardcoded as an
A_Const of type Integer which is not ideal, A FIXME is added for the
future.
Note that we for now are not planning to support exclusive START, but if
we want to support that in the future, the same canonicalization
function can be used.
- START EXCLUSIVE was re-implementd later
commit be0edb898a2e7653fb42781f21851878f0f97576 Author: Heikki Linnakangas [email protected] Date: Thu Jun 25 19:56:33 2020 +0300
A "point hole" between partitions is now allowed, fix test.
I don't understand why it was not allowed before. But I see no reason
to not allow it now.
:
- FOR (RANK (1)) addressing no longer supported
commit c4ab5cb1d313ecdcd53b74c4e8f650fc6bb56b29 Author: Heikki Linnakangas [email protected] Date: Tue Jun 9 21:15:50 2020 +0300
Adjust output: REINDEX is a no-op on partitioned indexes now.
Partitioned tables have no data. No point in rebuilding partitioned
indexes ever, so we don't.
commit 6d63a669d5248806fe0240c74902e37cf60f8303 Author: Heikki Linnakangas [email protected] Date: Tue Jun 9 21:07:41 2020 +0300
Re-implement REINDEX TABLE on a partitioned table.
It recurses to the partitions now, like it has done in Greenplum
historically.
commit d9bc0394b5b6be1d6f18aed77e1bcd9ce1e7be71 Author: Heikki Linnakangas [email protected] Date: Tue Jun 9 19:16:50 2020 +0300
Remove uninteresting tests that used REINDEX on partitioned table.
REINDEX on a partitioned table is no longer supported. Even on master, it
didn't do anything interesting, as it didn't recurse to reindex the
corresponding indexes on partitions, and the root table contains no data.
Hence, I don't think these tests ever made much sense. Also, the tests
create a gist index, but the test queries won't use the index.
commit f31ceca3f5eb17740ae4314621d725a815fe9371 Author: Ashwin Agrawal [email protected] Date: Tue Jun 9 11:38:22 2020 -0700
Store sub-partition template in catalog
New catalog table gp_partition_template now stores the sub-partition
template, if specified during CREATE TABLE. This table directly stores
the top parent's relid, partition level and serialized version of
`GpPartitionDefinition` node corresponding to template for each level
of partition specified. That's easiest and most convenient, given it
includes all the information needed for template like encoding
clauses, with options, tablespace,.... CREATE TABLE is only writer of
this template table.
ADD PARTITION is only reader for this template table and uses it to
create new partitions.
Entry from gp_partition_template is removed when partition table is
deleted.
commit 5acf27055ba0833565e74508657a45b984412c50 Author: Ashwin Agrawal [email protected] Date: Mon Jun 22 17:52:00 2020 -0700
Modify alter_table upstream test
In GPDB ALTER TABLE RENAME cascades down to child partition, hence
comment out the rename of child partition from this upstream test. As
it fails to find the child table with original name.
commit 03bbd11043f4774afb30f60ab5f10eb6b7c34aa9 Author: Heikki Linnakangas [email protected] Date: Wed Jun 3 17:30:43 2020 +0300
Privileges on partitions work slightly differently now, fix test.
NOTE: This would be good to mention in the docs. GRANT/REVOKE on a
partitioned table doesn't recurse anymore, so GRANTing privilege on the
"root partition" doesn't allow you to query individual partitions by
name anymore.
or did we hack that later?
GPDB-specific Partition Selector node are still created, for partition pruning on joins, but the planner code to create them was largely rewritten, to leverage the upstream partition pruning code as much as possible. The GPORCA code to create Partition Selectors was also rewritten.
- In executor, Partition Selectors now use the upstream partition pruning
infrastructure to compute which partitions can contain tuples that pass
the join qual. The partitions are recorded in a Bitmapset, like the
upstream partition pruning code does, and the Bitmapset is passed from
the PartitionSelector to the Append node using a special executor Param.
The old hash table that contained the partition OIDs is gone, as is the
PartSelected expression. The Append node now directly skips the pruned
partitions
- In planner, use the upstream code to construct the PartitionPruneInfo
steps to do the pruning. Thanks to this, join pruning now also works
for quals with <, > operators, and for tables partitioned on expressions
rather than plain columns.
- In planner, refactor where the decision to do join pruning is made. It's
now decided in create_append_plan(), where run-time pruning is decided,
too. To pass the partition pruning info to the Partition Selector above
the Append, maintain a stack of joins that are candidates for Partition
Selectors as we recurse in create_plan().
-
Developers should refrain from using RelationIsAppendOptimized() and friends and use the newly introduced table AM API as much as possible.
-
pg_class.relstorage is gone
-
CLUSTER command works on AO tables, thanks to Andrey Borodin (@x4m) for submitting a patch against master and also helping with merging it in iteration branch. One known issue is pertaining to MVCC: #10262, it has been noted in the documentation.
-
Table inheritance is now allowed for AOCO tables. It was previously disallowed for no discernible reason, and it seems to just work now.
-
New names for AO access methods are: AO_ROW for row-oriented table and AO_COLUMN for column-oriented table. The new names need to be used in the new CREATE TABLE syntax, in the USING clause. The use of WITH clause to specify AO table orientation will be deprecated.
-
Notable GUC changes: default_table_access_method GUC can be set to AO_ROW or AO_COLUMN to create corresponding tables when CREATE TABLE command is run without specifying the USING clause. The "orientation" keyword is no longer accepted as a value of gp_default_storage_options GUC.
-
[Asim] I don't think the following commit should be alluded to in the merge commit message. The reason is compresstype and compresslevel options can be specified in the WITH clause. commit 866e6a9cd35a8e5bfb8340015f4fa1d5d07074b6 Author: Heikki Linnakangas [email protected] Date: Tue Jun 23 20:40:22 2020 +0300
Don't try to use compresslevel/compresstype options on AO row table.
You get an error now. We used to accept it, but no more. I'm not sure if we should accept it or not, but that's not the point of this ANALYZE test.
-
[Asim] The following commit can also be skipped in final commit message. The size calculation logic remains the same, only the implementation differs due to table AM API. commit 220fd1aac2f1dd70ab7254f9eeb85bf54cc248fb Author: Georgios Kokolatos [email protected] Date: Wed Jun 17 14:35:11 2020 +0000
Use tableam api for relation size on AO/AOCO and general cleanup
This commit implements aoco_relation_size() and tries to use it where possible. The tableam api interface does not give any option to switch over compressed or uncompressed size. A study of the call sites revieled that the compressed site is always desired, so that is what is implemented.
The logic is moved over from GetAOCSTotalBytes and is the equivelant for AO relations, which copied GetAOTotalBytes. These two functions then became obsolete and removed in favour of the tablam methods.
Along with it, RelationGuessNumberOfBlocks() has been removed in favour of RelationGetNumberOfBlocks() from upstream. The upstream version recognised that custom access methods might not use BLCKSZ wide fixed size block and implements an approximation that was almost exact to Greenplum's.
Further cleanups on that front to follow.
-
ANALYZE?
-
Table AM interface does not provide a way to maintain state needed by insert/update/delete operations on AO tables. Greenplum specific logic was implemented to accommodate the state, leading to some diff with upstream. The plan is to propose enhancements to the table AM API in upstream so as to minimize this diff in future.
What github issues were perhaps fixed? I think there were some related to grouping sets maybe?
random_page_cost was changed to 4 in master too. There are some tests that set it explicitly. Remove the unnecessary settings.
For any questions, please ask at [email protected].