PostgreSQL Weekly News - May 23, 2021

Posted on 2021-05-24 by PWN

PostgreSQL 14 Beta 1 released. Test!

The official IRC channels have moved from Freenode to Libera. Details here.

PostgreSQL Product News

DBD::Pg 3.15.0, a Perl driver for PostgreSQL, released.

pg_back 2.0.1, a tool that can dump PostgreSQL databases to files, released.

PostgreSQL Jobs for May

PostgreSQL in the News

Planet PostgreSQL:

PostgreSQL Weekly News is brought to you this week by David Fetter

Submit news and announcements by Sunday at 3:00pm PST8PDT to

Applied Patches

Bruce Momjian pushed:

Etsuro Fujita pushed:

Magnus Hagander pushed:

Peter Eisentraut pushed:

Tom Lane pushed:

  • Stamp 14beta1.

  • Avoid creating testtablespace directories where not wanted. Recently we refactored things so that pg_regress makes the "testtablespace" subdirectory used by the core regression tests, instead of doing that in the makefiles. That had the undesirable side effect of making such a subdirectory in every directory that has "input" or "output" test files. Since these subdirectories remain empty, git doesn't complain about them, but nonetheless they're clutter. To fix, invent an explicit --make-testtablespace-dir switch, so that pg_regress only makes the subdirectory when explicitly told to. Discussion:

  • Clean up cpluspluscheck violation. "typename" is a C++ keyword, so pg_upgrade.h fails to compile in C++. Fortunately, there seems no likely reason for somebody to need to do that. Nonetheless, it's project policy that all .h files should pass cpluspluscheck, so rename the argument to fix that. Oversight in 57c081de0; back-patch as that was. (The policy requiring pg_upgrade.h to pass cpluspluscheck only goes back to v12, but it seems best to keep this code looking the same in all branches.)

  • Avoid detoasting failure after COMMIT inside a plpgsql FOR loop. exec_for_query() normally tries to prefetch a few rows at a time from the query being iterated over, so as to reduce executor entry/exit overhead. Unfortunately this is unsafe if we have COMMIT or ROLLBACK within the loop, because there might be TOAST references in the data that we prefetched but haven't yet examined. Immediately after the COMMIT/ROLLBACK, we have no snapshots in the session, meaning that VACUUM is at liberty to remove recently-deleted TOAST rows. This was originally reported as a case triggering the "no known snapshots" error in init_toast_snapshot(), but even if you miss hitting that, you can get "missing toast chunk", as illustrated by the added isolation test case. To fix, just disable prefetching in non-atomic contexts. Maybe there will be performance complaints prompting us to work harder later, but it's not clear at the moment that this really costs much, and I doubt we'd want to back-patch any complicated fix. In passing, adjust that error message in init_toast_snapshot() to be a little clearer about the likely cause of the problem. Patch by me, based on earlier investigation by Konstantin Knizhnik. Per bug #15990 from Andreas Wicht. Back-patch to v11 where intra-procedure COMMIT was added. Discussion:

  • Restore the portal-level snapshot after procedure COMMIT/ROLLBACK. COMMIT/ROLLBACK necessarily destroys all snapshots within the session. The original implementation of intra-procedure transactions just cavalierly did that, ignoring the fact that this left us executing in a rather different environment than normal. In particular, it turns out that handling of toasted datums depends rather critically on there being an outer ActiveSnapshot: otherwise, when SPI or the core executor pop whatever snapshot they used and return, it's unsafe to dereference any toasted datums that may appear in the query result. It's possible to demonstrate "no known snapshots" and "missing chunk number N for toast value" errors as a result of this oversight. Historically this outer snapshot has been held by the Portal code, and that seems like a good plan to preserve. So add infrastructure to pquery.c to allow re-establishing the Portal-owned snapshot if it's not there anymore, and add enough bookkeeping support that we can tell whether it is or not. We can't, however, just re-establish the Portal snapshot as part of COMMIT/ROLLBACK. As in normal transaction start, acquiring the first snapshot should wait until after SET and LOCK commands. Hence, teach spi.c about doing this at the right time. (Note that this patch doesn't fix the problem for any PLs that try to run intra-procedure transactions without using SPI to execute SQL commands.) This makes SPI's no_snapshots parameter rather a misnomer, so in HEAD, rename that to allow_nonatomic. replication/logical/worker.c also needs some fixes, because it wasn't careful to hold a snapshot open around AFTER trigger execution. That code doesn't use a Portal, which I suspect someday we're gonna have to fix. But for now, just rearrange the order of operations. This includes back-patching the recent addition of finish_estate() to centralize the cleanup logic there. 
    This also back-patches commit 2ecfeda3e into v13, to improve the test coverage for worker.c (it was that test that exposed that worker.c's snapshot management is wrong). Per bug #15990 from Andreas Wicht. Back-patch to v11 where intra-procedure COMMIT was added. Discussion:

  • Fix usage of "tableoid" in GENERATED expressions. We consider this supported (though I've got my doubts that it's a good idea, because tableoid is not immutable). However, several code paths failed to fill the field in soon enough, causing such a GENERATED expression to see zero or the wrong value. This occurred when ALTER TABLE adds a new GENERATED column to a table with existing rows, and during regular INSERT or UPDATE on a foreign table with GENERATED columns. Noted during investigation of a report from Vitaly Ustinov. Back-patch to v12 where GENERATED came in. Discussion:

  • Disallow whole-row variables in GENERATED expressions. This was previously allowed, but I think that was just an oversight. It's a clear violation of the rule that a generated column cannot depend on itself or other generated columns. Moreover, because the code was relying on the assumption that no such cross-references exist, it was pretty easy to crash ALTER TABLE and perhaps other places. Even if you managed not to crash, you got quite unstable, implementation-dependent results. Per report from Vitaly Ustinov. Back-patch to v12 where GENERATED came in. Discussion:

  • Remove plpgsql's special-case code paths for SET/RESET. In the wake of 84f5c2908, it's no longer necessary for plpgsql to handle SET/RESET specially. The point of that was just to avoid taking a new transaction snapshot prematurely, which the regular code path through _SPI_execute_plan() now does just fine (in fact better, since it now does the right thing for LOCK too). Hence, rip out a few lines of code, going back to the old way of treating SET/RESET as a generic SQL command. This essentially reverts all but the test cases from b981275b6. Discussion:

  • Fix access to no-longer-open relcache entry in logical-rep worker. If we redirected a replicated tuple operation into a partition child table, and then tried to fire AFTER triggers for that event, the relation cache entry for the child table was already closed. This has no visible ill effects as long as the entry is still there and still valid, but an unluckily-timed cache flush could result in a crash or other misbehavior. To fix, postpone the ExecCleanupTupleRouting call (which is what closes the child table) until after we've fired triggers. This requires a bit of refactoring so that the cleanup function can have access to the necessary state. In HEAD, I took the opportunity to simplify some of worker.c's function APIs based on use of the new ApplyExecutionData struct. However, it doesn't seem safe/practical to back-patch that aspect, at least not without a lot of analysis of possible interactions with a04daa97a. In passing, add an Assert to afterTriggerInvokeEvents to catch such cases. This seems worthwhile because we've grown a number of fairly unstructured ways of calling AfterTriggerEndQuery. Back-patch to v13, where worker.c grew the ability to deal with partitioned target tables. Discussion:

  • Be more verbose when the postmaster unexpectedly quits. Emit a LOG message when the postmaster stops because of a failure in the startup process. There already is a similar message if we exit for that reason during PM_STARTUP phase, so it seems inconsistent that there was none if the startup process fails later on. Also emit a LOG message when the postmaster stops after a crash because restart_after_crash is disabled. This seems potentially helpful in case DBAs (or developers) forget that that's set. Also, it was the only remaining place where the postmaster would do an abnormal exit without any comment as to why. In passing, remove an unreachable call of ExitPostmaster(0). Discussion:

  • Re-order pg_attribute columns to eliminate some padding space. Now that attcompression is just a char, there's a lot of wasted padding space after it. Move it into the group of char-wide columns to save a net of 4 bytes per pg_attribute entry. While we're at it, swap the order of attstorage and attalign to make for a more logical grouping of these columns. Also re-order actions in related code to match the new field ordering. This patch also fixes one outright bug: equalTupleDescs() failed to compare attcompression. That could, for example, cause relcache reload to fail to adopt a new value following a change. Michael Paquier and Tom Lane, per a gripe from Andres Freund. Discussion:
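
The padding saving described in the last item above can be illustrated with a small sketch. It uses Python's ctypes to lay out two structs with native alignment. The field names and layouts are hypothetical and much smaller than the real pg_attribute definition; the only point is that grouping the char-wide members removes alignment padding.

```python
import ctypes


class Before(ctypes.Structure):
    # Hypothetical layout: a 1-byte field stranded between 4-byte fields.
    _fields_ = [
        ("a", ctypes.c_int32),
        ("storage", ctypes.c_char),
        ("align", ctypes.c_char),        # 2 bytes of padding follow
        ("b", ctypes.c_int32),
        ("compression", ctypes.c_char),  # 3 bytes of padding follow
        ("c", ctypes.c_int32),
    ]


class After(ctypes.Structure):
    # Same fields, with all char-wide members grouped together.
    _fields_ = [
        ("a", ctypes.c_int32),
        ("storage", ctypes.c_char),
        ("align", ctypes.c_char),
        ("compression", ctypes.c_char),  # 1 byte of padding follows
        ("b", ctypes.c_int32),
        ("c", ctypes.c_int32),
    ]


# Difference in total struct size caused purely by field ordering.
saved = ctypes.sizeof(Before) - ctypes.sizeof(After)
print(ctypes.sizeof(Before), ctypes.sizeof(After), saved)
```

On platforms where a 32-bit integer is 4-byte aligned, the first layout occupies 20 bytes and the second 16, so the reordering alone saves 4 bytes per entry in this toy case.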

David Rowley pushed:

  • Fix typo and outdated information in README.barrier. README.barrier didn't seem to get the memo when atomics were added. Fix that. Author: Tatsuo Ishii, David Rowley Discussion: Backpatch-through: 9.6, oldest supported release

  • Fix planner's use of Result Cache with unique joins. When the planner considered using a Result Cache node to cache results from the inner side of a Nested Loop Join, it failed to consider that the inner path's parameterization may not be the entire join condition. If the join was marked as inner_unique then we may accidentally put the cache in singlerow mode. This meant that entries would be marked as complete after caching the first row. That was wrong because, if only part of the join condition was parameterized, the uniqueness of the unique join was not guaranteed at the Result Cache's level. The uniqueness is only guaranteed after Nested Loop applies the join filter. If subsequent rows were found, this would lead to "ERROR: cache entry already complete". This could have been fixed by only putting the cache in singlerow mode if the entire join condition was parameterized. However, Nested Loop will only read its inner side so far as the first matching row when the join is unique, so that might mean we never get an opportunity to mark cache entries as complete. Since non-complete cache entries are useless for subsequent lookups, we just don't bother considering a Result Cache path in this case. In passing, remove the XXX comment that claimed the above ERROR might be better suited to be an Assert. Now that an actual case has triggered it, it seems better to keep it an ERROR. Reported-by: David Christensen Discussion:
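
The singlerow failure mode in the Result Cache item above can be modelled with a toy cache. This is only a sketch under stated assumptions: the class, the helper functions, and the sample rows are hypothetical and are not taken from the executor code. The cache is keyed on column a alone, while the join is unique only on (a, b), so a single key can legitimately produce more than one inner row.

```python
class CacheEntryComplete(Exception):
    """Raised when a row arrives for an entry already marked complete."""


class ResultCacheSketch:
    """Toy cache keyed on the parameterized part of a join condition."""

    def __init__(self, singlerow):
        self.singlerow = singlerow
        self.entries = {}  # key -> {"rows": [...], "complete": bool}

    def store(self, key, row):
        entry = self.entries.setdefault(key, {"rows": [], "complete": False})
        if entry["complete"]:
            raise CacheEntryComplete("cache entry already complete")
        entry["rows"].append(row)
        if self.singlerow:
            # singlerow mode assumes at most one row per key.
            entry["complete"] = True


# Hypothetical inner relation, unique on (a, b) but not on a alone.
INNER_ROWS = [(1, 10), (1, 20)]


def scan_inner(a):
    """Inner scan parameterized only by a, i.e. part of the join condition."""
    return [row for row in INNER_ROWS if row[0] == a]


def fill(cache, a):
    """Cache every row the parameterized inner scan returns for key a."""
    for row in scan_inner(a):
        cache.store(a, row)


ok_cache = ResultCacheSketch(singlerow=False)
fill(ok_cache, 1)  # both rows are cached without error
```

Calling fill(ResultCacheSketch(singlerow=True), 1) raises CacheEntryComplete on the second row, which mirrors the reported error. The commit avoids the situation by not considering a Result Cache path at all in this case.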

Michaël Paquier pushed:

Fujii Masao pushed:

  • Fix issues in pg_stat_wal. 1) Previously there were both pgstat_send_wal() and pgstat_report_wal() in order to send WAL activity to the stats collector, with the former being used by the WAL writer and the latter by most other processes. They were a bit redundant, so this commit merges them into pgstat_send_wal() to simplify the code. 2) Previously WAL global statistics counters were calculated and then compared with a zero-filled buffer in order to determine whether any WAL activity had happened since the last submission. This calculation and comparison were not cheap, and they were regularly exercised even in read-only workloads. This commit fixes the issue by checking some WAL activity counters directly to determine whether there are WAL activity stats to send. 3) Previously pgstat_report_stat() did not check whether there were WAL activity stats to send as part of the "Don't expend a clock check if nothing to do" check at the top. It's probably rare to have pending WAL stats without also passing one of the other conditions, but for safety this commit changes pgstat_report_stat() so that it also checks some WAL activity counters at the top. This commit also adds comments about the design of WAL stats. Reported-by: Andres Freund Author: Masahiro Ikeda Reviewed-by: Kyotaro Horiguchi, Atsushi Torikoshi, Andres Freund, Fujii Masao Discussion:

  • Make standby promotion reset the recovery pause state to 'not paused'. If a promotion is triggered while recovery is paused, the paused state ends and promotion continues. But previously in that case pg_get_wal_replay_pause_state() returned 'paused' wrongly while a promotion was ongoing. This commit changes a standby promotion so that it marks the recovery pause state as 'not paused' when it's triggered, to fix the issue. Author: Fujii Masao Reviewed-by: Dilip Kumar, Kyotaro Horiguchi Discussion:
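
The promotion change in the last item above amounts to a small state transition, sketched below. The class and method names are hypothetical and this is not the server code. The three state strings are the values that, as far as I know, pg_get_wal_replay_pause_state() reports in PostgreSQL 14; only 'not paused' and 'paused' appear in the commit message itself.

```python
from enum import Enum


class PauseState(Enum):
    NOT_PAUSED = "not paused"
    PAUSE_REQUESTED = "pause requested"
    PAUSED = "paused"


class StandbySketch:
    """Toy model of a standby's recovery pause state."""

    def __init__(self):
        self.pause_state = PauseState.NOT_PAUSED
        self.promoting = False

    def request_pause(self):
        self.pause_state = PauseState.PAUSE_REQUESTED

    def confirm_pause(self):
        # Recovery acknowledges a pending pause request.
        if self.pause_state is PauseState.PAUSE_REQUESTED:
            self.pause_state = PauseState.PAUSED

    def trigger_promotion(self, reset_pause_state=True):
        self.promoting = True
        # Behaviour described in the commit: promotion clears the pause state.
        if reset_pause_state:
            self.pause_state = PauseState.NOT_PAUSED

    def get_wal_replay_pause_state(self):
        return self.pause_state.value


standby = StandbySketch()
standby.request_pause()
standby.confirm_pause()
standby.trigger_promotion()
print(standby.get_wal_replay_pause_state())
```

Passing reset_pause_state=False reproduces the earlier behaviour in this model, where the reported state stays 'paused' while the promotion is in progress.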

Amit Kapila pushed:

  • Fix test. We were not waiting for a publisher to catch up with the subscriber after creating a subscription. As a result, the apply worker could start replication even after the test had disabled the subscription, so the test expected no active slot when one actually existed. Fix this symptom by having the publisher wait to catch up with the subscription. It is also not a good idea to check whether the slot is still active by looking for a walsender, because the slot is released only after the walsender-related memory is cleaned up. Fix that by checking the slot status in pg_replication_slots. Also, it is better to avoid repeated enabling/disabling of the subscription. Finally, we turn autovacuum off for this test to avoid any empty transaction appearing in the test while consuming changes. Reported-by: as per buildfarm Author: Vignesh C Reviewed-by: Amit Kapila, Michael Paquier Discussion:

  • Fix deadlock for multiple replicating truncates of the same table. While applying the truncate change, the logical apply worker acquires RowExclusiveLock on the relation being truncated. This allowed two apply workers to truncate the same relation at the same time, which led to a deadlock. The reason was that one of the workers, after updating the pg_class tuple, tried to acquire a SHARE lock on the relation and started to wait for the second worker, which had acquired RowExclusiveLock on the relation. When the second worker then tried to update the pg_class tuple, it started to wait for the first worker, which led to a deadlock. Fix it by acquiring AccessExclusiveLock on the relation before applying the truncate change, as we do for a normal truncate operation. Author: Peter Smith, test case by Haiying Tang Reviewed-by: Dilip Kumar, Amit Kapila Backpatch-through: 11 Discussion:
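
The lock interaction behind the truncate deadlock above can be sketched with a reduced conflict table. Only the three table-level lock modes named in the commit message are modelled, and the helper function is hypothetical. The conflict sets follow the table-level lock conflict rules in the PostgreSQL documentation as I understand them.

```python
# Reduced conflict table: for each requested mode, the held modes it conflicts with.
CONFLICTS = {
    "RowExclusiveLock": {"ShareLock", "AccessExclusiveLock"},
    "ShareLock": {"RowExclusiveLock", "AccessExclusiveLock"},
    "AccessExclusiveLock": {
        "RowExclusiveLock",
        "ShareLock",
        "AccessExclusiveLock",
    },
}


def can_grant(requested, held_by_others):
    """Return True if `requested` conflicts with no lock held by other workers."""
    return not any(held in CONFLICTS[requested] for held in held_by_others)


# Before the fix: both apply workers can hold RowExclusiveLock at once ...
both_enter = can_grant("RowExclusiveLock", ["RowExclusiveLock"])
# ... and then a SHARE lock request waits on the other worker's lock.
share_blocked = not can_grant("ShareLock", ["RowExclusiveLock"])

# After the fix: the second worker cannot even start while the first holds
# AccessExclusiveLock, so the two truncates are serialized.
second_waits = not can_grant("AccessExclusiveLock", ["AccessExclusiveLock"])

print(both_enter, share_blocked, second_waits)
```

Because RowExclusiveLock does not conflict with itself, both workers get past the first step and can then block each other. AccessExclusiveLock conflicts with every mode, so taking it up front means the second worker waits before touching anything.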

Dean Rasheed pushed:

Andrew Dunstan pushed:

Pending Patches

Yugo Nagata sent in another revision of a patch to implement incrementally materialized views.

Amul Sul sent in another revision of a patch to separate the WAL writing code from StartupXLOG(), implement WAL prohibit state using global barriers, error or Assert before START_CRIT_SECTION for WAL write, and document all this. This is infrastructure for, among other things, ALTER SYSTEM READ {ONLY | WRITE}.

Pavel Stěhule sent in another revision of a patch to implement schema variables.

Bharath Rupireddy sent in another revision of a patch to avoid catalog accesses in slot_store_error_callback and conversion_error_callback.

Amit Langote sent in a patch to reword some comments in pathnodes.h for clarity.

Ranier Vilela sent in another revision of a patch to fix a possible memory corruption in zic.

Bharath Rupireddy sent in three revisions of a patch to tighten up batch_size and fetch_size options against non-numeric values in the PostgreSQL FDW.

Masahiro Ikeda sent in two more revisions of a patch to improve the performance of reporting WAL stats without introducing a new variable.

Hou Zhijie and Amit Langote traded patches to skip partition tuple routing when there is a constant partition key.

Peter Smith and Ajin Cherian traded patches to add support for prepared transactions to built-in logical replication, add prepare API support for streaming transactions, and skip empty transactions for logical replication.

Amit Langote sent in four more revisions of a patch to pgoutput to fix memory management by releasing memory allocated when creating the tuple-conversion map and its component TupleDescs when its owning sync entry is invalidated and freeing TupleDescs when no map is deemed necessary to begin with.

Nitin Jadhav sent in two more revisions of a patch to remove an extra malloc from create_list_bounds(), allocate the PartitionListValue as a single chunk, do the same in create_hash_bounds for PartitionHashBound, allocate datum arrays in bulk to avoid palloc overhead, and pfree intermediate results in create_range_bounds().

Bertrand Drouvot sent in another revision of a patch to keep oldestxid in pg_upgrade.

Andrew Dunstan sent in another revision of a patch to implement SQL/JSON functions.

Andrew Dunstan sent in another revision of a patch to implement SQL/JSON JSON_TABLE.

Matthias van de Meent sent in another revision of a patch to improve the usage of line pointer array truncation in heapam.

Heikki Linnakangas sent in a patch to allow specifying pg_waldump --rmgr option multiple times.

Robert Haas, Dilip Kumar, and Kyotaro HORIGUCHI traded patches intended to fix a bug that manifested as a race condition in recovery.

Takashi Menjo sent in another revision of a patch to map WAL segment files on PMEM as WAL buffers.

Justin Pryzby sent in another revision of a patch to implement different compression methods for FPI.

Takamichi Osumi sent in a patch to disallow TRUNCATE on user_catalog_table.

Peter Eisentraut and Álvaro Herrera traded patches to add a NO_INSTALL option to pgxs.

Bharath Rupireddy sent in three more revisions of a patch to disambiguate error messages that use "non-negative".

Daniel Gustafsson sent in two revisions of a patch to extend configure_test_server_for_ssl to add extensions, and add tests for sslinfo.

Mathis Rudolf sent in a patch intended to fix a bug that manifested as an alias collision in REFRESH MATERIALIZED VIEW CONCURRENTLY by adding the prefix _pg_internal_ to aliases like 'mv' and 'newdata' in 'refresh_by_match_merge()', which makes it unlikely to cause any collisions with user-created MVs.

Yura Sokolov sent in a patch to add a PortalDrop call to exec_execute_message().

Bharath Rupireddy and Peter Smith traded patches to refactor "mutually exclusive options" error reporting code in parse_subscription_options.

Michaël Paquier sent in another revision of a patch to switch tests of pg_upgrade to use TAP.

Greg Nancarrow sent in another revision of a patch to fix a parallel worker failed assertion and coredump.

Kirill Reshke sent in a patch intended to fix a bug that manifested as slow standby snapshot by using a doubly linked list in KnownAssignedXids.

Paul Guo sent in a patch to fix a pg_rewind failure due to read only file open() error by making it writable.

Alexander Pyhalov sent in a patch to make it possible to push joins with function RTEs to PostgreSQL data sources.

Nitin Jadhav sent in another revision of a patch to support tzh tzm patterns.

Michaël Paquier sent in a patch to force disable of SSL renegotiation in the server.

Ivan Panchenko sent in another revision of a patch to make it possible to trigger actions on login.

Takayuki Tsunakawa sent in another revision of a patch to propagate CTE property flags in the rewriter.

Ashutosh Bapat sent in two revisions of a patch to report new catalog_xmin candidate in LogicalIncreaseXminForSlot().

Michaël Paquier sent in another revision of a patch to add authenticated data to pg_stat_activity.

Bharath Rupireddy sent in another revision of a patch to reword error messages and docs for parallel vacuum.

Hou Zhijie sent in two revisions of a patch intended to fix a bug that caused FDW batched inserts to fail when batch_size > 65535.

Dmitry Dolgov sent in another revision of a patch to implement index skip scans.

Tomáš Vondra sent in a patch intended to fix a bug that manifested as performance degradation of REFRESH MATERIALIZED VIEW.

Michaël Paquier and Tom Lane traded patches to reduce the memory footprint of the pg_attribute struct.

David Rowley sent in another revision of a patch to speed up NOT IN() with a set of Consts.

Vigneshwaran C sent in another revision of a patch to add tab completion for missing options in PUBLICATION and SUBSCRIPTION commands.