PostgreSQL Weekly News - June 13, 2021

Posted on 2021-06-14 by PWN

PostgreSQL Weekly News - June 13, 2021

Person of the week

PostgreSQL Product News

pg_statement_rollback v1.2, an extension that adds server side transaction with rollback at statement level, released.

PostgreSQL Jobs for June

PostgreSQL in the News

Planet PostgreSQL:

PostgreSQL Weekly News is brought to you this week by David Fetter

Submit news and announcements by Sunday at 3:00pm PST8PDT to

Applied Patches

Tomáš Vondra pushed:

Tom Lane pushed:

  • Fix inconsistent equalfuncs.c behavior for FuncCall.funcformat. Other equalfuncs.c checks on CoercionForm fields use COMPARE_COERCIONFORM_FIELD (which makes them no-ops), but commit 40c24bfef neglected to make _equalFuncCall do likewise. Fix that. This is only strictly correct if FuncCall.funcformat has no semantic effect, instead just determining ruleutils.c display formatting. 40c24bfef added a couple of checks in parse analysis that could break that rule; but on closer inspection, they're redundant, so just take them out again. Per report from Noah Misch. Discussion:

  • Fix incautious handling of possibly-miscoded strings in client code. An incorrectly-encoded multibyte character near the end of a string could cause various processing loops to run past the string's terminating NUL, with results ranging from no detectable issue to a program crash, depending on what happens to be in the following memory. This isn't an issue in the server, because we take care to verify the encoding of strings before doing any interesting processing on them. However, that lack of care leaked into client-side code which shouldn't assume that anyone has validated the encoding of its input. Although this is certainly a bug worth fixing, the PG security team elected not to regard it as a security issue, primarily because any untrusted text should be sanitized by PQescapeLiteral or the like before being incorporated into a SQL or psql command. (If an app fails to do so, the same technique can be used to cause SQL injection, with probably much more dire consequences than a mere client-program crash.) Those functions were already made proof against this class of problem, cf CVE-2006-2313. To fix, invent PQmblenBounded() which is like PQmblen() except it won't return more than the number of bytes remaining in the string. In HEAD we can make this a new libpq function, as PQmblen() is. It seems imprudent to change libpq's API in stable branches though, so in the back branches define PQmblenBounded as a macro in the files that need it. (Note that just changing PQmblen's behavior would not be a good idea; notably, it would completely break the escaping functions' defense against this exact problem. So we just want a version for those callers that don't have any better way of handling this issue.) Per private report from houjingyi. Back-patch to all supported branches.

  • Stabilize contrib/seg regression test. If autovacuum comes along just after we fill table test_seg with some data, it will update the stats to the point where we prefer a plain indexscan over a bitmap scan, breaking the expected output (as well as the point of the test case). To fix, just force a bitmap scan to be chosen here. This has evidently been wrong since commit de1d042f5. It's not clear why we just recently saw any buildfarm failures due to it; but prairiedog has failed twice on this test in the past week. Hence, backpatch to v11 where this test case came in.

  • Don't crash on empty statements in SQL-standard function bodies. gram.y should discard NULL pointers (empty statements) when assembling a routine_body_stmt_list, as it does for other sorts of statement lists. Julien Rouhaud and Tom Lane, per report from Noah Misch. Discussion:

  • Avoid misbehavior when persisting a non-stable cursor. PersistHoldablePortal has long assumed that it should store the entire output of the query-to-be-persisted, which requires rewinding and re-reading the output. This is problematic if the query is not stable: we might get different row contents, or even a different number of rows, which'd confuse the cursor state mightily. In the case where the cursor is NO SCROLL, this is very easy to solve: just store the remaining query output, without any rewinding, and tweak the portal's cursor state to match. Aside from removing the semantic problem, this could be significantly more efficient than storing the whole output. If the cursor is scrollable, there's not much we can do, but it was already the case that scrolling a volatile query's result was pretty unsafe. We can just document more clearly that getting correct results from that is not guaranteed. There are already prohibitions in place on using SCROLL with FOR UPDATE/SHARE, which is one way for a SELECT query to have non-stable results. We could imagine prohibiting SCROLL when the query contains volatile functions, but that would be expensive to enforce. Moreover, it could break applications that work just fine, if they have functions that are in fact stable but the user neglected to mark them so. So settle for documenting the hazard. While this problem has existed in some guise for a long time, it got a lot worse in v11, which introduced the possibility of persisting plpgsql cursors (perhaps implicit ones) even when they violate the rules for what can be marked WITH HOLD. Hence, I've chosen to back-patch to v11 but not further. Per bug #17050 from Алексей Булгаков. Discussion:

  • Force NO SCROLL for plpgsql's implicit cursors. Further thought about bug #17050 suggests that it's a good idea to use CURSOR_OPT_NO_SCROLL for the implicit cursor opened by a plpgsql FOR-over-query loop. This ensures that, if somebody commits inside the loop, PersistHoldablePortal won't try to rewind and re-read the cursor. While we'd have selected NO_SCROLL anyway if FOR UPDATE/SHARE appears in the query, there are other hazards with volatile functions; and in any case, it's silly to expend effort storing rows that we know for certain won't be needed. (While here, improve the comment in exec_run_select, which was a bit confused about the rationale for when we can use parallel mode. Cursor operations aren't a hazard for nameless portals.) This wasn't an issue until v11, which introduced the possibility of persisting such cursors. Hence, back-patch to v11. Per bug #17050 from Алексей Булгаков. Discussion:

  • Avoid ECPG test failures in some GSS-capable environments. Buildfarm member hamerkop has been reporting that two cases in connect/test5.pgc show different error messages than the test expects, because since commit ffa2e4670 libpq's connection failure messages are exposing the fact that a GSS-encrypted connection was attempted and failed. That's pretty interesting information in itself, and I certainly don't wish to shoot the messenger, but we need to do something to stabilize the ECPG results. For the second of these two failure cases, we can add the gssencmode=disable option to prevent the discrepancy. However, that solution is problematic for the first failure, because the only unique thing about that case is that it's testing a completely-omitted connection target; there's noplace to add the option without defeating the point of the test case. After some thrashing around with alternative fixes that turned out to have undesirable side-effects, the most workable answer is just to give up and remove that test case. Perhaps we can revert this later, if we figure out why the GSS code is misbehaving in hamerkop's environment. Thanks to Michael Paquier for exploration of alternatives. Discussion:

  • Shut down EvalPlanQual machinery when LockRows node reaches the end. Previously, we left the EPQ sub-executor alone until ExecEndLockRows. This caused any buffer pins or other resources that it might hold to remain held until ExecutorEnd, which in some code paths means that they are held till the Portal is closed. That can cause user-visible problems, such as blocking VACUUM; and it's unlike the behavior of ordinary table-scanning nodes, which will have released all buffer pins by the time they return an EOF indication. We can make LockRows work more like other plan nodes by calling EvalPlanQualEnd just before returning NULL. We still need to call it in ExecEndLockRows in case the node was not run to completion, but in the normal case the second call does nothing and costs little. Per report from Yura Sokolov. In principle this is a longstanding bug, but in view of the lack of other complaints and the low severity of the consequences, I chose not to back-patch. Discussion:

  • Rearrange logrep worker's snapshot handling some more. It turns out that worker.c's code path for TRUNCATE was also careless about establishing a snapshot while executing user-defined code, allowing the checks added by commit 84f5c2908 to fail when a trigger is fired in that context. We could just wrap Push/PopActiveSnapshot around the truncate call, but it seems better to establish a policy of holding a snapshot throughout execution of a replication step. To help with that and possible future requirements, replace the previous ensure_transaction calls with pairs of begin/end_replication_step calls. Per report from Mark Dilger. Back-patch to v11, like the previous changes. Discussion:

  • Reconsider the handling of procedure OUT parameters. Commit 2453ea142 redefined pg_proc.proargtypes to include the types of OUT parameters, for procedures only. While that had some advantages for implementing the SQL-spec behavior of DROP PROCEDURE, it was pretty disastrous from a number of other perspectives. Notably, since the primary key of pg_proc is name + proargtypes, this made it possible to have multiple procedures with identical names + input arguments and differing output argument types. That would make it impossible to call any one of the procedures by writing just NULL (or "?", or any other data-type-free notation) for the output argument(s). The change also seems likely to cause grave confusion for client applications that examine pg_proc and expect the traditional definition of proargtypes. Hence, revert the definition of proargtypes to what it was, and undo a number of complications that had been added to support that. To support the SQL-spec behavior of DROP PROCEDURE, when there are no argmode markers in the command's parameter list, we perform the lookup both ways (that is, matching against both proargtypes and proallargtypes), succeeding if we get just one unique match. In principle this could result in ambiguous-function failures that would not happen when using only one of the two rules. However, overloading of procedure names is thought to be a pretty rare usage, so this shouldn't cause many problems in practice. Postgres-specific code such as pg_dump can defend against any possibility of such failures by being careful to specify argmodes for all procedure arguments. This also fixes a few other bugs in the area of CALL statements with named parameters, and improves the documentation a little. catversion bump forced because the representation of procedures with OUT arguments changes. Discussion:

  • Fix multiple crasher bugs in partitioned-table replication logic. apply_handle_tuple_routing(), having detected and reported that the tuple it needed to update didn't exist, tried to update that tuple anyway, leading to a null-pointer dereference. logicalrep_partition_open() failed to ensure that the LogicalRepPartMapEntry it built for a partition was fully independent of that for the partition root, leading to trouble if the root entry was later freed or rebuilt. Meanwhile, on the publisher's side, pgoutput_change() sometimes attempted to apply execute_attr_map_tuple() to a NULL tuple. The first of these was reported by Sergey Bernikov in bug #17055; I found the other two while developing some test cases for this sadly under-tested code. Diagnosis and patch for the first issue by Amit Langote; patches for the others by me; new test cases by me. Back-patch to v13 where this logic came in. Discussion:

  • Don't use Asserts to check for violations of replication protocol. Using an Assert to check the validity of incoming messages is an extremely poor decision. In a debug build, it should not be that easy for a broken or malicious remote client to crash the logrep worker. The consequences could be even worse in non-debug builds, which will fail to make such checks at all, leading to who-knows-what misbehavior. Hence, promote every Assert that could possibly be triggered by wrong or out-of-order replication messages to a full test-and-ereport. To avoid bloating the set of messages the translation team has to cope with, establish a policy that replication protocol violation error reports don't need to be translated. Hence, all the new messages here use errmsg_internal(). A couple of old messages are changed likewise for consistency. Along the way, fix some non-idiomatic or outright wrong uses of hash_search(). Most of these mistakes are new with the "streaming replication" patch (commit 464824323), but a couple go back a long way. Back-patch as appropriate. Discussion:

  • Ensure pg_filenode_relation(0, 0) returns NULL. Previously, a zero value for the relfilenode resulted in a confusing error message about "unexpected duplicate". This function returns NULL for other invalid relfilenode values, so zero should be treated likewise. It's been like this all along, so back-patch to all supported branches. Justin Pryzby Discussion:

  • Restore robustness of TAP tests that wait for postmaster restart. Several TAP tests use poll_query_until() to wait for the postmaster to restart. They were checking to see if a trivial query (e.g. "SELECT 1") succeeds. However, that's problematic in the wake of commit 11e9caff8, because now that we feed said query to psql via stdin, we risk IPC::Run whining about a SIGPIPE failure if psql quits before reading the query. Hence, we can't use a nonempty query in cases where we need to wait for connection failures to stop happening. Per the precedent of commits c757a3da0 and 6d41dd045, we can pass "undef" as the query in such cases to ensure that IPC::Run has nothing to write. However, then we have to say that the expected output is empty, and this exposes a deficiency in poll_query_until: if psql fails altogether and returns empty stdout, poll_query_until will treat that as a success! That's because, contrary to its documentation, it makes no actual check for psql failure, looking neither at the exit status nor at stderr. To fix that, adjust poll_query_until to insist on empty stderr as well as a stdout match. (I experimented with checking exit status instead, but it seems that psql often does exit(1) in cases that we need to consider successes. That might be something to fix someday, but it would be a non-back-patchable behavior change.) Back-patch to v10. The test cases needing this exist only as far back as v11, but it seems wise to keep poll_query_until's behavior the same in v10, in case we back-patch another such test case in future. (9.6 does not currently need this change, because in that branch poll_query_until can't be told to accept empty stdout as a success case.) Per assorted buildfarm failures, mostly on hoverfly. Discussion:

Etsuro Fujita pushed:

Amit Kapila pushed:

Michaël Paquier pushed:

Peter Eisentraut pushed:

Bruce Momjian pushed:

Robert Haas pushed:

  • Fix corner case failure of new standby to follow new primary. This only happens if (1) the new standby has no WAL available locally, (2) the new standby is starting from the old timeline, (3) the promotion happened in the WAL segment from which the new standby is starting, (4) the timeline history file for the new timeline is available from the archive but the WAL files for are not (i.e. this is a race), (5) the WAL files for the new timeline are available via streaming, and (6) recovery_target_timeline='latest'. Commit ee994272ca50f70b53074f0febaec97e28f83c4e introduced this logic and was an improvement over the previous code, but it mishandled this case. If recovery_target_timeline='latest' and restore_command is set, validateRecoveryParameters() can change recoveryTargetTLI to be different from receiveTLI. If streaming is then tried afterward, expectedTLEs gets initialized with the history of the wrong timeline. It's supposed to be a list of entries explaining how to get to the target timeline, but in this case it ends up with a list of entries explaining how to get to the new standby's original timeline, which isn't right. Dilip Kumar and Robert Haas, reviewed by Kyotaro Horiguchi. Discussion:

  • Adjust new test case to set wal_keep_size. Per buildfarm member conchuela and Kyotaro Horiguchi, it's possible for the WAL segment that the cascading standby needs to be removed too quickly. Hopefully this will prevent that. Kyotaro Horiguchi Discussion:

David Rowley pushed:

Noah Misch pushed:

Álvaro Herrera pushed:

Andres Freund pushed:

Andrew Dunstan pushed:

  • Fix new recovery test for use under msys. Commit caba8f0d43 wasn't quite right for msys, as demonstrated by several buildfarm animals, including jacana and fairywren. We need to use the msys perl in the archive command, but call it in such a way that Windows will understand the path. Furthermore, inside the copy script we need to convert a Windows path to an msys path.

  • Further tweaks to stuck_on_old_timeline recovery test. Translate path slashes on target directory path. This was confusing old branches, but is applied to all branches for the sake of uniformity. Perl is perfectly able to understand paths with forward slashes. Along the way, restore the previous archive_wait query, for the sake of uniformity with other tests, per gripe from Tom Lane.

Pending Patches

Ranier Vilela sent in another revision of a patch to reduce the overhead on detoasted values.

Takamichi Osumi sent in three more revisions of a patch to document a deadlock that can arise in synchronous logical decoding.

Andrey V. Lepikhov sent in another revision of a patch to implement bulk COPY FROM into foreign tables.

Hou Zhijie sent in another revision of a patch to cache partition bound offsets adaptively.

Nathan Bossart sent in a patch to pre-allocate WAL segments.

Tomáš Vondra sent in two revisions of a patch to make it possible for logical decoding to replicate sequences.

Justin Pryzby sent in a patch to show "syncing data directories" in the ps display.

Anastasia Lubennikova sent in two more revisions of a patch to test the replay of regression tests.

Aleksander Alekseev and Michaël Paquier raded patches to fix dropped object handling in pg_event_trigger_ddl_commands.

Dilip Kumar and Amit Langote traded patches to fix decoding of speculative aborts.

Peter Eisentraut sent in a patch to generate node support functions automagically.

Kyotaro HORIGUCHI and Amit Kapila traded patches intended to fix a bug that manifested as a keepalive flood in logical replication.

Quan Zongliang sent in a patch to remove unused code from the KnownAssignedTransactionIdes submodule.

Emre Hasegeli sent in a patch to support bool in btree_gist.

Jacob Champion sent in a PoC patch to implement federated authn/z with OAUTHBEARER.

Jeff Davis sent in another revision of a patch to track last recovery LSN, time, and total count, which would make it possible to see unlogged table resets from other nodes.

Hou Zhijie sent in a patch to remove a now-unused function parameter in get_qual_from_partbound.

Hou Zhijie sent in two more revisions of a patch to make it possible for INSERT ... SELECT to execute in parallel.

Peter Geoghegan and Matthias van de Meent traded patches to fix a bug in GetOldestNonRemovableTransactionId. GetOldestNonRemovableTransactionId(rel) did not return values consistent with GlobalVisTestFor(rel). This is now updated, and some assertions are added to ensure this problem case does not return.

Peter Smith and Ajin Cherian traded patches to add support for prepared transactions to built-in logical replication, add a prepare API support for streaming transactions, and skip empty transactions for logical replication.

Ajin Cherian sent in three more revisions of a patch to add an option to set two-phase in CREATE_REPLICATION_SLOT command, and add support for two-phase decoding in pg_recvlogical.

Julien Rouhaud sent in two more revisions of a patch to add a hook for extensible parsing.

Kyotaro HORIGUCHI and Fabien COELHO traded patches to intended to fix a bug that manifested as pgbench logging 0-interval log entries.

Alexander Pyhalov sent in a patch to allow pushing CASE expressions to foreign servers.

Jeff Davis sent in two more revisions of a patch to implement ALTER TABLE ... SET ACCESS METHOD.

Nitin Jadhav sent in another revision of a patch to track startup progress.

Atsushi Torikoshi sent in another revision of a patch to add a function to log the complete query string and its plan for the query currently running on the backend with the specified process ID.

Thomas Mannhart sent in two more revisions of a patch to implement range merge joins.

Thomas Munro sent in a patch to adjust pg_regress output for new long test names.

Bharath Rupireddy sent in two more revisions of a patch to refactor parse_subscription_options, and remove similar ereport calls in parse_subscription_options.

Tomáš Vondra sent in two more revisions of a patch to create copy of a descriptor for batching, and initialize slots only once for batching.

Nathan Bossart sent in a patch to add a pg_ctl option for retrieving shmem size.

John Naylor sent in another revision of a patch to rewrite pg_utf8_verifystr for speed.

David Rowley sent in a patch to use "an SQL" in places where "a SQL" appeared.

Alexander Korotkov sent in four revisions of a patch to support UNNEST(multirange) and cast multirange as an array of ranges.

Hayato Kuroda sent in a patch to ignore failed threads in pgbench.

Amit Langote sent in another revision of a patch to implement multi-column list partitioning.

Jacob Champion sent in another revision of a patch to add an column projection list to the table AM API.

Bruce Momjian sent in three revisions of a patch to document the fact that some aggregate functions need to be re-created on upgrade.

Ranier Vilela sent in a patch to change some signed types to unsigned.

Pavel Stěhule sent in another revision of a patch to implement schema variables.

Thomas Munro sent in another revision of a patch to add WL_SOCKET_CLOSED for socket shutdown events, and use WL_SOCKET_CLOSED for client_connection_check_interval.

Fabien COELHO sent in another revision of a patch to add a SHOW_ALL_RESULTS to psql.

David Rowley sent in a patch to clean up some aggregate code in the executor.

David Rowley sent in a patch to improve various places that double the size of a buffer by replacing a tight loop doubling at each iteration with pg_nextpower2_32 or pg_nextpower2_64, as appropriate.

David Rowley sent in a patch to add proper planner support for ORDER BY / DISTINCT aggregates.

Noah Misch sent in a patch to remove pg_wait_for_backend_termination().

Yugo Nagata and Fabien COELHO traded patches to keep pgbench from getting stuck due to skipped transactions.

Ranier Vilela sent in another revision of a patch to reduce unmatched types in procarray.

Thomas Munro sent in three revisions of a patch to use tuple-level SIREAD locks for index-only scans, and skip SIREAD locks on btree pages when possible.

Alexander Korotkov sent in a patch to change the title of a new documentation chapter to Range/Multirange Functions and Operators, which is clearer.

Bharath Rupireddy sent in a patch to enhance the batch insert test case for postgres_fdw.

Tomáš Vondra sent in a patch to shorten the runtime of the postgres_fdw test.

Tomáš Vondra sent in a patch to handle Expr op Expr clauses in extended stats.