PostgreSQL Weekly News - May 16, 2021

Posted on 2021-05-17 by PWN

Security releases 13.3, 12.7, 11.12, 10.17, and 9.6.22 are out. Please upgrade ASAP. The 9.6 series will stop getting fixes on November 11, 2021. Plan major version upgrades now.

Person of the week:

PostgreSQL Jobs for May

PostgreSQL in the News

Planet PostgreSQL:

PostgreSQL Weekly News is brought to you this week by David Fetter

Submit news and announcements by Sunday at 3:00pm PST8PDT to

Applied Patches

Tom Lane pushed:

  • Improve comments about USE_VALGRIND in pg_config_manual.h. These comments left the impression that USE_VALGRIND isn't really essential for valgrind testing. But that's wrong, as I learned the hard way today. Discussion:

  • Prevent integer overflows in array subscripting calculations. While we were (mostly) careful about ensuring that the dimensions of arrays aren't large enough to cause integer overflow, the lower bound values were generally not checked. This allows situations where lower_bound + dimension overflows an integer. It seems that that's harmless so far as array reading is concerned, except that array elements with subscripts notionally exceeding INT_MAX are inaccessible. However, it confuses various array-assignment logic, resulting in a potential for memory stomps. Fix by adding checks that array lower bounds aren't large enough to cause lower_bound + dimension to overflow. (Note: this results in disallowing cases where the last subscript position would be exactly INT_MAX. In principle we could probably allow that, but there's a lot of code that computes lower_bound + dimension and would need adjustment. It seems doubtful that it's worth the trouble/risk to allow it.) Somewhat independently of that, array_set_element() was careless about possible overflow when checking the subscript of a fixed-length array, creating a different route to memory stomps. Fix that too. Security: CVE-2021-32027
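
    The guard described above can be expressed without ever performing the overflowing addition. A minimal Python sketch of the idea, simulating C's 32-bit INT_MAX (the function name and structure are illustrative, not PostgreSQL's actual code):

    ```python
    INT_MAX = 2**31 - 1  # simulate C's 32-bit signed int limit

    def check_array_bounds(lower_bound: int, dimension: int) -> None:
        """Reject lower bounds where lower_bound + dimension would overflow.

        The test is rearranged as lower_bound > INT_MAX - dimension so the
        check itself cannot overflow; as a side effect it also disallows a
        last subscript of exactly INT_MAX, matching the commit's
        conservative rule.  (Toy model, not the real backend code.)
        """
        if dimension > 0 and lower_bound > INT_MAX - dimension:
            raise ValueError("array lower bound too large")

    check_array_bounds(1, 100)  # an ordinary array: accepted
    ```

    In C the equivalent comparison avoids the undefined behavior of signed overflow by never computing lower_bound + dimension before the check.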

  • Fix mishandling of resjunk columns in ON CONFLICT ... UPDATE tlists. It's unusual to have any resjunk columns in an ON CONFLICT ... UPDATE list, but it can happen when MULTIEXPR_SUBLINK SubPlans are present. If it happens, the ON CONFLICT UPDATE code path would end up storing tuples that include the values of the extra resjunk columns. That's fairly harmless in the short run, but if new columns are added to the table then the values would become accessible, possibly leading to malfunctions if they don't match the datatypes of the new columns. This had escaped notice through a confluence of missing sanity checks, including: * There's no cross-check that a tuple presented to heap_insert or heap_update matches the table rowtype. While it's difficult to check that fully at reasonable cost, we can easily add assertions that there aren't too many columns. * The output-column-assignment cases in execExprInterp.c lacked any sanity checks on the output column numbers, which seems like an oversight considering there are plenty of assertion checks on input column numbers. Add assertions there too. * We failed to apply nodeModifyTable's ExecCheckPlanOutput() to the ON CONFLICT UPDATE tlist. That wouldn't have caught this specific error, since that function is chartered to ignore resjunk columns; but it sure seems like a bad omission now that we've seen this bug. In HEAD, the right way to fix this is to make the processing of ON CONFLICT UPDATE tlists work the same as regular UPDATE tlists now do, that is don't add "SET x = x" entries, and use ExecBuildUpdateProjection to evaluate the tlist and combine it with old values of the not-set columns. This adds a little complication to ExecBuildUpdateProjection, but allows removal of a comparable amount of now-dead code from the planner. 
In the back branches, the most expedient solution seems to be to (a) use an output slot for the ON CONFLICT UPDATE projection that actually matches the target table, and then (b) invent a variant of ExecBuildProjectionInfo that can be told to not store values resulting from resjunk columns, so it doesn't try to store into nonexistent columns of the output slot. (We can't simply ignore the resjunk columns altogether; they have to be evaluated for MULTIEXPR_SUBLINK to work.) This works back to v10. In 9.6, projections work much differently and we can't cheaply give them such an option. The 9.6 version of this patch works by inserting a JunkFilter when it's necessary to get rid of resjunk columns. In addition, v11 and up have the reverse problem when trying to perform ON CONFLICT UPDATE on a partitioned table. Through a further oversight, adjust_partition_tlist() discarded resjunk columns when re-ordering the ON CONFLICT UPDATE tlist to match a partition. This accidentally prevented the storing-bogus-tuples problem, but at the cost that MULTIEXPR_SUBLINK cases didn't work, typically crashing if more than one row has to be updated. Fix by preserving resjunk columns in that routine. (I failed to resist the temptation to add more assertions there too, and to do some minor code beautification.) Per report from Andres Freund. Back-patch to all supported branches. Security: CVE-2021-32028

  • Replace opr_sanity test's binary_coercible() function with C code. opr_sanity's binary_coercible() function has always been meant to match the parser's notion of binary coercibility, but it also has always been a rather poor approximation of the parser's real rules (as embodied in IsBinaryCoercible()). That hasn't bit us so far, but it's predictable that it will eventually. It also now emerges that implementing this check in plpgsql performs absolutely horribly in clobber-cache-always testing. (Perhaps we could do something about that, but I suspect it just means that plpgsql is exploiting catalog caching to the hilt.) Hence, let's replace binary_coercible() with a C shim that directly invokes IsBinaryCoercible(), eliminating both the semantic hazard and the performance issue. Most of regress.c's C functions are declared in create_function_1, but we can't simply move that to before opr_sanity/type_sanity since those tests would complain about the resulting shell types. I chose to split it into create_function_0 and create_function_1. Since create_function_0 now runs as part of a parallel group while create_function_1 doesn't, reduce the latter to create just those functions that opr_sanity and type_sanity would whine about. To make room for create_function_0 in the second parallel group of tests, move tstypes to the third parallel group. In passing, clean up some ordering deviations between parallel_schedule and serial_schedule. Discussion:

  • Get rid of the separate serial_schedule list of tests. Having to maintain two lists of regression test scripts has proven annoyingly error-prone. We can achieve the effect of the serial_schedule by running the parallel_schedule with "--max_connections=1"; so do that and remove serial_schedule. This causes cosmetic differences in the progress output, but it doesn't seem worth restructuring pg_regress to avoid that. Discussion:

  • Fix an ancient misspelling of --max-connections. I copied the existing spelling of "--max_connections", but that's just wrong :-(. Evidently setting $ENV{MAX_CONNECTIONS} has never actually worked in this script. Given the lack of complaints, it's probably not worth back-patching a fix. Per buildfarm. Discussion:

  • Reduce runtime of privileges.sql test under CLOBBER_CACHE_ALWAYS. Several queries in the privileges regression test cause the planner to apply the plpgsql function "leak()" to every element of the histogram for atest12.b. Since commit 0c882e52a increased the size of that histogram to 10000 entries, the test invokes that function over 100000 times, which takes an absolutely unreasonable amount of time in clobber-cache-always mode. However, there's no real reason why that has to be a plpgsql function: for the purposes of this test, all that matters is that it not be marked leakproof. So we can replace the plpgsql implementation with a direct call of int4lt, which has the same behavior and is orders of magnitude faster. This is expected to cut several hours off the buildfarm cycle time for CCA animals. It has some positive impact in normal builds too, though that's probably lost in the noise. Back-patch to v13 where 0c882e52a came in. Discussion:

  • Initial pgindent and pgperltidy run for v14. Also "make reformat-dat-files". The only change worthy of note is that pgindent messed up the formatting of launcher.c's struct LogicalRepWorkerId, which led me to notice that that struct wasn't used at all anymore, so I just took it out.

  • Do pre-release housekeeping on catalog data. Ran the renumbering script to move high-numbered OIDs down, as per pre-beta tasks specified by RELEASE_CHANGES. For reference, the command was ./ --first-mapped-oid 8000 --target-oid 6150

  • Doc: update bki.sgml's statements about OID ranges. Commit ab596105b neglected to make the docs match the code.

  • Double-space commands in system_constraints.sql/system_functions.sql. Previously, any error reported by the backend while reading system_constraints.sql would report the entire file, not just the particular command it was working on. (Ask me how I know.) Likewise, there were chunks of system_functions.sql that would be read as one command, which would be annoying if anything failed there. The issue for system_constraints.sql is an oversight in commit dfb75e478. I didn't try to trace down where the poor formatting in system_functions.sql started, but it's certainly contrary to the advice at the head of that file.

  • Refactor CHECK_FOR_INTERRUPTS() to add flexibility. Split up CHECK_FOR_INTERRUPTS() to provide an additional macro INTERRUPTS_PENDING_CONDITION(), which just tests whether an interrupt is pending without attempting to service it. This is useful in situations where the caller knows that interrupts are blocked, and would like to find out if it's worth the trouble to unblock them. Also add INTERRUPTS_CAN_BE_PROCESSED(), which indicates whether CHECK_FOR_INTERRUPTS() can be relied on to clear the pending interrupt. This commit doesn't actually add any uses of the new macros, but a follow-on bug fix will do so. Back-patch to all supported branches to provide infrastructure for that fix. Alvaro Herrera and Tom Lane Discussion:
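
    The split can be pictured as separating "is something pending?" from "service it now". A toy Python model of the three operations (the class, its fields, and the method names mirror the macros but are invented for illustration; this is not backend code):

    ```python
    class InterruptState:
        """Toy model of the CHECK_FOR_INTERRUPTS() split."""

        def __init__(self):
            self.pending = False    # analogue of the pending-interrupt flag
            self.holdoff_count = 0  # analogue of InterruptHoldoffCount
            self.serviced = 0

        def interrupts_pending_condition(self) -> bool:
            # Cheap test only: never tries to service the interrupt,
            # so it is safe to call while holding locks.
            return self.pending

        def interrupts_can_be_processed(self) -> bool:
            # Would check_for_interrupts() actually clear it right now?
            return self.pending and self.holdoff_count == 0

        def check_for_interrupts(self):
            # Test *and* service, but only when nothing holds interrupts off.
            if self.pending and self.holdoff_count == 0:
                self.pending = False
                self.serviced += 1
    ```

    A caller holding buffer locks bumps the holdoff count, polls only the cheap pending test inside its loop, and defers the full servicing call until after the locks are released — exactly the pattern the follow-on spgdoinsert() fix uses.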

  • Fix query-cancel handling in spgdoinsert(). Knowing that a buggy opclass could cause an infinite insertion loop, spgdoinsert() intended to allow its loop to be interrupted by query cancel. However, that never actually worked, because in iterations after the first, we'd be holding buffer lock(s) which would cause InterruptHoldoffCount to be positive, preventing servicing of the interrupt. To fix, check if an interrupt is pending, and if so fall out of the insertion loop and service the interrupt after we've released the buffers. If it was indeed a query cancel, that's the end of the matter. If it was a non-canceling interrupt reason, make use of the existing provision to retry the whole insertion. (This isn't as wasteful as it might seem, since any upper-level index tuples we already created should be usable in the next attempt.) While there's no known instance of such a bug in existing release branches, it still seems like a good idea to back-patch this to all supported branches, since the behavior is fairly nasty if a loop does happen --- not only is it uncancelable, but it will quickly consume memory to the point of an OOM failure. In any case, this code is certainly not working as intended. Per report from Dilip Kumar. Discussion:

  • Prevent infinite insertion loops in spgdoinsert(). Formerly we just relied on operator classes that assert longValuesOK to eventually shorten the leaf value enough to fit on an index page. That fails since the introduction of INCLUDE-column support (commit 09c1c6ab4), because the INCLUDE columns might alone take up more than a page, meaning no amount of leaf-datum compaction will get the job done. At least with spgtextproc.c, that leads to an infinite loop, since spgtextproc.c won't throw an error for not being able to shorten the leaf datum anymore. To fix without breaking cases that would otherwise work, add logic to spgdoinsert() to verify that the leaf tuple size is decreasing after each "choose" step. Some opclasses might not decrease the size on every single cycle, and in any case, alignment roundoff of the tuple size could obscure small gains. Therefore, allow up to 10 cycles without additional savings before throwing an error. (Perhaps this number will need adjustment, but it seems quite generous right now.) As long as we've developed this logic, let's back-patch it. The back branches don't have INCLUDE columns to worry about, but this seems like a good defense against possible bugs in operator classes. We already know that an infinite loop here is pretty unpleasant, so having a defense seems to outweigh the risk of breaking things. (Note that spgtextproc.c is actually the only known opclass with longValuesOK support, so that this is all moot for known non-core opclasses anyway.) Per report from Dilip Kumar. Discussion:
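
    The progress guard amounts to tracking the best (smallest) leaf size seen so far and bailing out after too many cycles without improvement. A toy Python model under those assumptions (names are illustrative, not the real index code):

    ```python
    def choose_with_progress_guard(leaf_sizes, max_stalled_cycles=10):
        """Toy model of the spgdoinsert() guard: each "choose" step should
        shrink the leaf tuple.  Tolerate up to max_stalled_cycles steps with
        no new minimum (alignment roundoff can hide small gains), then
        error out instead of looping forever.
        """
        best = float("inf")
        stalled = 0
        for size in leaf_sizes:
            if size < best:
                best, stalled = size, 0
            else:
                stalled += 1
                if stalled > max_stalled_cycles:
                    raise RuntimeError(
                        "leaf tuple size is not decreasing; "
                        "suspect a broken opclass")
        return best
    ```

    An occasional non-shrinking cycle resets nothing and is tolerated; only a sustained run of them (more than ten, per the commit's generous default) trips the error.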

  • Be more careful about barriers when releasing BackgroundWorkerSlots. ForgetBackgroundWorker lacked any memory barrier at all, while BackgroundWorkerStateChange had one but unaccountably did additional manipulation of the slot after the barrier. AFAICS, the rule must be that the barrier is immediately before setting or clearing slot->in_use. It looks like back in 9.6 when ForgetBackgroundWorker was first written, there might have been some case for not needing a barrier there, but I'm not very convinced of that --- the fact that the load of bgw_notify_pid is in the caller doesn't seem to guarantee no memory ordering problem. So patch 9.6 too. It's likely that this doesn't fix any observable bug on Intel hardware, but machines with weaker memory ordering rules could have problems here. Discussion:

Michaël Paquier pushed:

Thomas Munro pushed:

Bruce Momjian pushed:

Peter Eisentraut pushed:

David Rowley pushed:

Andrew Dunstan pushed:

Fujii Masao pushed:

Etsuro Fujita pushed:

  • Fix EXPLAIN ANALYZE for async-capable nodes. EXPLAIN ANALYZE for an async-capable ForeignScan node associated with postgres_fdw is done just by using instrumentation for ExecProcNode() called from the node's callbacks, causing the following problems: 1) If the remote table to scan is empty, the node is incorrectly considered as "never executed" by the command even if the node is executed, as ExecProcNode() isn't called from the node's callbacks at all in that case. 2) The command fails to collect timings for things other than ExecProcNode() done in the node, such as creating a cursor for the node's remote query. To fix these problems, add instrumentation for async-capable nodes, and modify postgres_fdw accordingly. My oversight in commit 27e1f1456. While at it, update a comment for the AsyncRequest struct in execnodes.h and the documentation for the ForeignAsyncRequest API in fdwhandler.sgml to match the code in ExecAsyncAppendResponse() in nodeAppend.c, and fix typos in comments in nodeAppend.c. Per report from Andrey Lepikhov, though I didn't use his patch. Reviewed-by: Andrey Lepikhov Discussion:

  • Prevent asynchronous execution of direct foreign-table modifications. Commits 27e1f1456 and 86dc90056, which were independently discussed, cause a crash when executing an inherited foreign UPDATE/DELETE query with asynchronous execution enabled, where children of an Append node that is the direct/indirect child of the ModifyTable node are rewritten so as to modify foreign tables directly by postgresPlanDirectModify(); as in that case the direct modifications are executed asynchronously, which is not currently supported by asynchronous execution. Fix by disabling asynchronous execution of the direct modifications in that function. Author: Etsuro Fujita Reviewed-by: Amit Langote Discussion:

Álvaro Herrera pushed:

Amit Kapila pushed:

Alexander Korotkov pushed:

Peter Geoghegan pushed:

  • Fix autovacuum log output heap truncation issue. The percentage of blocks from the table value reported by autovacuum log output (following commit 5100010ee4d) should never exceed 100% because it describes the state of the table back when lazy_vacuum() was called. The value could nevertheless exceed 100% in the event of heap relation truncation. We failed to compensate for how truncation affects rel_pages. Fix the faulty accounting by using the original rel_pages value instead of the current/final rel_pages value. Reported-By: Andres Freund Discussion:
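
    The arithmetic behind the bug is easy to see: dividing by the post-truncation page count inflates the ratio. A small Python sketch of the before/after accounting (function and variable names are illustrative only):

    ```python
    def pct_scanned(scanned_pages: int, rel_pages_at_start: int) -> float:
        """Report scanned pages as a percentage of the table size captured
        *before* any heap truncation, so the figure cannot exceed 100
        (toy model of the accounting fix)."""
        if rel_pages_at_start == 0:
            return 0.0
        return 100.0 * scanned_pages / rel_pages_at_start

    # Example: 80 of 100 pages scanned, then truncation shrinks the heap
    # to 50 pages before the log message is emitted.
    orig_rel_pages, final_rel_pages, scanned = 100, 50, 80
    buggy = 100.0 * scanned / final_rel_pages   # 160.0 -- the reported bug
    fixed = pct_scanned(scanned, orig_rel_pages)  # 80.0
    ```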

  • Harden nbtree deduplication posting split code. Add a defensive "can't happen" error to code that handles nbtree posting list splits (promote an existing assertion). This avoids a segfault in the event of an insertion of a newitem that is somehow identical to an existing non-pivot tuple in the index. An nbtree index should never have two index tuples with identical TIDs. This scenario is not particularly unlikely in the event of any kind of corruption that leaves the index in an inconsistent state relative to the heap relation that is indexed. There are two known reports of preventable hard crashes. Doing nothing seems unacceptable given the general expectation that nbtree will cope reasonably well with corrupt data. Discussion: Backpatch: 13-, where nbtree deduplication was introduced.
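
    "Promoting an assertion" means turning a debug-build-only check into one that also fires in production, converting a would-be crash into a clean error. A toy Python sketch of the pattern (TIDs modeled as (block, offset) tuples; all names are illustrative, not the nbtree code):

    ```python
    def add_to_posting_list(new_tid, posting_tids):
        """Toy model of the hardening: a duplicate heap TID in a posting
        list "can't happen", but if corruption makes it happen we want a
        clean, reportable error rather than a segfault."""
        if new_tid in posting_tids:
            # Previously this invariant was only an assertion, compiled
            # out of production builds; now it is always enforced.
            raise RuntimeError(
                f"duplicate heap TID {new_tid} in posting list")
        return sorted(posting_tids + [new_tid])
    ```

    The design trade-off: the check runs on a hot path, but a cheap membership test is a small price for graceful behavior on corrupt data.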

Pending Patches

Amit Langote sent in another revision of a patch to allow batching of inserts during cross-partition updates.

Nitin Jadhav sent in a patch to match the file path with the actual file path in comments in src/backend/utils/activity/backend_status.c.

Bharath Rupireddy sent in a patch to make the PostgreSQL FDW cached connection functions tests meaningful.

Thomas Munro sent in a patch to clobber memory released by HASH_REMOVE, and avoid accessing local locks after freeing them.

Andres Freund sent in a patch to zero partial pages in walreceiver.

Peter Smith sent in a patch to ensure that GetSubscriptionRelations declares only 1 scan key.

Vigneshwaran C sent in three more revisions of a patch to include the actual datatype used in logical replication message descriptions.

Tang sent in three more revisions of a patch to make tab completion for DELETE FROM and INSERT INTO consistent in psql.

Vigneshwaran C sent in three more revisions of a patch to make error messages more informative as to the actual error.

Kyotaro HORIGUCHI sent in two more revisions of a patch to set expectedTLEs correctly based on recoveryTargetTLI.

Etsuro Fujita sent in another revision of a patch to fix EXPLAIN ANALYZE for async-capable nodes.

Hou Zhijie sent in another revision of a patch to check UDF parallel safety in fmgr_info, fix the UDF parallel safety label in a test case, check built-in function parallel safety in fmgr_info, and fix the built-in parallel safety label in a test case.

Honza Horak sent in a patch to fix the subtransaction test for Python 3.10.

Masahiko Sawada sent in another revision of a patch to introduce a transaction manager for foreign transactions and use same to handle two-phase commits that involve them.

Masahiro Ikeda sent in two more revisions of a patch to improve the performance of reporting WAL stats.

Hou Zhijie sent in two more revisions of a patch to make it possible for users to declare that a table allows parallel data modification, enable parallel select for insert, implement a utility function, pg_get_parallel_safety(regclass) that returns records of (objid, classid, parallel_safety) that represent the parallel safety of objects that determine the parallel safety of the specified table, and update the regression tests to include same.

Pavel Stěhule sent in two more revisions of a patch to implement schema variables.

Pavel Stěhule sent in another revision of a patch to make it possible for pg_dump to read table names from a file.

Pavel Stěhule sent in another revision of a patch to make it possible to specify a separate pager for \watch in psql.

Zeng Wenjing sent in another revision of a patch to implement global temporary tables.

Tom Lane sent in another revision of a patch to replace pg_depend PIN entries with a fixed range check.

Atsushi Torikoshi sent in a patch to add a pg_log_current_plan(pid) function which allows people to get the plan of a currently executing query.

Mark Dilger sent in another revision of a patch to delegate setting of GUC variables to some newly created ROLEs.

Michail Nikolaev sent in two more revisions of a patch to add full support for index LP_DEAD hint bits on standby.

Amul Sul sent in two more revisions of a patch to implement ALTER DATABASE READ {ONLY | WRITE}.

Vigneshwaran C and Bharath Rupireddy traded patches to clarify the messages and documentation for ALTER SUBSCRIPTION ... DROP PUBLICATION.

Ranier Vilela sent in a patch to fix an explicit NULL pointer dereference in src/backend/commands/tablecmds.c.

Michaël Paquier sent in a patch to add a vacuum_cleanup_index_scale_factor GUC.

Vigneshwaran C sent in two revisions of a patch intended to fix a bug that manifested as a message text failure, by checking the active column in pg_replication_slot instead of pg_stat_replication, the former test being more reliable.

Bharath Rupireddy sent in a patch to reword error messages and docs for parallel VACUUM.

Nathan Bossart sent in a patch to allow specifying direct role membership in pg_hba.conf.

Peter Smith and Ajin Cherian traded patches to support logical decoding of two-phase transactions.

Peter Geoghegan sent in a patch to consider triggering failsafe during the first scan in vacuumlazy.c.

Amit Langote sent in a patch to fix a pgoutput tuple conversion map leak.

Kyotaro HORIGUCHI sent in a patch to match errors about braces with their name.

Vigneshwaran C sent in three revisions of a patch to add tab completion for ALTER SUBSCRIPTION SET options streaming and binary to psql.

Ranier Vilela sent in two revisions of a patch to fix a possible memory corruption in src/timezone/zic.

Michaël Paquier sent in another revision of a patch to rewrite the test of pg_upgrade as a TAP test.

Nitin Jadhav and Justin Pryzby traded patches to remove extra memory allocations from create_list_bounds, allocate the PartitionListValue and create_hash_bounds PartitionHashBound as a single chunk, allocate datum arrays in bulk to avoid palloc overhead, and pfree() intermediate results in create_range_bounds.

Tom Lane sent in two revisions of a patch intended to fix a bug that manifested as prion failed with ERROR: missing chunk number 0 for toast value 14334 in pg_toast_2619.

Tatsuo Ishii sent in two revisions of a patch to fix a typo in README.barrier.

Nitin Jadhav sent in a patch to support tzh_tzm patterns in to_char().