PostgreSQL Weekly News - June 21, 2021

Posted on 2021-06-21 by PWN

PostgreSQL Weekly News - June 21, 2021

pgMustard 4, a user interface for 'explain analyze' which provides performance tips, released.

psycopg2 2.9.0, a Python connector for PostgreSQL, released.

pgAdmin4 5.4, a web- and native GUI control center for PostgreSQL, released.

Person of the week

PostgreSQL Product News

PostgreSQL Jobs for June


PostgreSQL in the News

Planet PostgreSQL: Planet

PostgreSQL Weekly News is brought to you this week by David Fetter

Submit news and announcements by Sunday at 3:00pm PST8PDT to

Applied Patches

Tom Lane pushed:

  • Work around portability issue with newer versions of mktime(). Recent glibc versions have made mktime() fail if tm_isdst is inconsistent with the prevailing timezone; in particular it fails for tm_isdst = 1 when the zone is UTC. (This seems wildly inconsistent with the POSIX-mandated treatment of "incorrect" values for the other fields of struct tm, so if you ask me it's a bug, but I bet they'll say it's intentional.) This has been observed to cause cosmetic problems when pg_restore'ing an archive created in a different timezone. To fix, do mktime() using the field values from the archive, and if that fails try again with tm_isdst = -1. This will give a result that's off by the UTC-offset difference from the original zone, but that was true before, too. It's not terribly critical since we don't do anything with the result except possibly print it. (Someday we should flush this entire bit of logic and record a standard-format timestamp in the archive instead. That's not okay for a back-patched bug fix, though.) Also, guard our only other use of mktime() by having initdb's build_time_t() set tm_isdst = -1 not 0. This case could only have an issue in zones that are DST year-round; but I think some do exist, or could in future. Per report from Wells Oliver. Back-patch to all supported versions, since any of them might need to run with a newer glibc. Discussion:

  • Remove orphaned expected-result file. This should have been removed in 43e084197, which removed the corresponding spec file. Noted while fooling about with the isolationtester.

  • Update variant expected-result file. This should have been updated in d2d8a229b, but it was overlooked. According to 31a877f18 which added it, this file is meant to show the results you get under default_transaction_isolation = serializable. We've largely lost track of that goal in other isolation tests, but as long as we've got this one, it should be right. Noted while fooling about with the isolationtester.

  • Remove another orphan expected-result file. aborted-keyrevoke_2.out was apparently needed when it was added (in commit 0ac5ad513) to handle the case of serializable transaction mode. However, the output in serializable mode actually matches the regular aborted-keyrevoke.out file, and AFAICT has done so for a long time. There's no need to keep dragging this variant along.

  • Update another variant expected-result file. This should have been updated in 533e9c6b0, but it was overlooked. Given the lack of complaints, I won't bother back-patching.

  • Improve SQLSTATE reporting in some replication-related code. I started out with the goal of reporting ERRCODE_CONNECTION_FAILURE when walrcv_connect() fails, but as I looked around I realized that whoever wrote this code was of the opinion that errcodes are purely optional. That's not my understanding of our project policy. Hence, make sure that an errcode is provided in each ereport that (a) is ERROR or higher level and (b) isn't arguably an internal logic error. Also fix some very dubious existing errcode assignments. While this is not per policy, it's also largely cosmetic, since few of these cases could get reported to applications. So I don't feel a need to back-patch. Discussion:

  • Fix plancache refcount leak after error in ExecuteQuery. When stuffing a plan from the plancache into a Portal, one is not supposed to risk throwing an error between GetCachedPlan and PortalDefineQuery; if that happens, the plan refcount incremented by GetCachedPlan will be leaked. I managed to break this rule while refactoring code in 9dbf2b7d7. There is no visible consequence other than some memory leakage, and since nobody is very likely to trigger the relevant error conditions many times in a row, it's not surprising we haven't noticed. Nonetheless, it's a bug, so rearrange the order of operations to remove the hazard. Noted on the way to looking for a better fix for bug #17053. This mistake is pretty old, so back-patch to all supported branches.

  • Centralize the logic for protective copying of utility statements. In the "simple Query" code path, it's fine for parse analysis or execution of a utility statement to scribble on the statement's node tree, since that'll just be thrown away afterwards. However it's not fine if the node tree is in the plan cache, as then it'd be corrupted for subsequent executions. Up to now we've dealt with that by having individual utility-statement functions apply copyObject() if they were going to modify the tree. But that's prone to errors of omission. Bug #17053 from Charles Samborski shows that CREATE/ALTER DOMAIN didn't get this memo, and can crash if executed repeatedly from plan cache. In the back branches, we'll just apply a narrow band-aid for that, but in HEAD it seems prudent to have a more principled fix that will close off the possibility of other similar bugs in future. Hence, let's hoist the responsibility for doing copyObject up into ProcessUtility from its children, thus ensuring that it happens for all utility statement types. Also, modify ProcessUtility's API so that its callers can tell it whether a copy step is necessary. It turns out that in all cases, the immediate caller knows whether the node tree is transient, so this doesn't involve a huge amount of code thrashing. In this way, while we lose a little bit in the execute-from-cache code path due to sometimes copying node trees that wouldn't be mutated anyway, we gain something in the simple-Query code path by not copying throwaway node trees. Statements that are complex enough to be expensive to copy are almost certainly ones that would have to be copied anyway, so the loss in the cache code path shouldn't be much. (Note that this whole problem applies only to utility statements. Optimizable statements don't have the issue because we long ago made the executor treat Plan trees as read-only. Perhaps someday we will make utility statement execution act likewise, but I'm not holding my breath.) Discussion: Discussion:

  • Improve version reporting in pgbench. Commit 547f04e73 caused pgbench to start printing its version number, which seems like a fine idea, but it needs a bit more work: * Print the server version number too, when different. * Print the PG_VERSION string, not some reconstructed approximation. This patch copies psql's well-tested code for the same purpose. Discussion:

  • Fix misbehavior of DROP OWNED BY with duplicate polroles entries. Ordinarily, a pg_policy.polroles array wouldn't list the same role more than once; but CREATE POLICY does not prevent that. If we perform DROP OWNED BY on a role that is listed more than once, RemoveRoleFromObjectPolicy either suffered an assertion failure or encountered a tuple-updated-by-self error. Rewrite it to cope correctly with duplicate entries, and add a CommandCounterIncrement call to prevent the other problem. Per discussion, there's other cleanup that ought to happen here, but this seems like the minimum essential fix. Per bug #17062 from Alexander Lakhin. It's been broken all along, so back-patch to all supported branches. Discussion:

  • Provide feature-test macros for libpq features added in v14. We had a request to provide a way to test at compile time for the availability of the new pipeline features. More generally, it seems like a good idea to provide a way to test via #ifdef for all new libpq API features. People have been using the version from pg_config.h for that; but that's more likely to represent the server version than the libpq version, in the increasingly-common scenario where they're different. It's safer if libpq-fe.h itself is the source of truth about what features it offers. Hence, establish a policy that starting in v14 we'll add a suitable feature-is-present macro to libpq-fe.h when we add new API there. (There doesn't seem to be much point in applying this policy retroactively, but it's not too late for v14.) Tom Lane and Alvaro Herrera, per suggestion from Boris Kolpackov. Discussion:

  • Stabilize test case added by commit f61db909d. Buildfarm members ayu and tern have sometimes shown a different plan than expected for this query. I'd been unable to reproduce that before today, but I finally realized what is happening. If there is a concurrent open transaction (probably an autovacuum run in the buildfarm, but this can also be arranged manually), then the index entries for the rows removed by the DELETE a few lines up are not killed promptly, causing a change in the planner's estimate of the extremal value of ft2.c1, which moves the rowcount estimate for "c1 > 1100" by enough to change the join plan from nestloop to hash. To fix, change the query condition to "c1 > 1000", causing the hash plan to be preferred whether or not a concurrent open transaction exists. Since this UPDATE is tailored to be a no-op, nothing else changes. Report: Report: Report:

Michaël Paquier pushed:

Bruce Momjian pushed:

Álvaro Herrera pushed:

Noah Misch pushed:

Amit Kapila pushed:

Alexander Korotkov pushed:

Peter Geoghegan pushed:

  • Remove unneeded field from VACUUM state. Bugfix commit 5fc89376 effectively made the lock_waiter_detected field from vacuumlazy.c's global state struct into private state owned by lazy_truncate_heap(). Finish this off by replacing the struct field with a local variable.

  • Support disabling index bypassing by VACUUM. Generalize the INDEX_CLEANUP VACUUM parameter (and the corresponding reloption): make it into a ternary style boolean parameter. It now exposes a third option, "auto". The "auto" option (which is now the default) enables the "bypass index vacuuming" optimization added by commit 1e55e7d1. "VACUUM (INDEX_CLEANUP TRUE)" is redefined to once again make VACUUM simply do any required index vacuuming, regardless of how few dead tuples are encountered during the first scan of the target heap relation (unless there are exactly zero). This gives users a way of opting out of the "bypass index vacuuming" optimization, if for whatever reason that proves necessary. It is also expected to be used by PostgreSQL developers as a testing option from time to time. "VACUUM (INDEX_CLEANUP FALSE)" does the same thing as it always has: it forcibly disables both index vacuuming and index cleanup. It's not expected to be used much in PostgreSQL

  • The failsafe mechanism added by commit 1e55e7d1 addresses the same problem in a simpler way. INDEX_CLEANUP can now be thought of as a testing and compatibility option. Author: Peter Geoghegan Reviewed-By: Masahiko Sawada Reviewed-By: Justin Pryzby Discussion:

Andrew Dunstan pushed:

Heikki Linnakangas pushed:

  • Fix outdated comment that talked about seek position of WAL file. Since commit c24dcd0cfd, we have been using pg_pread() to read the WAL file, which doesn't change the seek position (unless we fall back to the implementation in src/port/pread.c). Update comment accordingly. Backpatch-through: 12, where we started to use pg_pread()

  • Tidy up GetMultiXactIdMembers()'s behavior on error. One of the error paths left *members uninitialized. That's not a live bug, because most callers don't look at *members when the function returns -1, but let's be tidy. One caller, in heap_lock_tuple(), does "if (members != NULL) pfree(members)", but AFAICS it never passes an invalid 'multi' value so it should not reach that error case. The callers are also a bit inconsistent in their expectations. heap_lock_tuple() pfrees the 'members' array if it's not-NULL, others pfree() it if "nmembers >= 0", and others if "nmembers > 0". That's not a live bug either, because the function should never return 0, but add an Assert for that to make it more clear. I left the callers alone for now. I also moved the line where we set *nmembers. It wasn't wrong before, but I like to do that right next to the 'return' statement, to make it clear that it's always set on return. Also remove one unreachable return statement after ereport(ERROR), for brevity and for consistency with the similar if-block right after it. Author: Greg Nancarrow with the additional changes by me Backpatch-through: 9.6, all supported versions

Tomáš Vondra pushed:

  • Fix copying data into slots with FDW batching. Commit b676ac443b optimized handling of tuple slots with bulk inserts into foreign tables, so that the slots are initialized only once and reused for all batches. The data was however copied into the slots only after the initialization, inserting duplicate values when the slot gets reused. Fixed by moving the ExecCopySlot outside the init branch. The existing postgres_fdw tests failed to catch this due to inserting data into foreign tables without unique indexes, and then checking only the number of inserted rows. This adds a new test with both a unique index and a check of inserted values. Reported-by: Alexander Pyhalov Discussion:

Fujii Masao pushed:

Pending Patches

Amit Khandekar sent in a WIP patch to allow subtransactions in worker processes.

Bertrand Drouvot sent in another revision of a patch to allow logical decoding on standbys.

Thomas Munro sent in another revision of a patch to use tuple-level SIREAD locks for index-only scans, and skip SIREAD locks on btree pages when possible.

Andrew Dunstan sent in another revision of a patch to intended to fix a bug that manifested as Segmentation fault on altering the type of the foreign table column with a default.

Tomáš Vondra sent in another revision of a patch to use extended statistics to improve join estimates.

Bharath Rupireddy sent in another revision of a patch to remove pg_wait_for_backend_termination().

Fabien COELHO sent in another revision of a patch to consolidate the echo system in psql to its own function.

Yugo Nagata and Fabien COELHO traded patches for pgbench which ensure that it computes and stores conn_duration only when requested.

Jehan-Guillaume de Rorthais sent in a PoC patch to Trace wait events to logfile when log_statement_stats=on.

Yugo Nagata and Fabien COELHO traded patches to avoid a stuck condition in pbgench due to skipped transactions.

Jacob Champion and Daniel Gustafsson traded patches to support NSS as a libpq TLS backend.

Dilip Kumar and Amit Langote traded patches to intended to fix a bug that manifested as decoding speculative insert with toast leaks memory.

Yugo Nagata and Fabien COELHO traded patches to intended to fix a bug in pgbench that manifested as negative "initial connection time".

Takamichi Osumi and Amit Kapila traded patches to fix an infelicity among locking [user] catalog tables, 2PC, and logical replication.

Justin Pryzby and Michaël Paquier traded patches to add more options for WAL compression.

Ranier Vilela sent in another revision of a patch to make accesses to ProcArray more efficient.

David Fetter sent in a patch to use the singular number where appropriate in the regression tests.

Alexander Pyhalov sent in another revision of a patch to implement case expression pushdown.

Ranier Vilela sent in a patch to fix a buffer in ecpg which is not null terminated.

Fabien COELHO and Yugo Nagata traded patches to intended to fix a bug that manifested as errors in pgbench logging.

Dmitry Dolgov sent in two more revisions of a patch to prevent jumbling of every element in ArrayExpr.

Dilip Kumar sent in a patch to make CREATE DATABASE fully WAL logged so issuing that command no longer forces a checkpoint.

Ajin Cherian sent in another revision of a patch to add an option to set two-phase in CREATE_REPLICATION_SLOT command, and support two-phase decoding in pg_recvlogical.

Peter Smith and Ajin Cherian traded patches to support logical decoding of two-phase transactions.

Daniel Gustafsson sent in a patch to use the more correct and current TLS/SSL instead of SSL in documentation.

Thomas Munro sent in another revision of a patch to track relation sizes in shared memory, provide a lock-free fast path for smgrnblocks(), and update fifo to lru to sweep a valid cache.

Kyotaro HORIGUCHI sent in another revision of a patch to show more details in errors in pg_waldump.

Tom Lane sent in a patch to allow regular identifiers in isolationtester scripts.

Matthias van de Meent sent in another revision of a patch to fix a bug in GetOldestNonRemovableTransactionId, which did not return values consistent with GlobalVisTestFor(rel). Additionally, lazy_scan_prune had an incorrect assumption that GlobalVis*Rels would never have an OldestXmin < vacrel->OldestXmin, which is incorrect. This assumption is now fixed, and the changes have been documented.

Heikki Linnakangas sent in a patch to split xlog.c, which was getting unwieldy.

Thomas Munro sent in another revision of a patch to make and use a qsort template.

Amit Langote sent in three more revisions of a patch to skip partition tuple routing with a constant partition key.

Jeff Davis sent in two revisions of a patch to clarify the documentation for the replication protocol.

Tomáš Vondra sent in a PoC patch to use a count-min sketch for join cardinality estimation.

Jeff Davis sent in another revision of a patch to document what happens in START REPLICATION.

Alexander Korotkov sent in three more revisions of a patch to implement UNNEST for multiranges.

Vigneshwaran C sent in another revision of a patch to add schema-level support for PUBLICATIONs.

Amul Sul sent in another revision of a patch to implement ALTER SYSTEM SET READ {WRITE | ONLY}.

Nitin Jadhav sent in another revision of a patch to implement a startup process progress indicator.

Matthias van de Meent sent in another revision of a patch to implement and use index tuple attribute iteration, and implement page-level dynamic prefix truncation for _bt_binsrch*.

Mark Dilger sent in two revisions of a patch to optionally disable subscriptions on error.

Jeff Davis sent in a patch to fix start replication identify system.

Thomas Munro sent in another revision of a patch to implement snapshot_too_old using a timer.

Takashi Menjo sent in another revision of a patch to map WAL segment files on PMEM as WAL buffers.

Amit Kapila sent in another revision of a patch to implement row filtering for logical replication.

Egor Rogov sent in a patch to include statistics for range types in the pg_stats view.

Álvaro Herrera sent in two more revisions of a patch to add a test case for obsoleting slots with active walsenders.

Greg Sabino Mullane sent in another revision of a patch to avoid computing page checksums unnecessarily.

Noah Misch sent in a patch to remove XLogFileInit()'s ability to skip ControlFileLock.

David Rowley sent in another revision of a patch to add a new hash table type which has stable pointers.

Thomas Munro sent in a patch to adjust for the LLVM 13 API change.