PostgreSQL Weekly News - November 8, 2020

Posted on 2020-11-09 by PWN

PostgreSQL Weekly News - November 8, 2020

Congratulations to new core team members Andres Freund and Jonathan Katz!

Person of the week:

PostgreSQL Product News

phpPgAdmin 7.13.1, a web-based administration tool for PostgreSQL, released.

Ajqvue Version 3.3, a java-based UI which supports PostgreSQL, released.

pg_statement_rollback, an extension that adds server side transaction with rollback at statement level, released.

PostgreSQL Jobs for November

PostgreSQL in the News

Planet PostgreSQL:

PostgreSQL Weekly News is brought to you this week by David Fetter

Submit news and announcements by Sunday at 3:00pm PST8PDT to

Applied Patches

Tom Lane pushed:

  • Fix two issues in TOAST decompression. pglz_maximum_compressed_size() potentially underestimated the amount of compressed data required to produce N bytes of decompressed data; this is a fault in commit 11a078cf8. Separately from that, pglz_decompress() failed to protect itself against corrupt compressed data, particularly off == 0 in a match tag. Commit c60e520f6 turned such a situation into an infinite loop, where before it'd just have resulted in garbage output. The combination of these two bugs seems like it may explain bug #16694 from Tom Vijlbrief, though it's impossible to be quite sure without direct inspection of the failing session. (One needs to assume that the pglz_maximum_compressed_size() bug caused us to fail to fetch the second byte of a match tag, and what happened to be there instead was a zero. The reported infinite loop is hard to explain without off == 0, though.) Aside from fixing the bugs, rewrite associated comments for more clarity. Back-patch to v13 where both these commits landed. Discussion:

  • Second thoughts on TOAST decompression. On detecting a corrupted match tag, pglz_decompress() should just summarily return -1. Breaking out of the loop, as I did in dfc797730, doesn't quite guarantee that will happen. Also, we can use unlikely() on that check, just in case it helps. Backpatch to v13, like the previous patch.

  • Rethink the generation rule for fmgroids.h macros. Traditionally, the names of fmgroids.h macros for pg_proc OIDs have been constructed from the prosrc field. But sometimes the same C function underlies multiple pg_proc entries, forcing us to make an arbitrary choice of which OID to reference; the other entries are then not namable via fmgroids.h. Moreover, we could not have macros at all for pg_proc entries that aren't for C-coded functions. Instead, use the proname field, and append the proargtypes field (replacing inter-argument spaces with underscores) if proname is not unique. Special-casing unique entries such as F_OIDEQ removes the need to change a lot of code. Indeed, I can only find two places in the tree that need to be adjusted; while this changes quite a few existing entries in fmgroids.h, few of them are referenced from C code. With this patch, all entries in pg_proc.dat have macros in fmgroids.h. Discussion:

  • Remove special checks for pg_rewrite.ev_qual and ev_action being NULL. make_ruledef() and make_viewdef() were coded to cope with possible null-ness of these columns, but they've been marked BKI_FORCE_NOT_NULL for some time. So there's not really any need to do more than what we do for the other columns of pg_rewrite, i.e. just Assert that we got non-null results. (There is a school of thought that says Asserts aren't the thing to do to check for corrupt data, but surely here is not the place to start if we want such a policy.) Also, remove long-dead-if-indeed-it-ever-wasn't-dead handling of an empty actions list in make_ruledef(). That's an error case and should be treated as such. (DO INSTEAD NOTHING is represented by a CMD_NOTHING Query, not an empty list; cf transformRuleStmt.) Kyotaro Horiguchi, some changes by me Discussion:

  • Fix unportable use of getnameinfo() in pg_hba_file_rules view. fill_hba_line() thought it could get away with passing sizeof(struct sockaddr_storage) rather than the actual addrlen previously returned by getaddrinfo(). While that appears to work on many platforms, it does not work on FreeBSD 11: you get back a failure, which leads to the view showing NULL for the address and netmask columns in all rows. The POSIX spec for getnameinfo() is pretty clearly on FreeBSD's side here: you should pass the actual address length. So it seems plausible that there are other platforms where this coding also fails, and we just hadn't noticed. Also, IMO the fact that getnameinfo() failure leads to a NULL output is pretty bogus in itself. Our pg_getnameinfo_all() wrapper is careful to emit "???" on failure, and we should use that in such cases. NULL should only be emitted in rows that don't have IP addresses. Per bug #16695 from Peter Vandivier. Back-patch to v10 where this code was added. Discussion:

  • Allow users with BYPASSRLS to alter their own passwords. The intention in commit 491c029db was to require superuserness to change the BYPASSRLS property, but the actual effect of the coding in AlterRole() was to require superuserness to change anything at all about a BYPASSRLS role. Other properties of a BYPASSRLS role should be changeable under the same rules as for a normal role, though. Fix that, and also take care of some documentation omissions related to BYPASSRLS and REPLICATION role properties. Tom Lane and Stephen Frost, per bug report from Wolfgang Walther. Back-patch to all supported branches. Discussion:

  • Improve error messages around REPLICATION and BYPASSRLS properties. Clarify wording as per suggestion from Wolfgang Walther. No back-patch; this doesn't seem worth thrashing translatable strings in the back branches. Tom Lane and Stephen Frost Discussion:

  • Guard against core dump from uninitialized subplan. If the planner erroneously puts a non-parallel-safe SubPlan into a parallelized portion of the query tree, nodeSubplan.c will fail in the worker processes because it finds a null in es_subplanstates, which it's unable to cope with. It seems worth a test-and-elog to make that an error case rather than a core dump case. This probably should have been included in commit 16ebab688, which was responsible for allowing nulls to appear in es_subplanstates to begin with. So, back-patch to v10 where that came in. Discussion:

  • Remove useless entries for aggregate functions from fmgrtab.c. treated aggregate functions the same as other built-in functions, which is wasteful because there is no real need to have entries for them in the fmgr_builtins[] table. Suppressing those entries saves about 3KB in the compiled table on my machine; which is not a lot but it's not nothing either, considering that that table is pretty "hot". The only outside code change needed is that ExecInitWindowAgg() can't be allowed to call fmgr_info_cxt() on a plain aggregate function. But that saves a few cycles anyway. Having done that, the aggregate_dummy() function is unreferenced and might as well be dropped. Using "aggregate_dummy" as the prosrc value for an aggregate is now just a documentation convention not something that matters. There was some discussion of using NULL instead to save a few bytes in pg_proc, but we'd have to remove prosrc's BKI_FORCE_NOT_NULL marking which doesn't seem a great idea. Anyway, it's possible there's client-side code that expects to see "aggregate_dummy" there, so I'm loath to change it without a strong reason. Discussion:

  • Improve our ability to regurgitate SQL-syntax function calls. The SQL spec calls out nonstandard syntax for certain function calls, for example substring() with numeric position info is supposed to be spelled "SUBSTRING(string FROM start FOR count)". We accept many of these things, but up to now would not print them in the same format, instead simplifying down to "substring"(string, start, count). That's long annoyed me because it creates an interoperability problem: we're gratuitously injecting Postgres-specific syntax into what might otherwise be a perfectly spec-compliant view definition. However, the real reason for addressing it right now is to support a planned change in the semantics of EXTRACT() a/k/a date_part(). When we switch that to returning numeric, we'll have the parser translate EXTRACT() to some new function name (might as well be "extract" if you ask me) and then teach ruleutils.c to reverse-list that per SQL spec. In this way existing calls to date_part() will continue to have the old semantics. To implement this, invent a new CoercionForm value COERCE_SQL_SYNTAX, and make the parser insert that rather than COERCE_EXPLICIT_CALL when the input has SQL-spec decoration. (But if the input has the form of a plain function call, continue to mark it COERCE_EXPLICIT_CALL, even if it's calling one of these functions.) Then ruleutils.c recognizes COERCE_SQL_SYNTAX as a cue to emit SQL call syntax. It can know which decoration to emit using hard-wired knowledge about the functions that could be called this way. (While this solution isn't extensible without manual additions, neither is the grammar, so this doesn't seem unmaintainable.) Notice that this solution will reverse-list a function call with SQL decoration only if it was entered that way; so dump-and-reload will not by itself produce any changes in the appearance of views. This requires adding a CoercionForm field to struct FuncCall. (I couldn't resist the temptation to rearrange that struct's field order a tad while I was at it.) FuncCall doesn't appear in stored rules, so that change isn't a reason for a catversion bump, but I did one anyway because the new enum value for CoercionForm fields could confuse old backend code. Possible future work: * Perhaps CoercionForm should now be renamed to DisplayForm, or something like that, to reflect its more general meaning. This'd require touching a couple hundred places, so it's not clear it's worth the code churn. * The SQLValueFunction node type, which was invented partly for the same goal of improving SQL-compatibility of view output, could perhaps be replaced with regular function calls marked with COERCE_SQL_SYNTAX. It's unclear if this would be a net code savings, however. Discussion:

  • Declare lead() and lag() using anycompatible not anyelement. This allows use of a "default" expression that doesn't slavishly match the data column's type. Formerly you got something like "function lag(numeric, integer, integer) does not exist", which is not just unhelpful but actively misleading. The SQL spec suggests that the default should be coerced to the data column's type, but this implementation instead chooses the common supertype, which seems at least as reasonable. (Note: I took the opportunity to run "make reformat-dat-files" on pg_proc.dat, so this commit includes some cosmetic changes to recently-added entries that aren't related to lead/lag.) Vik Fearing Discussion:

  • Declare assorted array functions using anycompatible not anyelement. Convert array_append, array_prepend, array_cat, array_position, array_positions, array_remove, array_replace, and width_bucket to use anycompatiblearray. This is a simple extension of commit 5c292e6b9 to hit some other places where there's a pretty obvious gain in usability from doing so. Ideally we'd also modify other functions taking multiple old-style polymorphic arguments. But most of the remainder are tied into one or more operator classes, making any such change a much larger can of worms than I desire to open right now. Discussion:

  • Remove underflow error in float division with infinite divisor. float4_div and float8_div correctly produced zero for zero divided by infinity, but threw an underflow error for nonzero finite values divided by infinity. This seems wrong; at the very least it's inconsistent with the behavior recently implemented for numeric infinities. Remove the error and allow zero to be returned. This patch also removes a useless isinf() test from the overflow checks in these functions (non-Inf divided by Inf can't produce Inf). Extracted from a larger patch; this seems significant outside the context of geometric operators, so it deserves its own commit. Kyotaro Horiguchi Discussion:

  • Don't throw an error for LOCK TABLE on a self-referential view. LOCK TABLE has complained about "infinite recursion" when applied to a self-referential view, ever since we made it recurse into views in v11. However, that breaks pg_dump's new assumption that it's okay to lock every relation. There doesn't seem to be any good reason to throw an error: if we just abandon the recursion, we've still satisfied the requirement of locking every referenced relation. Per bug #16703 from Andrew Bille (via Alexander Lakhin). Discussion:

  • Revert "pg_dump: Lock all relations, not just plain tables". Revert 403a3d91c, as well as the followup fix 7f4235032, in all branches. We need to think a bit harder about what the behavior of LOCK TABLE on views should be, and there's no time for that before next week's releases. We'll take another crack at this later. Discussion:

  • Revert "Accept relations of any kind in LOCK TABLE". Revert 59ab4ac32, as well as the followup fix 33862cb9c, in all branches. We need to think a bit harder about what the behavior of LOCK TABLE on views should be, and there's no time for that before next week's releases. We'll take another crack at this later. Discussion:

  • Fix ecpg's mishandling of B'...' and X'...' literals. These were broken in multiple ways: * The xbstart and xhstart lexer actions neglected to set "state_before_str_start" before transitioning to the xb/xh states, thus possibly resulting in "internal error: unreachable state" later. * The test for valid string contents at the end of xb state was flat out wrong, as it accounted incorrectly for the "b" prefix that the xbstart action had injected. Meanwhile, the xh state had no such check at all. * The generated literal value failed to include any quote marks. * The grammar did the wrong thing anyway, typically ignoring the literal value and emitting something else, since BCONST and XCONST tokens were handled randomly differently from SCONST tokens. The first of these problems is evidently an oversight in commit 7f380c59f, but the others seem to be very ancient. The lack of complaints shows that ECPG users aren't using these syntaxes much (although I do vaguely remember one previous complaint). As written, this patch is dependent on 7f380c59f, so it can't go back further than v13. Given the shortage of complaints, I'm not excited about adapting the patch to prior branches. Report and patch by Shenhao Wang (test case adjusted by me) Discussion:

  • Avoid re-using output variables in new ecpg test case. The buildfarm thinks this leads to memory stomps, though annoyingly I can't duplicate that here. The existing code in strings.pgc is doing something that doesn't seem to be sanctioned at all really by the documentation, but I'm disinclined to try to make that nicer right now. Let's just declare some more output variables in hopes of working around it.

David Rowley pushed:

  • Allow run-time pruning on nested Append/MergeAppend nodes. Previously we only tagged on the required information to allow the executor to perform run-time partition pruning for Append/MergeAppend nodes belonging to base relations. It was thought that nested Append/MergeAppend nodes were just about always pulled up into the top-level Append/MergeAppend and that making the run-time pruning info for any sub Append/MergeAppend nodes was a waste of time. However, that was likely badly thought through. Some examples of cases we're unable to pullup nested Append/MergeAppends are: 1) Parallel Append nodes with a mix of parallel and non-parallel paths into a Parallel Append. 2) When planning an ordered Append scan a sub-partition which is unordered may require a nested MergeAppend path to ensure sub-partitions don't mix up the order of tuples being fed into the top-level Append. Unfortunately, it was not just as simple as removing the lines in createplan.c which were purposefully not building the run-time pruning info for anything but RELOPT_BASEREL relations. The code in add_paths_to_append_rel() was far too sloppy about which partitioned_rels it included for the Append/MergeAppend paths. The original code there would always assume accumulate_append_subpath() would pull each sub-Append and sub-MergeAppend path into the top-level path. While it does not appear that there were any actual bugs caused by having the additional partitioned table RT indexes recorded, what it did mean is that later in planning, when we built the run-time pruning info that we wasted effort and built PartitionedRelPruneInfos for partitioned tables that we had no subpaths for the executor to run-time prune. Here we tighten that up so that partitioned_rels only ever contains the RT index for partitioned tables which actually have subpaths in the given Append/MergeAppend. We can now Assert that every PartitionedRelPruneInfo has a non-empty present_parts. That should allow us to catch any weird corner cases that have been missed. In passing, it seems there is no longer a good reason to have the AppendPath and MergeAppendPath's partitioned_rel fields a List of IntList. We can simply have a List of Relids instead. This is more compact in memory and faster to add new members to. We still know which is the root level partition as these always have a lower relid than their children. Previously this field was used for more things, but run-time partition pruning now remains the only user of it and it has no need for a List of IntLists. Here we also get rid of the RelOptInfo partitioned_child_rels field. This is what was previously used to (sometimes incorrectly) set the Append/MergeAppend path's partitioned_rels field. That was the only usage of that field, so we can happily just remove it. I also couldn't resist changing some nearby code to make use of the newly added for_each_from macro so we can skip the first element in the list without checking if the current item was the first one on each iteration. A bug report from Andreas Kretschmer prompted all this work, however, after some consideration, I'm not personally classing this as a bug fix. So no backpatch. In Andreas' test case, it just wasn't that clear that there was a nested Append since the top-level Append just had a single sub-path which was pulled up a level, per 8edd0e794. Author: David Rowley Reviewed-by: Amit Langote Discussion:

  • Fix unstable partition_prune regression tests. This was broken recently by a929e17e5. I'd failed to remember that parallel tests should have their EXPLAIN output run through the explain_parallel_append function so that the output is stable when parallel workers fail to start. fairywren was first to notice. Reported-by: Michael Paquier Discussion:

Amit Kapila pushed:

Michaël Paquier pushed:

Heikki Linnakangas pushed:

Thomas Munro pushed:

Magnus Hagander pushed:

Peter Eisentraut pushed:

Tomáš Vondra pushed:

Fujii Masao pushed:

Peter Geoghegan pushed:

  • Fix nbtree cleanup-only VACUUM stats inaccuracies. Logic for counting heap TIDs from posting list tuples (added by commit 0d861bbb) was faulty. It didn't count any TIDs/index tuples in the event of no callback being set. This meant that we incorrectly counted no index tuples in clean-up only VACUUMs, which could lead to pg_class.reltuples being spuriously set to 0 in affected indexes. To fix, go back to counting items from the page in cases where there is no callback. This approach isn't very accurate, but it works well enough in practice while avoiding the expense of accessing every index tuple during cleanup-only VACUUMs. Author: Peter Geoghegan pg [AT] Reported-By: Jehan-Guillaume de Rorthais jgdr [AT] Backpatch: 13-, where nbtree deduplication was introduced

  • Fix wal_consistency_checking nbtree bug. wal_consistency_checking indicated an inconsistency in certain cases involving nbtree page deletion. The underlying issue is that there was a minor difference between the page image produced after a REDO routine ran and the corresponding page image following original execution. This harmless inconsistency has been around forever. We more or less expect total consistency among even deleted nbtree pages these days, though, so this won't do anymore. To fix, tweak the REDO routine to match original execution. Oversight in commit f47b5e13.

  • Improve nbtree README's LP_DEAD section. The description of how LP_DEAD bit setting by index scans works following commit 2ed5b87f was rather unclear. Clean that up a bit. Also refer to LP_DEAD bit setting within _bt_check_unique() at the start of the same section. This mechanism may actually be more important than the generic kill_prior_tuple mechanism that the section focuses on, so it at least deserves to be mentioned in passing.

Álvaro Herrera pushed:

Pending Patches

Nikhil Benesch sent in a patch to support negative indexes in the split_part() function, those counting from the end of the array instead of the start.

Justin Pryzby sent in another revision of a patch to refactor CIC to rely on REINDEX CONCURRENTLY, refactor same to allow reindexing all index partitions at once, and make ReindexPartitions() set indisvalid so things that come by later can see they're ready to go.

Magnus Hagander sent in another revision of a patch to remove the obsolete script and things that know about it from pg_upgrade.

Anastasia Lubennikova sent in another revision of a patch to teach COPY FREEZE to set PD_ALL_VISIBLE and visibility map bits.

David G. Johnston sent in another revision of a patch to clarify the fact that signal functions have no feedback.

Heikki Linnakangas sent in four revisions of a patch to split copy.c into copyto.c and copyfrom.c, and further split copyfrom.c into copyfrom.c and copyfromparse.c. This will make working on the usually independent functionalities of the split files more convenient and easier to read.

Álvaro Herrera sent in another revision of a patch to add batch/pipelining support to libpq.

Pavel Stěhule sent in a patch to reduce overhead of execution of CALL statement in no atomic mode from PL/pgSQL.

Magnus Hagander sent in two revisions of a patch to remove the obsolete -O switch for postgres, which once allowed passing options to each server process.

Kyotaro HORIGUCHI sent in two more revisions of a patch to fix a dereference before NULL check in src/backend/storage/ipc/latch.c.

David Rowley sent in two revisions of a patch to reduce the number of special cases to build contrib modules on windows.

Konstantin Knizhnik sent in three more revisions of a patch to add custom compression to libpq.

Fabien COELHO sent in another revision of a patch to pgbench to add an option which delays queries until connections are established.

Thomas Munro and David Rowley traded patches to implement collation versioning.

Jinbao Chen sent in a patch to add a new table am 'tid_visible' to test the visibility of a tid.

Peter Geoghegan sent in another revision of a patch to add delete deduplication to nbtree.

Stephen Frost sent in two more revisions of a patch to use pre-fetching for ANALYZE.

Tomáš Vondra sent in another revision of a patch to use INT64_FORMAT to print int64 variables in sort debug.

Bharath Rupireddy sent in another revision of a patch to use multi Inserts in Create Table As.

Amit Langote sent in another revision of a patch to call BeginDirectModify from ExecInitModifyTable, and initialize result relation information lazily. This work builds infrastructure that will later be used to make writes on foreign tables more efficient.

Vigneshwaran C sent in two more revisions of a patch to improve the connection authorization message for GSS authenticated/encrypted connections by adding a log message to include GSS authentication, encryption, and principal information. This message will help the user to know whether GSS authentication or encryption was used and which GSS principal was used.

Tomáš Vondra sent in three more revisions of a patch to implement BRIN multi-range indexes.

Álvaro Herrera sent in another revision of a patch to implement ALTER TABLE ... DETACH CONCURRENTLY.

Tsutomu Yamada sent in another revision of a patch to add \dX, which lists extended statistics, to psql.

Pavel Borisov sent in two more revisions of a patch to deprecate and replace <^ and >^ operators for points.

Melanie Plageman sent in another revision of a patch to support parallel FULL JOIN and RIGHT JOIN.

Kyotaro HORIGUCHI sent in two more revisions of a patch to use shared memory instead of files for storage in the stats collector.

Ajin Cherian and Peter Smith traded patches to add logical decoding of two-phase transactions.

Kirk Jamison sent in another revision of a patch to make DropRelFileNodeBuffers() more efficient during recovery by avoiding scanning the whole buffer pool when the relation is small enough, or the the total number of blocks to be invalidated is below the threshold of full scanning.

Daniel Gustafsson sent in two more revisions of a patch to support NSS as a TLS backend for libpq.

Takamichi Osumi sent in three more revisions of a patch to implement CREATE OR REPLACE TRIGGER.

Fujii Masao sent in another revision of a patch to use standard SIGHUP and SIGTERM handlers in the autoprewarm process.

Justin Pryzby sent in another revision of a patch to Implement CLUSTER of partitioned table. This requires either specification of a partitioned index on which to cluster, or that an partitioned index was previously set clustered.

Kyotaro HORIGUCHI sent in another revision of a patch to fix NaN handling in some geometric operators and functions.

Michaël Paquier sent in another revision of a patch to rework the SHA2 APIs, switch sha2_openssl.c to use EVP, and make pgcrypto use the in-core resowner facility for EVP.

Yuzuko Hosoya sent in another revision of a patch to fix some infelicities between autovacuum and partitioned tables.

Justin Pryzby sent in another revision of a patch to make pg_ls_* show directories and shared filesets.

Seino Yuki sent in another revision of a patch to enable pg_stat_statements to track rows processed by REFRESH MATERIALIZED VIEW.

Kyotaro HORIGUCHI sent in two more revisions of a patch to implement CatCache expiration.

David Pirotte sent in another revision of a patch to add logical decoding messages to pgoutput.

Masahiko Sawada sent in two more revisions of a patch to implement a transaction manager for foreign transactions.

Masahiro Ikeda sent in another revision of a patch to add statistics to the pg_stat_wal view.

Justin Pryzby sent in a patch atop the one for incremental view maintenance patch which fixes some documentation.

Michaël Paquier sent in a patch to refactor the MD5 implementations to be just one, and switch to EVP for OpenSSL.

Justin Pryzby sent in another revision of a patch to clarify the computation of min/max IO and specifically the double use and effect of correlation, avoid re-using the "pages_fetched" variable, and use the correlation statistic in costing bitmap scans as for an index scan.

Sergei Kornilov sent in another revision of a patch to allow some recovery parameters to be changed with reload.

Marina Polyakova sent in two revisions of a patch to fix a bug that manifested as pgbench no longer supporting a large number of client connections on Windows.

Andrey Borodin sent in another revision of a patch to add Sortsupport for sorting GiST build for gist_btree types.

Jürgen Purtz and Erik Rijkers traded patches to add an architecture chapter to the tutorial.

Dilip Kumar sent in another revision of a patch to implement custom table compression methods.

Tomáš Vondra sent in a patch to remove some duplicate code from brin_memtuple_initialize.