PostgreSQL Weekly News - November 29, 2020

Posted on 2020-11-30 by PWN

PostgreSQL Product News

Pgpool-II 4.2.0, a connection pooler and statement replication system for PostgreSQL, released.

pgBadger v11.4, a PostgreSQL log analyzer and graph tool written in Perl, released.

Database Lab 2.0, a tool for fast cloning of large PostgreSQL databases to build non-production environments, released.

pgagroal 1.0.0, a high-performance protocol-native connection pool for PostgreSQL, released.

PostgreSQL Jobs for November

PostgreSQL in the News

Planet PostgreSQL:

PostgreSQL Weekly News is brought to you this week by David Fetter

Submit news and announcements by Sunday at 3:00pm PST8PDT to

Applied Patches

Tom Lane pushed:

  • Allow a multi-row INSERT to specify DEFAULTs for a generated column. One can say "INSERT INTO tab(generated_col) VALUES (DEFAULT)" and not draw an error. But the equivalent case for a multi-row VALUES list always threw an error, even if one properly said DEFAULT in each row. Fix that. While here, improve the test cases for nearby logic about OVERRIDING SYSTEM/USER values. Dean Rasheed Discussion:

  • Improve wording of two error messages related to generated columns. Clarify that you can "insert" into a generated column as long as what you're inserting is a DEFAULT placeholder. Also, use ERRCODE_GENERATED_ALWAYS in place of ERRCODE_SYNTAX_ERROR; there doesn't seem to be any reason to use the less specific errcode. Discussion:

  • Rename the "point is strictly above/below point" comparison operators. Historically these were called >^ and <^, but that is inconsistent with the similar box, polygon, and circle operators, which are named |>> and <<| respectively. Worse, the >^ and <^ names are used for non-strict above/below tests for the box type. Hence, invent new operators following the more common naming. The old operators remain available for now, and are still accepted by the relevant index opclasses too. But there's a deprecation notice, so maybe we can get rid of them someday. Emre Hasegeli, reviewed by Pavel Borisov Discussion:

  • Remove unnecessary #include. Justin Pryzby Discussion:

  • Centralize logic for skipping useless ereport/elog calls. While ereport() and elog() themselves are quite cheap when the error message level is too low to be printed, some places need to do substantial work before they can call those macros at all. To allow optimizing away such setup work when nothing is to be printed, make elog.c export a new function message_level_is_interesting(elevel) that reports whether ereport/elog will do anything. Make use of that in various places that had ad-hoc direct tests of log_min_messages etc. Also teach ProcSleep to use it to avoid some work. (There may well be other places that could usefully use this; I didn't search hard.) Within elog.c, refactor a little bit to avoid having duplicate copies of the policy-setting logic. When that code was written, we weren't relying on the availability of inline functions; so it had some duplications in the name of efficiency, which I got rid of. Alvaro Herrera and Tom Lane Discussion:

  • Put "inline" marker on declarations of inline functions. I'm having a hard time telling whether the letter of the C standard requires this, but we do have a couple of buildfarm members that throw warnings when this is not done. Oversight in c532d15dd.

  • Avoid spamming the client with multiple ParameterStatus messages. Up to now, we sent a ParameterStatus message to the client immediately upon any change in the active value of any GUC_REPORT variable. This was only barely okay when the feature was designed; now that we have things like function SET clauses, there are very plausible use-cases where a GUC_REPORT variable might change many times within a query --- and even end up back at its original value, perhaps. Fortunately most of our GUC_REPORT variables are unlikely to be changed often; but there are proposals in play to enlarge that set, or even make it user-configurable. Hence, let's fix things to not generate more than one ParameterStatus message per variable per query, and to not send any message at all unless the end-of-query value is different from what we last reported. Discussion:

  • Doc: minor improvements for section 11.2 "Index Types". Break the per-index-type discussions into <sect2>'s so as to make them more visually separate and easier to find. Improve the markup, and make a couple of small wording adjustments. This also fixes one stray reference to the now-deprecated point operators <^ and >^. Dagfinn Ilmari Mannsåker, reviewed by David Johnston and Jürgen Purtz Discussion:

  • In psql's \d commands, don't truncate attribute default values. Historically, psql has truncated the text of a column's default expression at 128 characters. This is unlike any other behavior in describe.c, and it's become particularly confusing now that the limit is only applied to the expression proper and not to the "generated always as (...) stored" text that may get wrapped around it. Excavation in our git history suggests that the original motivation for this limit was not really to limit the display width (as I'd long supposed), but to make it safe to use a fixed-width output buffer to store the result. That implementation restriction is long gone of course, but the limit remained. Let's just get rid of it. While here, rearrange the logic about when to free the output string so that it's not so dependent on unstated assumptions about the possible values of attidentity and attgenerated. Per bug #16743 from David Turon. Back-patch to v12 where GENERATED came in. (Arguably we could take it back further, but I'm hesitant to change the behavior of long-stable branches for this.) Discussion:

  • Fix a recently-introduced race condition in LISTEN/NOTIFY handling. Commit 566372b3d fixed some race conditions involving concurrent SimpleLruTruncate calls, but it introduced new ones in async.c. A newly-listening backend could attempt to read Notify SLRU pages that were in process of being truncated, possibly causing an error. Also, the QUEUE_TAIL pointer could become set to a value that's not equal to the queue position of any backend. While that's fairly harmless in v13 and up (thanks to commit 51004c717), in older branches it resulted in near-permanent disabling of the queue truncation logic, so that continued use of NOTIFY led to queue-fill warnings and eventual inability to send any more notifies. (A server restart is enough to make that go away, but it's still pretty unpleasant.) The core of the problem is confusion about whether QUEUE_TAIL represents the "logical" tail of the queue (i.e., the oldest still-interesting data) or the "physical" tail (the oldest data we've not yet truncated away). To fix, split that into two variables. QUEUE_TAIL regains its definition as the logical tail, and we introduce a new variable to track the oldest un-truncated page. Per report from Mikael Gustavsson. Like the previous patch, back-patch to all supported branches. Discussion:

  • Clean up after tests in src/test/locale/. Oversight in 257836a75, which added these tests.

  • Doc: clarify behavior of PQconnectdbParams(). The documentation omitted the critical tidbit that a keyword-array entry is simply ignored if its corresponding value-array entry is NULL or an empty string; it will not override any previously-obtained value for the parameter. (See conninfo_array_parse().) I'd supposed that would force the setting back to default, which is what led me into bug #16746; but it doesn't. While here, I couldn't resist the temptation to do some copy-editing, both in the description of PQconnectdbParams() and in the section about connection URI syntax. Discussion:
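
The PQconnectdbParams() rule in the last item above can be modeled in a few lines. This is a Python sketch of the documented behavior, not libpq code: the helper name merge_params and the sample values are hypothetical. It only illustrates that an entry whose value is NULL (None here) or an empty string is skipped instead of resetting an earlier value.

```python
def merge_params(keywords, values, initial=None):
    """Toy model: keyword entries with a NULL/empty value are ignored."""
    result = dict(initial or {})
    for key, value in zip(keywords, values):
        if value is None or value == "":
            # Ignored entry: it neither overrides nor resets an earlier value.
            continue
        # In this model, a later non-empty entry replaces an earlier one.
        result[key] = value
    return result


# Hypothetical input: "port" is given twice, the second time as an empty
# string, and "user" is given as None on top of a previously obtained value.
params = merge_params(
    ["host", "port", "port", "user"],
    ["db.example.org", "5433", "", None],
    initial={"user": "alice"},
)
print(params)
```

In this model the empty second "port" entry leaves "5433" in place, and the None "user" entry leaves the previously obtained "alice" untouched, which is the case the commit message says was easy to get wrong.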

Heikki Linnakangas pushed:

  • Split copy.c into four files. Copy.c has grown really large. Split it into more manageable parts:
      - copy.c now contains only a few functions that are common to COPY FROM and COPY TO.
      - copyto.c contains code for COPY TO.
      - copyfrom.c contains code for initializing COPY FROM, and inserting the tuples to the correct table.
      - copyfromparse.c contains code for reading from the client/file/program, and parsing the input text/CSV/binary format into tuples.
    All of these parts are fairly complicated, and fairly independent of each other. There is a patch being discussed to implement parallel COPY FROM, which will add a lot of new code to the COPY FROM path, and another patch which would allow INSERTs to use the same multi-insert machinery as COPY FROM, both of which will require refactoring that code. With those two patches, there's going to be a lot of code churn in copy.c anyway, so now seems like a good time to do this refactoring. The CopyStateData struct is also split. All the formatting options, like FORMAT, QUOTE, ESCAPE, are put in a new CopyFormatOption struct, which is used by both COPY FROM and TO. Other state data are kept in separate CopyFromStateData and CopyToStateData structs. Reviewed-by: Soumyadeep Chakraborty, Erik Rijkers, Vignesh C, Andres Freund Discussion:

  • Fix a few comments that referred to copy.c. Missed these in the previous commit.

  • Move per-agg and per-trans duplicate finding to the planner. This has the advantage that the cost estimates for aggregates can count the number of calls to transition and final functions correctly. Bump catalog version, because views can contain Aggrefs. Reviewed-by: Andres Freund Discussion:

  • Fix expected output: the order of agg permission checks changed. Commit 0a2bc5d61e changed the order that permissions on the final and transition functions of an aggregate are checked in. That shows up as a difference in the order the LOG messages in this sepgsql regression test are printed. Adjust the expected output. Per buildfarm failure in rhinoceros.
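
The duplicate-finding idea from the per-agg/per-trans item above can be sketched as follows. This is a simplified Python model, not the planner code: AggCall, assign_agg_numbers, and the function names in the example are hypothetical, and the real sharing conditions are stricter than the two keys used here. It shows why finding duplicates early lets a cost model count transition states and final-function calls separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggCall:
    transfn: str   # transition function name (hypothetical)
    finalfn: str   # final function name (hypothetical)
    args: tuple    # argument expressions, as strings


def assign_agg_numbers(calls):
    """Give identical calls one aggno; calls sharing transfn+args one transno."""
    agg_ids, trans_ids, numbered = {}, {}, []
    for call in calls:
        # Same transition function over the same inputs: share one state.
        transno = trans_ids.setdefault((call.transfn, call.args), len(trans_ids))
        # Fully identical calls: share one aggregate result as well.
        aggno = agg_ids.setdefault(call, len(agg_ids))
        numbered.append((aggno, transno))
    return numbered, len(agg_ids), len(trans_ids)


calls = [
    AggCall("accum", "avg_final", ("x",)),
    AggCall("accum", "avg_final", ("x",)),   # exact duplicate of the first
    AggCall("accum", "var_final", ("x",)),   # same state, different final fn
]
numbered, n_aggs, n_trans = assign_agg_numbers(calls)
print(numbered, n_aggs, n_trans)
```

With the three sample calls, the model finds two distinct aggregates but only one transition state, so a cost estimate based on it would charge one transition-function call per input row and two final-function calls per group.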

Álvaro Herrera pushed:

  • Make some sanity-check elogs more verbose. A few sanity checks in funcapi.c were not mentioning all the possible causes for failure, confusing developers who fat-fingered catalog data additions. Make the errors more detailed to avoid wasting time in pinpointing mistakes. Per complaint from Craig Ringer. Reviewed-by: Tom Lane Discussion:

  • Don't hold ProcArrayLock longer than needed in rare cases. While cancelling an autovacuum worker, we hold ProcArrayLock while formatting a debugging log string. We can make this shorter by saving the data we need to produce the message and doing the formatting outside the locked region. This isn't terribly critical, as it only occurs pretty rarely: when a backend runs deadlock detection and it happens to be blocked by a process running autovacuum. Still, there's no need to cause a hiccup in ProcArrayLock processing, which can be very high-traffic in some cases. While at it, rework code so that we only print the string when it is really going to be used, as suggested by Michael Paquier. Reviewed-by: Michael Paquier Discussion:

  • Avoid spurious waits in concurrent indexing. In the various waiting phases of CREATE INDEX CONCURRENTLY (CIC) and REINDEX CONCURRENTLY (RC), we wait for other processes to release their snapshots; this is necessary in general for correctness. However, processes doing CIC in other tables cannot possibly affect CIC or RC done in "this" table, so we don't need to wait for those. This commit adds a flag in MyProc->statusFlags to indicate that the current process is doing CIC, so that other processes doing CIC or RC can ignore it when waiting. Note that this logic is only valid if the index does not access other tables. For simplicity we avoid setting the flag if the index has a column that's an expression, or has a WHERE predicate. (It is possible to have expressional or partial indexes that do not access other tables, but figuring that out would require more work.) This flag can potentially also be used by processes doing REINDEX CONCURRENTLY to be skipped; and by VACUUM to ignore processes in CIC or RC for the purposes of computing an Xmin. That's left for future commits. Author: Álvaro Herrera Author: Dimitry Dolgov Reviewed-by: Michael Paquier Discussion:

  • Restore lock level to update statusFlags. Reverts 27838981be9d (some comments are kept). Per discussion, it does not seem safe to relax the lock level used for this; in order for it to be safe, there would have to be memory barriers between the point we set the flag and the point we set the transaction Xid, which perhaps would not be so bad; but there would also have to be barriers at the readers' side, which from a performance perspective might be bad. Now maybe this analysis is wrong and it is safe for some reason, but proof of that is not trivial. Discussion:
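
The pattern behind the ProcArrayLock item above (copy what you need under the lock, format the message afterwards, and only if it will be used) is generic. The following Python sketch illustrates it with a plain threading.Lock; the shared dictionary, its values, and describe_blocker are hypothetical and stand in for the backend's shared state.

```python
import threading

lock = threading.Lock()                        # stands in for a hot shared lock
shared = {"pid": 4242, "table": "public.t"}    # hypothetical shared state


def describe_blocker(log_enabled):
    """Return a log message about the blocker, or None if it is not needed."""
    with lock:
        # Critical section: copy only the raw fields, do no formatting here.
        pid, table = shared["pid"], shared["table"]
    if not log_enabled:
        # Nothing will be printed, so skip building the string entirely.
        return None
    # String formatting happens after the lock has been released.
    return f"blocked by autovacuum PID {pid} on {table}"


print(describe_blocker(True))
```

The critical section shrinks to two dictionary reads, and the comparatively expensive string construction no longer delays other threads that need the lock.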

David Rowley pushed:

Michaël Paquier pushed:

  • Use macros instead of hardcoded offsets for LWLock initialization. This makes the code slightly easier to follow, as the initialization relies on an offset that overlapped with an equivalent set of macros defined, which are used in other places already. Author: Japin Li Discussion:

  • Remove catalog function currtid(). currtid() and currtid2() are an undocumented set of functions whose sole known user is the Postgres ODBC driver, able to retrieve the latest TID version for a tuple given by the caller of those functions. As used by Postgres ODBC, currtid() is a shortcut able to retrieve the last TID loaded into a backend by passing an OID of 0 (magic value) after a tuple insertion. This is removed in this commit, as it became obsolete after the driver began using "RETURNING ctid" with inserts, a clause supported since Postgres 8.2 (using RETURNING is better for performance anyway as it reduces the number of round-trips to the backend). currtid2() is still used by the driver, so this remains around for now. Note that this function is kept in its original shape for backward compatibility reasons. Per discussion with many people, including Andres Freund, Peter Eisentraut, Álvaro Herrera, Hiroshi Inoue, Tom Lane and myself. Bump catalog version. Discussion:

Fujii Masao pushed:

  • doc: Get rid of unnecessary space character from some index items. Previously some index items had " ," (i.e., space + comma) in the docs, for example "parallel_leader_participation configuration parameter , Other Planner Options". Since the space character before the comma is unnecessary, this commit gets rid of it for the sake of consistency with other index items. Author: Fujii Masao Reviewed-by: Euler Taveira Discussion:

  • doc: Add description about re-analysis and re-planning of a prepared statement. A prepared statement is re-analyzed and re-planned whenever database objects used in the statement have undergone definitional changes or their planner statistics have been updated. The former case was already documented, but the latter was not. This commit adds a description of the latter case to the docs. Author: Atsushi Torikoshi Reviewed-by: Andy Fan, Fujii Masao Discussion:

  • pg_stat_statements: Track number of times pgss entries were deallocated. If more distinct statements than pg_stat_statements.max are observed, pg_stat_statements entries about the least-executed statements are deallocated. This commit enables us to track the total number of times those entries were deallocated. That number can be viewed in the pg_stat_statements_info view that this commit adds. It's useful when tuning the pg_stat_statements.max parameter. If it's high, i.e., the entries are deallocated very frequently, that might cause a performance regression, and we can increase pg_stat_statements.max to avoid those frequent deallocations. The pg_stat_statements_info view is intended to display the statistics of the pg_stat_statements module itself. Currently it has only one column, "dealloc", indicating the number of times entries were deallocated. But an upcoming patch will add other columns (for example, the time at which pg_stat_statements statistics were last reset) to the view. Author: Katsuragi Yuta, Yuki Seino Reviewed-by: Fujii Masao Discussion:

  • Use standard SIGHUP and SIGTERM signal handlers in worker_spi. Previously worker_spi used its custom signal handlers for SIGHUP and SIGTERM. This commit makes worker_spi use the standard signal handlers, to simplify the code. Note that die() is used as the standard SIGTERM signal handler in worker_spi instead of SignalHandlerForShutdownRequest() or bgworker_die(). Previously the exit handling was only able to exit from within the main loop, and not from within the backend code it calls. This is why die() needs to be used here, so worker_spi can respond to the SIGTERM signal while it's executing a query. Maybe we can say that it's a bug that worker_spi could not respond to SIGTERM during query execution. But since worker_spi is just an example of the background worker code, we don't do the back-patch. Thanks to Craig Ringer for the report and investigation of the issue. Author: Bharath Rupireddy Reviewed-by: Fujii Masao Discussion:

  • Use standard SIGTERM signal handler die() in test_shm_mq worker. Previously the test_shm_mq worker used a stripped-down version of die() as the SIGTERM signal handler. This commit makes it use die() instead, to simplify the code. In terms of the code, the difference between die() and the stripped-down version previously used is whether the signal handler may directly call ProcessInterrupts() or not. But this difference doesn't exist in a background worker because, in a bgworker, the DoingCommandRead flag will never be true and die() will never call ProcessInterrupts() directly. Therefore the test_shm_mq worker can safely use die(), like other bgworker processes (e.g., logical replication apply launcher or autoprewarm worker) currently do. Thanks to Craig Ringer for the report and investigation of the issue. Author: Bharath Rupireddy Reviewed-by: Fujii Masao Discussion:

  • Fix CLUSTER progress reporting of number of blocks scanned. Previously the pg_stat_progress_cluster view reported the current block number in the heap scan as the number of heap blocks scanned (i.e., heap_blks_scanned). This reported number could be incorrect when synchronize_seqscans is enabled, because it allowed the heap scan to start at a block in the middle. This could result in wraparounds in the heap_blks_scanned column when the heap scan wrapped around. This commit fixes the bug by calculating the number of blocks from the block that the heap scan starts at to the current block in the scan, and reporting that number in the heap_blks_scanned column. Also, in the pg_stat_progress_cluster view, previously heap_blks_scanned could not reach heap_blks_total at the end of the heap scan phase if the last pages scanned were empty. This commit fixes the bug by manually updating heap_blks_scanned to the same value as heap_blks_total when the heap scan phase finishes. Back-patch to v12 where the pg_stat_progress_cluster view was introduced. Reported-by: Matthias van de Meent Author: Matthias van de Meent Reviewed-by: Fujii Masao Discussion:
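
The arithmetic behind the CLUSTER progress fix above can be illustrated with a small Python sketch. The function name and the sample numbers are hypothetical; the point is only that counting from the scan's start block, modulo the table size, stays monotonic when a synchronized scan starts in the middle of the heap and wraps around.

```python
def blocks_scanned(current_block, start_block, total_blocks):
    """Blocks visited so far by a scan that began at start_block and wraps."""
    return (current_block + total_blocks - start_block) % total_blocks + 1


# Hypothetical 10-block heap whose scan starts at block 7 and wraps to block 0.
visit_order = [7, 8, 9, 0, 1, 2, 3, 4, 5, 6]
progress = [blocks_scanned(block, 7, 10) for block in visit_order]
print(progress)
```

Reporting the raw block number instead would show 7, 8, 9, then drop back to 0, which is the wraparound the commit message describes. The sketch counts steadily from 1 up to the total of 10.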

Andrew Gierth pushed:

  • Properly check index mark/restore in ExecSupportsMarkRestore. Previously this code assumed that all IndexScan nodes supported mark/restore, which is not true since it depends on optional index AM support functions. This could lead to errors about missing support functions in rare edge cases of mergejoins with no sort keys, where an unordered non-btree index scan was placed on the inner path without a protecting Materialize node. (Normally, the fact that merge join requires ordered input would avoid this error.) Backpatch all the way since this bug is ancient. Per report from Eugen Konkov on irc. Discussion:
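
The corrected check above can be pictured with a toy Python model. Nothing here is the executor's actual API: IndexAM, its markpos/restrpos fields, and supports_mark_restore are hypothetical names, and the set of node types is an assumption. It only shows the difference between assuming every index scan supports mark/restore and asking the access method.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IndexAM:
    """Hypothetical stand-in for an index access method's callback table."""
    name: str
    markpos: Optional[Callable[[], None]] = None    # optional support function
    restrpos: Optional[Callable[[], None]] = None   # optional support function


def supports_mark_restore(node_type: str, am: Optional[IndexAM] = None) -> bool:
    if node_type == "IndexScan":
        # Depends on the AM: both optional callbacks must be provided.
        return am is not None and am.markpos is not None and am.restrpos is not None
    # Assumption for this model: a Material node always supports mark/restore.
    return node_type == "Material"


btree_like = IndexAM("btree_like", markpos=lambda: None, restrpos=lambda: None)
no_mark_am = IndexAM("no_mark_am")   # provides neither callback

print(supports_mark_restore("IndexScan", btree_like))
print(supports_mark_restore("IndexScan", no_mark_am))
```

When the check returns false for the inner side of a merge join, a planner following this model would know to put a Material node on top instead of failing at run time on a missing support function.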

Amit Kapila pushed:

Thomas Munro pushed:

Peter Eisentraut pushed:

Noah Misch pushed:

Pending Patches

Amul Sul sent in another revision of a patch to implement ALTER SYSTEM READ {ONLY|WRITE}.

Daniel Vérité sent in another revision of a patch to implement batch/pipelining in libpq.

Justin Pryzby and Tomáš Vondra traded patches to implement extended statistics on expressions.

Álvaro Herrera sent in another revision of a patch to avoid errors in BRIN summarization, which can happen if an index is reindexed concurrently.

Álvaro Herrera sent in a patch that checks whether XLogRecPtrIsInvalid(replicatedPtr) is true, in order to fix a bug in which a walsender got stuck during shutdown and never exited, thus preventing the postmaster from completing the shutdown cycle.

Zeng Wenjing sent in three more revisions of a patch to implement global temporary tables.

Bharath Rupireddy sent in two more revisions of a patch to address the fact that postgres_fdw connection caching causes remote sessions to linger until the local session exits.

Bharath Rupireddy and Heikki Linnakangas traded patches to make it possible to use parallel inserts in CREATE TABLE AS, where it's safe to do so.

Tomáš Vondra sent in another revision of a patch to use non-volatile storage as a WAL buffer.

Takayuki Tsunakawa sent in two more revisions of a patch to add bulk inserts for foreign tables.

Justin Pryzby sent in another revision of a patch to allow INSERT SELECT to use a BulkInsertState, make INSERT SELECT use multi_insert, and dynamically switch to multi-insert mode.

Michaël Paquier sent in another revision of a patch to rework the SHA2 and crypto hash APIs, switch cryptohash_openssl.c to use EVP, and make pgcrypto use the in-core resowner facility for EVP.

Justin Pryzby sent in another revision of a patch to allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly.

Keisuke Kuroda sent in a patch to fix a bug that manifested as huge memory consumption on partitioned table with FKs by reducing the size of the ri SPI plan hash.

Konstantin Knizhnik sent in two more revisions of a patch to implement custom compression for libpq.

Takamichi Osumi sent in another revision of a patch to make it possible to disable WAL logging to speed up bulk loads.

Nathan Bossart sent in two revisions of a patch to add a FAST option to CHECKPOINT.

Amit Kapila, Ajin Cherian, and Peter Smith traded patches to implement logical decoding for two-phase transactions.

David Rowley sent in a patch to define pg_attribute_cold and pg_attribute_hot to be empty macros on MinGW 8.1 so as to avoid a bug in that toolchain.

Li Japin sent in another revision of a patch to allow terminating the idle sessions via a new GUC, idle_session_timeout, and call setitimer() less often.

Euler Taveira de Oliveira sent in a patch to add logical decoding messages to pgoutput, add xid to messages when streaming, explain why LOGICAL_REP_MSG_MESSAGE is ignored, simplify the parse_output_parameters function to take a whole PGOutputData instead of bits and pieces, adjust in_streaming for messages, and overhaul the tests to take account for all this.

Peter Eisentraut sent in another revision of a patch to add a result_format_auto_binary_types setting.

Michaël Paquier sent in another revision of a patch to fix a problem in which vac_update_datfrozenxid raises "wrong tuple length" if the pg_database tuple contains a toast attribute.

Kyotaro HORIGUCHI sent in another revision of a patch to fix handling of NaN in the geometry types.

Masahiko Sawada sent in another revision of a patch to enable two-phase commit for multiple foreign servers.

Daniel Gustafsson sent in another revision of a patch to make it possible to enable and disable data checksums online.

Alexander Korotkov sent in a patch to implement a built-in infrastructure for reproduction of concurrency issues in automated test suites. Central to this infrastructure are "stop events," which are special places in the code where execution can be stopped on some condition.

Peter Eisentraut sent in another revision of a patch to implement SEARCH and CYCLE clauses in common table expressions per the SQL standard.

Thomas Munro sent in another revision of a patch to get latches to send fewer signals, use SIGURG rather than SIGUSR1 for latches, use signalfd for epoll latches, and use EVFILT_SIGNAL for kqueue latches.

Tom Lane sent in another revision of a patch to report GUC changes at query end.

Peter Smith sent in a patch to use enums for message types.

David Zhang sent in three revisions of a patch to add table access method as an option to pgbench.

Anastasia Lubennikova sent in a patch to handle negative number of tuples passed to normal_rand().

Peter Eisentraut sent in a patch to pageinspect to change the block number arguments to bigint.

Bertrand Drouvot sent in four more revisions of a patch to make it possible to log the standby recovery conflict waits via a new GUC, log_recovery_conflict_waits.

Kasahara Tatsuhito sent in three more revisions of a patch to fix a bug that manifested as an autovacuum issue with large numbers of tables.

Masahiko Sawada sent in another revision of a patch to add basic statistics to the pg_stat_wal view.

Takamichi Osumi sent in a patch to prevent a scenario in which archive recovery encounters WAL generated with wal_level=minimal and the server continues to work, a condition that could cause data not to be replicated.

Euler Taveira de Oliveira sent in a patch to remove temporary files after a backend crash in order to avert ENOSPC conditions that could result from multiple crashes.

Pavel Borisov sent in another revision of a patch to implement covering indexes using the SP-GiST index access method.

Kirk Jamison sent in another revision of a patch to prevent invalidating blocks in smgrextend() during recovery, add a bool parameter in smgrnblocks() for cached blocks, slim down DropRelFileNodeBuffers() during recovery by avoiding scanning the whole buffer pool when the relation is small enough or the total number of blocks to be invalidated is below the threshold of full scanning, and get DropRelFileNodesAllBuffers() to skip the time-consuming scan of the whole buffer pool during recovery when the relation is small enough, or when the number of blocks to be invalidated is below the full scan threshold.

Krunal Bauskar and Alexander Korotkov traded patches to improve the spinlock implementation on ARM.

Arne Roland sent in three revisions of a patch to ensure that renaming a trigger on a partitioned table also renames triggers on the partitions.

Bharath Rupireddy sent in a patch to fix the error message for pg_workers shutting down so it talks about background workers instead of the non-existent connections that apply to other cases.

Stephen Frost sent in another revision of a patch to add GSS information to the connection authorized log message, if needed.

Michael Banck sent in a patch to clarify the fact that CREATEROLE roles can GRANT default roles.

Ashutosh Bapat and Alexander Korotkov traded patches to make it easy to print LSNs.

Andreas Karlsson sent in a PoC patch to fix the fact that the inet/cidr support shipped broken by throwing away netmask information in the btree_gist supplied extension.

Pavel Stěhule and Justin Pryzby traded patches to make it possible to read the tables to be dumped by pg_dump from a file.

Justin Pryzby sent in another revision of a patch to make CLUSTER ON a separate dump object in pg_dump, implement CLUSTER for partitioned tables, propagate changes to indisclustered to child/parents, invalidate parent indexes, invalidate parent index cluster on attach, and preserve indisclustered on children of clustered, partitioned indexes.

Simon Riggs sent in another revision of a patch to add a FAST_FREEZE option to VACUUM.

Simon Riggs sent in a patch to implement one_freeze then max_freeze for lazy VACUUM.

Justin Pryzby sent in another revision of a patch to make pg_ls_* show directories and shared filesets.

Justin Pryzby sent in another revision of a patch to remove references to pg_dump's pre-8.1 switch behaviour.

Justin Pryzby sent in another revision of a patch to allow CREATE INDEX CONCURRENTLY on partitioned tables, add a SKIPVALID flag for more integration, and make ReindexPartitions() set indisvalid.

Paul A Jungwirth sent in two more revisions of a patch to implement multiranges.

Dean Rasheed sent in another revision of a patch to improve estimation of OR clauses.

James Coleman sent in a patch to error if gather merge paths aren't sufficiently sorted.

James Coleman sent in another revision of a patch to ensure that generate_useful_gather_paths doesn't skip unsorted subpaths, enforce parallel safety of pathkeys in generate_useful_gather_paths, disallow SRFs in proactive sort, remove the volatile expr target search on the grounds that it's not needed then, and document find_em_expr_usable_for_sorting_rel in prepare_sort_from_pathkeys.