PostgreSQL Weekly News - November 7, 2021

Posted on 2021-11-08 by PWN

PostgreSQL Weekly News - November 7, 2021

PG Build 2021 will be held online on 30 November and 1 December 2021 09:00-17:00 GMT. Details.

PostgreSQL Product News

PostgresDAC 3.11, a direct access component suite for PostgreSQL, released.

JDBC 42.3.1 released.

ODB C++ ORM version 2.5.0-b.21 released.

DynamoDB FDW 1.0.0 released.

Babelfish, a MS SQL Server compatibility layer for PostgreSQL, released.

PostgreSQL Jobs for November

PostgreSQL in the News

Planet PostgreSQL:

PostgreSQL Weekly News is brought to you this week by David Fetter

Submit news and announcements by Sunday at 3:00pm PST8PDT to

Applied Patches

Tom Lane pushed:

Michaël Paquier pushed:

  • Preserve opclass parameters across REINDEX CONCURRENTLY. The opclass parameter Datums from the old index are fetched in the same way as for predicates and expressions, by grabbing them directly from the system catalogs. They are then copied into the new IndexInfo that will be used for the creation of the new copy. This caused the new index to be rebuilt with default parameters rather than the ones pre-defined by a user. The only way to get back a new index with correct opclass parameters would be to recreate a new index from scratch. The issue has been introduced by 911e702. Author: Michael Paquier Reviewed-by: Zhihong Yu Discussion: Backpatch-through: 13

  • Add TAP test for pg_receivewal with timeline switch. pg_receivewal is able to follow a timeline switch, but this was not tested. This test uses an empty archive location with a restart done from a slot, making its implementation a tad simpler than if we would reuse an existing archive directory. Author: Ronan Dunklau Reviewed-by: Kyotaro Horiguchi, Michael Paquier Discussion:

  • Rework compression options of pg_receivewal. pg_receivewal includes since cada1af the option --compress, to allow the compression of WAL segments using gzip, with a value of 0 (the default) meaning that no compression can be used. This commit introduces a new option, called --compression-method, able to use as values "none", the default, and "gzip", to make things more extensible. The case of --compress=0 becomes fuzzy with this option layer, so we have made the choice to make pg_receivewal return an error when using "none" and a non-zero compression level, meaning that the authorized values of --compress are now [1,9] instead of [0,9]. Not specifying --compress with "gzip" as compression method makes pg_receivewal use the default of zlib instead (Z_DEFAULT_COMPRESSION). The code in charge of finding the streaming start LSN when scanning the existing archives is refactored and made more extensible. While on it, rename "compression" to "compression_level" in walmethods.c, to reduce the confusion with the introduction of the compression method, even if the tar method used by pg_basebackup does not rely on the compression method (yet, at least), but just on the compression level (this area could be improved more, actually). This is in preparation for an upcoming patch that adds LZ4 support to pg_receivewal. Author: Georgios Kokolatos Reviewed-by: Michael Paquier, Jian Guo, Magnus Hagander, Dilip Kumar, Robert Haas Discussion:

  • Fix some thinkos with pg_receivewal --compression-method. The option name was incorrect in one of the error messages, and the short option 'I' was used in the code but we did not intend things to be this way. While on it, fix the documentation to refer to a "method", and not a "level. Oversights in commit d62bcc8, that I have detected after more review of the LZ4 patch for pg_receivewal.

  • Add support for LZ4 compression in pg_receivewal. pg_receivewal gains a new option, --compression-method=lz4, available when the code is compiled with --with-lz4. Similarly to gzip, this gives the possibility to compress archived WAL segments with LZ4. This option is not compatible with --compress. The implementation uses LZ4 frames, and is compatible with simple lz4 commands. Like gzip, using --synchronous ensures that any data will be flushed to disk within the current .partial segment, so as it is possible to retrieve as much WAL data as possible even from a non-completed segment (this requires completing the partial file with zeros up to the WAL segment size supported by the backend after decompression, but this is the same as gzip). The calculation of the streaming start LSN is able to transparently find and check LZ4-compressed segments. Contrary to gzip where the uncompressed size is directly stored in the object read, the LZ4 chunk protocol does not store the uncompressed data by default. There is contentSize that can be used with LZ4 frames by that would not help if using an archive that includes segments compressed with the defaults of a "lz4" command, where this is not stored. So, this commit has taken the most extensible approach by decompressing the already-archived segment to check its uncompressed size, through a blank output buffer in chunks of 64kB (no actual performance difference noticed with 8kB, 16kB or 32kB, and the operation in itself is actually fast). Tests have been added to verify the creation and correctness of the generated LZ4 files. The latter is achieved by the use of command "lz4", if found in the environment. The tar-based WAL method in walmethods.c, used now only by pg_basebackup, does not know yet about LZ4. Its code could be extended for this purpose. Author: Georgios Kokolatos Reviewed-by: Michael Paquier, Jian Guo, Magnus Hagander, Dilip Kumar Discussion:

  • Improve psql tab completion for COMMENT. Completion is added for more object types, like domain constraints, text search-ish objects or policies. Moreover, the area is reorganized, changing the list of objects supported by COMMENT to be in the same order as the documentation to ease future additions. Author: Ken Kato Reviewed-by: Fujii Masao, Shinya Kato, Suraj Khamkar, Michael Paquier Discussion:

Álvaro Herrera pushed:

Daniel Gustafsson pushed:

Amit Kapila pushed:

Fujii Masao pushed:

Peter Geoghegan pushed:

  • Don't overlook indexes during parallel VACUUM. Commit b4af70cb, which simplified state managed by VACUUM, performed refactoring of parallel VACUUM in passing. Confusion about the exact details of the tasks that the leader process is responsible for led to code that made it possible for parallel VACUUM to miss a subset of the table's indexes entirely. Specifically, indexes that fell under the min_parallel_index_scan_size size cutoff were missed. These indexes are supposed to be vacuumed by the leader (alongside any parallel unsafe indexes), but weren't vacuumed at all. Affected indexes could easily end up with duplicate heap TIDs, once heap TIDs were recycled for new heap tuples. This had generic symptoms that might be seen with almost any index corruption involving structural inconsistencies between an index and its table. To fix, make sure that the parallel VACUUM leader process performs any required index vacuuming for indexes that happen to be below the size cutoff. Also document the design of parallel VACUUM with these below-size-cutoff indexes. It's unclear how many users might be affected by this bug. There had to be at least three indexes on the table to hit the bug: a smaller index, plus at least two additional indexes that themselves exceed the size cutoff. Cases with just one additional index would not run into trouble, since the parallel VACUUM cost model requires two larger-than-cutoff indexes on the table to apply any parallel processing. Note also that autovacuum was not affected, since it never uses parallel processing. Test case based on tests from a larger patch to test parallel VACUUM by Masahiko Sawada. Many thanks to Kamigishi Rei for her invaluable help with tracking this problem down. Author: Peter Geoghegan Author: Masahiko Sawada Reported-By: Kamigishi Rei Reported-By: Andrew Gierth Diagnosed-By: Andres Freund Bug: #17245 Discussion: Discussion: Backpatch: 14-, where the refactoring commit appears.

  • Fix parallel amvacuumcleanup safety bug. Commit b4af70cb inverted the return value of the function parallel_processing_is_safe(), but missed the amvacuumcleanup test. Index AMs that don't support parallel cleanup at all were affected. The practical consequences of this bug were not very serious. Hash indexes are affected, but since they just return the number of blocks during hashvacuumcleanup anyway, it can't have had much impact. Author: Masahiko Sawada Discussion: Backpatch: 14-, where commit b4af70cb appears.

  • Add another old commit to git-blame-ignore-revs. Add another historic pgindent commit that was missed by the initial work done in commit 8e638845.

  • Add various assertions to heap pruning code. These assertions document (and verify) our high level assumptions about how pruning can and cannot affect existing items from target heap pages. For example, one of the new assertions verifies that pruning does not set a heap-only tuple to LP_DEAD. Author: Peter Geoghegan Reviewed-By: Andres Freund Discussion:

  • Add hardening to catch invalid TIDs in indexes. Add hardening to the heapam index tuple deletion path to catch TIDs in index pages that point to a heap item that index tuples should never point to. The corruption we're trying to catch here is particularly tricky to detect, since it typically involves "extra" (corrupt) index tuples, as opposed to the absence of required index tuples in the index. For example, a heap TID from an index page that turns out to point to an LP_UNUSED item in the heap page has a good chance of being caught by one of the new checks. There is a decent chance that the recently fixed parallel VACUUM bug (see commit 9bacec15) would have been caught had that particular check been in place for Postgres 14. No backpatch of this extra hardening for now, though. Author: Peter Geoghegan Reviewed-By: Andres Freund Discussion:

  • Update obsolete heap pruning comments. Add new comments that spell out what VACUUM expects from heap pruning: pruning must never leave behind DEAD tuples that still have tuple storage. This has at least been the case since commit 8523492d, which established the principle that vacuumlazy.c doesn't have to deal with DEAD tuples that still have tuple storage directly, except perhaps by simply retrying pruning (to handle a rare corner case involving concurrent transaction abort). In passing, update some references to old symbol names that were missed by the snapshot scalability work (specifically commit dc7420c2c9).

  • Update obsolete reference in vacuumlazy.c. Oversight in commit 7ab96cf6.

Peter Eisentraut pushed:

Heikki Linnakangas pushed:

Robert Haas pushed:

  • amcheck: Add additional TOAST pointer checks. Expand the checks of toasted attributes to complain if the rawsize is overlarge. For compressed attributes, also complain if compression appears to have expanded the attribute or if the compression method is invalid. Mark Dilger, reviewed by Justin Pryzby, Alexander Alekseev, Heikki Linnakangas, Greg Stark, and me. Discussion:

  • Introduce 'bbsink' abstraction to modularize base backup code. The base backup code has accumulated a healthy number of new features over the years, but it's becoming increasingly difficult to maintain and further enhance that code because there's no real separation of concerns. For example, the code that understands knows the details of how we send data to the client using the libpq protocol is scattered throughout basebackup.c, rather than being centralized in one place. To try to improve this situation, introduce a new 'bbsink' object which acts as a recipient for archives generated during the base backup progress and also for the backup manifest. This commit introduces three types of bbsink: a 'copytblspc' bbsink forwards the backup to the client using one COPY OUT operation per tablespace and another for the manifest, a 'progress' bbsink performs command progress reporting, and a 'throttle' bbsink performs rate-limiting. The 'progress' and 'throttle' bbsink types also forward the data to a successor bbsink; at present, the last bbsink in the chain will always be of type 'copytblspc'. There are plans to add more types of 'bbsink' in future commits. This abstraction is a bit leaky in the case of progress reporting, but this still seems cleaner than what we had before. Patch by me, reviewed and tested by Andres Freund, Sumanta Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja, Mark Dilger, and Jeevan Ladhe. Discussion: Discussion:

  • Introduce 'bbstreamer' abstraction to modularize pg_basebackup. pg_basebackup knows how to do quite a few things with a backup that it gets from the server, like just write out the files, or compress them first, or even parse the tar format and inject a modified file into the archive generated by the server. Unforatunely, this makes pg_basebackup.c a very large source file, and also somewhat difficult to enhance, because for example the knowledge that the server is sending us a 'tar' file rather than some other sort of archive is spread all over the place rather than centralized. In an effort to improve this situation, this commit invents a new 'bbstreamer' abstraction. Each archive received from the server is fed to a bbstreamer which may choose to dispose of it or pass it along to some other bbstreamer. Chunks may also be "labelled" according to whether they are part of the payload data of a file in the archive or part of the archive metadata. So, for example, if we want to take a tar file, modify the file it contains, and the gzip the result and write it out, we can use a bbstreamer_tar_parser to parse the tar file received from the server, a bbstreamer_recovery_injector to modify the contents of, a bbstreamer_tar_archiver to replace the tar headers for the file modified in the previous step with newly-built ones that are correct for the modified file, and a bbstreamer_gzip_writer to gzip and write the resulting data. Only the objects with "tar" in the name know anything about the tar archive format, and in theory we could re-archive using some other format rather than "tar" if somebody wanted to write the code. These chances do add a substantial amount of code, but I think the result is a lot more maintainable and extensible. pg_basebackup.c itself shrinks by roughly a third, with a lot of the complexity previously contained there moving into the newly-added files. Patch by me. The larger patch series of which this is a part has been reviewed and tested at various times by Andres Freund, Sumanta Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja, Mark Dilger, Sergei Kornilov, and Jeevan Ladhe. Discussion: Discussion:

  • Don't set ThisTimeLineID when there's no reason to do so. In slotfuncs.c, pg_replication_slot_advance() needs to determine the LSN up to which the slot should be advanced, but that doesn't require us to update ThisTimeLineID, because none of the code called from here depends on it. If the replication slot is logical, pg_logical_replication_slot_advance will call read_local_xlog_page, which does use ThisTimeLineID, but also takes care of making sure it's up to date. If the replication slot is physical, the timeline isn't used for anything at all. In logicalfuncs.c, pg_logical_slot_get_changes_guts() has the same issue: the only code we're going to run that cares about timelines is in or downstream of read_local_xlog_page, which already makes sure that the correct value gets set. Hence, don't do it here. Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and Álvaro Herrera. Discussion:

  • Remove all use of ThisTimeLineID global variable outside of xlog.c. All such code deals with this global variable in one of three ways. Sometimes the same functions use it in more than one of these ways at the same time. First, sometimes it's an implicit argument to one or more functions being called in xlog.c or elsewhere, and must be set to the appropriate value before calling those functions lest they misbehave. In those cases, it is now passed as an explicit argument instead. Second, sometimes it's used to obtain the current timeline after the end of recovery, i.e. the timeline to which WAL is being written and flushed. Such code now calls GetWALInsertionTimeLine() or relies on the new out parameter added to GetFlushRecPtr(). Third, sometimes it's used during recovery to store the current replay timeline. That can change, so such code must generally update the value before each use. It can still do that, but must now use a local variable instead. The net effect of these changes is to reduce by a fair amount the amount of code that is directly accessing this global variable. That's good, because history has shown that we don't always think clearly about which timeline ID it's supposed to contain at any given point in time, or indeed, whether it has been or needs to be initialized at any given point in the code. Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and Álvaro Herrera. Discussion:

  • Change ThisTimeLineID from a global variable to a local variable. StartupXLOG() still has ThisTimeLineID as a local variable, but the remaining code in xlog.c now needs to the relevant TimeLineID by some other means. Mostly, this means that we now pass it as a function parameter to a bunch of functions where we didn't previously. However, a few cases require special handling: - In functions that might be called by outside callers who wouldn't necessarily know what timeline to specify, we get the timeline ID from shared memory. XLogCtl->ThisTimeLineID can be used in most cases since recovery is known to have completed by the time those functions are called. In xlog_redo(), we can use XLogCtl->replayEndTLI. - XLogFileClose() needs to know the TLI of the open logfile. Do that with a new global variable openLogTLI. While someone could argue that this is just trading one global variable for another, the new one has a far more narrow purposes and is referenced in just a few places. - read_backup_label() now returns the TLI that it obtains by parsing the backup_label file. Previously, ReadRecord() could be called to parse the checkpoint record without ThisTimeLineID having been initialized. Now, the timeline is passed down, and I didn't want to pass an uninitialized variable; this change lets us avoid that. The old coding didn't seem to have any practical consequences that we need to worry about, but this is cleaner. - In BootstrapXLOG(), it's just a constant. Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and Álvaro Herrera. Discussion:

  • Remove tests added by bd807be6935929bdefe74d1258ca08048f0aafa3. The buildfarm is unhappy. It's not obvious why it doesn't like these tests, but let's remove them until we figure it out. Discussion:

Tomáš Vondra pushed:

Alexander Korotkov pushed:

Andres Freund pushed: