From: PWN via PostgreSQL Announce <announce-noreply(at)postgresql(dot)org>
To: PostgreSQL Announce <pgsql-announce(at)lists(dot)postgresql(dot)org>
Subject: PostgreSQL Weekly News - November 7, 2021
Date: 2021-11-08 08:24:59
Lists: pgsql-announce

# PostgreSQL Weekly News - November 7, 2021

PG Build 2021 will be held online on 30 November and 1 December 2021 09:00-17:00 GMT.

# PostgreSQL Product News

PostgresDAC 3.11, a direct access component suite for PostgreSQL, released.

JDBC 42.3.1, the PostgreSQL JDBC driver, released.

ODB C++ ORM version 2.5.0-b.21 released.

DynamoDB FDW 1.0.0 released.

Babelfish, an MS SQL Server compatibility layer for PostgreSQL, released.

# PostgreSQL Jobs for November


# PostgreSQL in the News

Planet PostgreSQL

PostgreSQL Weekly News is brought to you this week by David Fetter.

Submit news and announcements by Sunday at 3:00pm PST8PDT to david(at)fetter(dot)org(dot)

# Applied Patches

Tom Lane pushed:

- plpgsql: report proper line number for errors in variable initialization.
Previously, we pointed at the surrounding block's BEGIN keyword. If there are
multiple variables being initialized in a DECLARE section, this isn't good
enough: it can be quite confusing and unhelpful. We do know where the
variable's declaration started, so it just takes a tiny bit more
error-reporting infrastructure to use that. Discussion:

- Avoid O(N^2) behavior when the standby process releases many locks. When
replaying a transaction that held many exclusive locks on the primary, a
standby server's startup process would expend O(N^2) effort on manipulating
the list of locks. This code was fine when written, but commit 1cff1b95a made
repetitive list_delete_first() calls inefficient, as explained in its commit
message. Fix by just iterating the list normally, and releasing storage only
when done. (This'd be inadequate if we needed to recover from an error
occurring partway through; but we don't.) Back-patch to v13 where 1cff1b95a
came in. Nathan Bossart Discussion:
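The before-and-after shapes of this fix can be sketched outside of C. Below is an illustrative Python analogue (the function names are invented here; PostgreSQL's actual List API is C):

```python
# Illustrative analogue of the O(N^2) hazard: deleting the first element of an
# array-backed list in a loop shifts every remaining element each time, so
# releasing N locks costs O(N^2) overall.

def release_locks_quadratic(locks):
    """Old pattern: delete-first in a loop; each pop(0) shifts the rest."""
    released = []
    while locks:
        released.append(locks.pop(0))  # pop(0) is O(N) on its own
    return released

def release_locks_linear(locks):
    """Fixed pattern: iterate the list normally, release storage only when done."""
    released = list(locks)  # one O(N) pass over the entries
    locks.clear()           # drop the whole container at the end
    return released
```

As the commit notes, freeing everything at the end is only safe because this code never needs to recover from an error partway through.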

- Doc: improve README files associated with TAP tests. Rearrange
src/test/perl/README so that the first section is more clearly "how to run
these tests", and the rest "how to write new tests". Add some basic info
there about debugging test failures. Then, add cross-refs to that README from
other READMEs that describe how to run TAP tests. Per suggestion from Kevin
Burke, though this is not his original patch. Discussion:

- Don't try to read a multi-GB pg_stat_statements file in one call. Windows
fails on a request to read() more than INT_MAX bytes, and perhaps other
platforms could have similar issues. Let's adjust this code to read at most
1GB per call. (One would not have thought the file could get that big, but
now we have a field report of trouble, so it can. We likely ought to add some
mechanism to limit the size of the query-texts file separately from the size
of the hash table. That is not this patch, though.) Per bug #17254 from
Yusuke Egashira. It's been like this for awhile, so back-patch to all
supported branches. Discussion:
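The chunked-read pattern described above can be sketched as follows. This is a hedged Python illustration (names invented; the real code is C inside pg_stat_statements) that caps each read request at 1 GB:

```python
# Sketch: never ask the OS for more than 1 GB in a single read() call, since
# Windows fails on read() requests above INT_MAX bytes.

MAX_READ = 1024 * 1024 * 1024  # 1 GB upper bound per read() call

def read_all_chunked(f, total_size, chunk_limit=MAX_READ):
    """Read total_size bytes from f, chunk_limit bytes at most per call."""
    buf = bytearray()
    remaining = total_size
    while remaining > 0:
        chunk = f.read(min(remaining, chunk_limit))
        if not chunk:  # short read: the file is smaller than expected
            raise IOError("unexpected EOF while reading query-texts file")
        buf.extend(chunk)
        remaining -= len(chunk)
    return bytes(buf)
```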

- Avoid some other O(N^2) hazards in list manipulation. In the same spirit as
6301c3ada, fix some more places where we were using list_delete_first() in a
loop and thereby risking O(N^2) behavior. It's not clear that the lists
manipulated in these spots can get long enough to be really problematic ...
but it's not clear that they can't, either, and the fixes are simple enough.
As before, back-patch to v13. Discussion:

- Avoid O(N^2) behavior in SyncPostCheckpoint(). As in commits 6301c3ada and
e9d9ba2a4, avoid doing repetitive list_delete_first() operations, since that
would be expensive when there are many files waiting to be unlinked. This is
a slightly larger change than in those cases. We have to keep the list state
valid for calls to AbsorbSyncRequests(), so it's necessary to invent a
"canceled" field instead of immediately deleting PendingUnlinkEntry entries.
Also, because we might not be able to process all the entries, we need a new
list primitive list_delete_first_n(). list_delete_first_n() is almost
list_copy_tail(), but it modifies the input List instead of making a new copy.
I found a couple of existing uses of the latter that could profitably use the
new function. (There might be more, but the other callers look like they
probably shouldn't overwrite the input List.) As before, back-patch to v13.
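A Python analogue may clarify the relationship between the two primitives (a sketch with invented signatures; the real functions operate on PostgreSQL's C List type):

```python
# list_delete_first_n() is almost list_copy_tail(): both yield the list minus
# its first n cells, but the former modifies the input in place (one O(N)
# shift instead of n repeated delete-first operations), while the latter
# builds a fresh copy and leaves the input untouched.

def list_delete_first_n(lst, n):
    """Remove the first n elements in place and return the same list."""
    del lst[:n]
    return lst

def list_copy_tail(lst, n):
    """Return a new list holding everything after the first n elements."""
    return lst[n:]
```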

- Doc: be more precise about conflicts between relation names. Use verbiage like
"The name of the table must be distinct from the name of any other relation
(table, sequence, index, view, materialized view, or foreign table) in the
same schema." in the reference pages for all those object types. The main
change here is to mention materialized views explicitly; a couple of these
pages had previously failed to say anything at all about name conflicts. Per
suggestion from Daniel Westermann. Discussion:

- Doc: clean up some places that mentioned template1 but not template0. Improve
old text that wasn't updated when we added template0 to the standard database
set. Per suggestion from P. Luzanov. Discussion:

- Fix variable lifespan in ExecInitCoerceToDomain(). This undoes a mistake in
1ec7679f1: domainval and domainnull were meant to live across loop iterations,
but they were incorrectly moved inside the loop. The effect was only to emit
useless extra EEOP_MAKE_READONLY steps, so it's not a big deal; nonetheless,
back-patch to v13 where the mistake was introduced. Ranier Vilela

- Ensure consistent logical replication of datetime and float8 values. In
walreceiver, set the publisher's relevant GUCs (datestyle, intervalstyle,
extra_float_digits) to the same values that pg_dump uses, and for the same
reason: we need the output to be read the same way regardless of the
receiver's settings. Without this, it's possible for subscribers to
misinterpret transmitted values. Although this is clearly a bug fix, it's not
without downsides: subscribers that are storing values into some other
datatype, such as text, could get different results than before, and perhaps
be unhappy about that. Given the lack of previous complaints, it seems best
to change this only in HEAD, and to call it out as an incompatible change in
v15. Japin Li, per report from Sadhuprasad Patro Discussion:

- Blind attempt to silence SSL compile failures on hamerkop. Buildfarm member
hamerkop has been failing for the last few days with errors that look like
OpenSSL's X509-related symbols have not been imported into
be-secure-openssl.c. It's unclear why this should be, but let's try adding an
explicit #include of <openssl/x509v3.h>, as there has long been in
fe-secure-openssl.c. Discussion:

- Un-break pg_basebackup's MSVC build. Commit 23a1c6578 thought it'd be cute to
refactor pg_basebackup/Makefile with a new variable BBOBJS, but our MSVC build
system knows nothing of that. Per buildfarm.

- Second attempt to silence SSL compile failures on hamerkop. After further
investigation, it seems the cause of the problem is our recent decision to
start defining WIN32_LEAN_AND_MEAN. That causes <windows.h> to no longer
include <wincrypt.h>, which means that the OpenSSL headers are unable to
prevent conflicts with that header by #undef'ing the conflicting macros.
Apparently, some other system header that be-secure-openssl.c #includes after
the OpenSSL headers is pulling in <wincrypt.h>. It's obscure just where that
happens and why we're not seeing it on other Windows buildfarm animals.
However, it should work to move the OpenSSL #includes to the end of the list.
For the sake of future-proofing, do likewise in fe-secure-openssl.c. In
passing, remove useless double inclusions of <openssl/ssl.h>. Thanks to
Thomas Munro for running down the relevant information. Discussion:

- Disallow making an empty lexeme via array_to_tsvector(). The tsvector data
type has always forbidden lexemes to be empty. However, array_to_tsvector()
didn't get that memo, and would allow an empty-string array element to become
an empty lexeme. This could result in dump/restore failures later, not to
mention whatever semantic issues might be behind the original prohibition.
However, other functions that take a plain text input directly as a lexeme
value do not need a similar restriction, because they only match the string
against existing tsvector entries. In particular it'd be a bad idea to make
ts_delete() reject empty strings, since that is the most convenient way to
clean up any bad data that might have gotten into a tsvector column via this
bug. Reflecting on that, let's also remove the prohibition against NULL array
elements in tsvector_delete_arr and tsvector_setweight_by_filter. It seems
more consistent to ignore them, as an empty-string element would be ignored.
There's a case for back-patching this, since it's clearly a bug fix. On
balance though, it doesn't seem like something to change in a minor release.
Jean-Christophe Arnu Discussion:
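The resulting policy, reject empty lexemes on construction but ignore empty or NULL elements on deletion, can be sketched in Python (an illustration of the rules described above, not the C implementation):

```python
# Construction-style functions must reject empty lexemes, since tsvector has
# always forbidden them; deletion-style functions can simply ignore empty or
# NULL (None) elements, which keeps them usable for cleaning up bad data.

def array_to_lexemes(arr):
    """Construction path: an empty-string element is an error."""
    for lexeme in arr:
        if lexeme == "":
            raise ValueError("lexeme array may not contain empty strings")
    return set(arr)

def delete_lexemes(tsv, arr):
    """Deletion path: drop listed lexemes, ignoring None and empty strings."""
    to_delete = {a for a in arr if a}  # filters out None and ""
    return {lex for lex in tsv if lex not in to_delete}
```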

- Blind attempt to fix MSVC pgcrypto build. Commit db7d1a7b0 pulled out the MSVC build system's custom support for building contrib/pgcrypto, but neglected to
inform it that that module can now be built normally. Or at least I guess it
can now be built normally. But this is definitely causing bowerbird to fail,
since it's trying to test a module it hasn't built.

- Doc: add some notes about performance of the List functions. Per suggestion
from Andres Freund. Discussion:

- contrib/sslinfo needs a fix too to make hamerkop happy. Re-ordering the
#include's is a bit problematic here because libpq/libpq-be.h needs to include
<openssl/ssl.h>. Instead, let's #undef the unwanted macro after all the
#includes. This is definitely uglier than the other way, but it should work
despite possible future header rearrangements. (A look at the openssl headers
indicates that X509_NAME is the only conflicting symbol that we use.) In
passing, remove a related but long-incorrect comment in pg_backup_archiver.h.

- Silence uninitialized-variable warning. Quite a few buildfarm animals are
warning about this, and lapwing is actually failing (because -Werror). It's a
false positive AFAICS, so no need to do more than zero the variable to start
with. Discussion:

Michaël Paquier pushed:

- Preserve opclass parameters across REINDEX CONCURRENTLY. The opclass parameter
Datums from the old index are fetched in the same way as for predicates and
expressions, by grabbing them directly from the system catalogs. They are
then copied into the new IndexInfo that will be used for the creation of the
new copy. This caused the new index to be rebuilt with default parameters
rather than the ones pre-defined by a user. The only way to get back a new
index with correct opclass parameters would be to recreate a new index from
scratch. The issue has been introduced by 911e702. Author: Michael Paquier
Reviewed-by: Zhihong Yu Discussion:
Backpatch-through: 13

- Add TAP test for pg_receivewal with timeline switch. pg_receivewal is able to
follow a timeline switch, but this was not tested. This test uses an empty
archive location with a restart done from a slot, making its implementation a
tad simpler than if we would reuse an existing archive directory. Author:
Ronan Dunklau Reviewed-by: Kyotaro Horiguchi, Michael Paquier Discussion:

- Rework compression options of pg_receivewal. Since cada1af, pg_receivewal has
included the option --compress to allow compression of WAL segments using
gzip, with a value of 0 (the default) meaning that no compression is used.
This commit introduces a new option, --compression-method, which accepts the
values "none" (the default) and "gzip", to make things more extensible.
The case of --compress=0 becomes fuzzy with this option layer, so we have made
the choice to make pg_receivewal return an error when using "none" and a
non-zero compression level, meaning that the authorized values of --compress
are now [1,9] instead of [0,9]. Not specifying --compress with "gzip" as
compression method makes pg_receivewal use the default of zlib instead
(Z_DEFAULT_COMPRESSION). The code in charge of finding the streaming start
LSN when scanning the existing archives is refactored and made more
extensible. While on it, rename "compression" to "compression_level" in
walmethods.c, to reduce the confusion with the introduction of the compression
method, even if the tar method used by pg_basebackup does not rely on the
compression method (yet, at least), but just on the compression level (this
area could be improved more, actually). This is in preparation for an
upcoming patch that adds LZ4 support to pg_receivewal. Author: Georgios
Kokolatos Reviewed-by: Michael Paquier, Jian Guo, Magnus Hagander, Dilip
Kumar, Robert Haas Discussion:
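The new option rules can be summarized in a small sketch. This is a hypothetical Python helper mirroring the behavior described above, not pg_receivewal's actual C code:

```python
# Rules sketched from the commit summary: "none" plus a nonzero level is an
# error, the authorized --compress range is now [1, 9], and gzip without an
# explicit --compress falls back to zlib's default level.

Z_DEFAULT_COMPRESSION = -1  # zlib's "default level" sentinel

def resolve_compression(method="none", level=None):
    """Return a (method, level) pair, validating the option combination."""
    if method == "none":
        if level not in (None, 0):
            raise ValueError("cannot use --compress with --compression-method=none")
        return ("none", 0)
    if method == "gzip":
        if level is None:
            return ("gzip", Z_DEFAULT_COMPRESSION)
        if not 1 <= level <= 9:
            raise ValueError("--compress must be in range 1..9")
        return ("gzip", level)
    raise ValueError(f"unknown compression method: {method}")
```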

- Fix some thinkos with pg_receivewal --compression-method. The option name was
incorrect in one of the error messages, and the short option 'I' was used in
the code but we did not intend things to be this way. While on it, fix the
documentation to refer to a "method", and not a "level". Oversights in commit
d62bcc8, that I have detected after more review of the LZ4 patch for
pg_receivewal.

- Add support for LZ4 compression in pg_receivewal. pg_receivewal gains a new
option, --compression-method=lz4, available when the code is compiled with
--with-lz4. Similarly to gzip, this gives the possibility to compress
archived WAL segments with LZ4. This option is not compatible with
--compress. The implementation uses LZ4 frames, and is compatible with simple
lz4 commands. Like gzip, using --synchronous ensures that any data will be
flushed to disk within the current .partial segment, so that it is possible to
retrieve as much WAL data as possible even from a non-completed segment (this
requires completing the partial file with zeros up to the WAL segment size
supported by the backend after decompression, but this is the same as gzip).
The calculation of the streaming start LSN is able to transparently find and
check LZ4-compressed segments. Contrary to gzip where the uncompressed size
is directly stored in the object read, the LZ4 chunk protocol does not store
the uncompressed data by default. There is contentSize that can be used with
LZ4 frames, but that would not help when using an archive that includes segments
compressed with the defaults of a "lz4" command, where this is not stored.
So, this commit has taken the most extensible approach by decompressing the
already-archived segment to check its uncompressed size, through a blank
output buffer in chunks of 64kB (no actual performance difference noticed with
8kB, 16kB or 32kB, and the operation in itself is actually fast). Tests have
been added to verify the creation and correctness of the generated LZ4 files.
The latter is achieved by the use of command "lz4", if found in the
environment. The tar-based WAL method in walmethods.c, used now only by
pg_basebackup, does not know yet about LZ4. Its code could be extended for
this purpose. Author: Georgios Kokolatos Reviewed-by: Michael Paquier, Jian
Guo, Magnus Hagander, Dilip Kumar Discussion:

- Improve psql tab completion for COMMENT. Completion is added for more object
types, like domain constraints, text search-ish objects or policies.
Moreover, the area is reorganized, changing the list of objects supported by
COMMENT to be in the same order as the documentation to ease future additions.
Author: Ken Kato Reviewed-by: Fujii Masao, Shinya Kato, Suraj Khamkar, Michael
Paquier Discussion:

Álvaro Herrera pushed:

- Handle XLOG_OVERWRITE_CONTRECORD in DecodeXLogOp. Failing to do so leaves
  logical decoding unable to process the WAL stream. Handle it by doing
nothing. Backpatch all the way back. Reported-by: Petr Jelínek

- Reword doc blurb for vacuumdb --analyze-in-stages. Make users aware that using
it in a database with existing stats might cause transient problems. Author:
Nikolai Berkoff <nikolai(dot)berkoff(at)pm(dot)me> Discussion:

- Document default and changeability of log_startup_progress_interval. Review
for 9ce346eabf35. Author: Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Reviewed-by: Robert Haas <robertmhaas(at)gmail(dot)com> Discussion:

- Pipeline mode disallows multicommand strings. ... so mention that in
appropriate places of the libpq docs. Backpatch to 14. Reported-by: RekGRpth
<rekgrpth(at)gmail(dot)com> Discussion:

- Document that ALTER TABLE .. TYPE removes statistics. Co-authored-by: Nikolai
Berkoff <nikolai(dot)berkoff(at)pm(dot)me> Discussion:

- Avoid crash in rare case of concurrent DROP. When a role being dropped is
  referenced by catalog objects that are concurrently also being dropped, a
  crash can result while trying to construct the string that describes the
  objects. Suppress that by ignoring objects whose descriptions are returned as
  NULL. The majority of relevant code sites were already cautious about this;
  we had just missed a couple. This is an old bug,
so backpatch all the way back. Reported-by: Alexander Lakhin
<exclusion(at)gmail(dot)com> Discussion:
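The shape of the fix is simple to sketch. Here is an illustrative Python version (invented names; the real code builds the error detail string in C):

```python
# When assembling the "these objects depend on this role" message, skip any
# object whose description comes back as NULL (None here) because the object
# vanished under a concurrent DROP, instead of crashing on it.

def describe_objects(get_description, objects):
    """Join object descriptions, ignoring objects that no longer exist."""
    parts = []
    for obj in objects:
        desc = get_description(obj)  # may return None for a vanished object
        if desc is None:
            continue  # concurrently dropped: ignore rather than crash
        parts.append(desc)
    return ", ".join(parts)
```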

Daniel Gustafsson pushed:

- Replace unicode characters in comments with ascii. The unicode characters,
while in comments and not code, caused MSVC to emit compiler warning C4819:
The file contains a character that cannot be represented in the current code
page (number). Save the file in Unicode format to prevent data loss. Fix
by replacing the characters in print.c with descriptive comments containing
the codepoints and symbol names, and remove the character in brin_bloom.c
which was a footnote reference copied from the paper citation. Per report
from hamerkop in the buildfarm. Reviewed-by: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>

Amit Kapila pushed:

- Replace XLOG_INCLUDE_XID flag with a more localized flag. Commit 0bead9af484c
introduced XLOG_INCLUDE_XID flag to indicate that the WAL record contains
subXID-to-topXID association. It uses that flag later to mark in
CurrentTransactionState that top-xid is logged so that we should not try to
log it again with the next WAL record in the current subtransaction. However,
we can use a localized variable to pass that information. In passing, change
the related function and variable names to make them consistent with what the
code is actually doing. Author: Dilip Kumar Reviewed-by: Alvaro Herrera, Amit
Kapila Discussion:

- Move MarkCurrentTransactionIdLoggedIfAny() out of the critical section. We
don't modify any shared state in this function which could cause problems for
any concurrent session. This will make it look similar to the other updates
for the same structure (TransactionState) which avoids confusion for future
readers of code. Author: Dilip Kumar Reviewed-by: Amit Kapila Discussion:

Fujii Masao pushed:

- pgbench: Improve error-handling in pgbench. Previously, failures of the
  initial connection or of opening the logfile caused pgbench to proceed with
  the benchmark, report the incomplete results, and exit with status 2. It made
  no sense to proceed with the benchmark when pgbench could not start as
  prescribed. This commit improves pgbench so that early errors that occur when
  starting the benchmark, such as those failures, make pgbench exit immediately
  with status 1. Author:
Yugo Nagata Reviewed-by: Fabien COELHO, Kyotaro Horiguchi, Fujii Masao

- pgbench: Fix typo in comment. Discussion:

Peter Geoghegan pushed:

- Don't overlook indexes during parallel VACUUM. Commit b4af70cb, which
simplified state managed by VACUUM, performed refactoring of parallel VACUUM
in passing. Confusion about the exact details of the tasks that the leader
process is responsible for led to code that made it possible for parallel
VACUUM to miss a subset of the table's indexes entirely. Specifically,
indexes that fell under the min_parallel_index_scan_size size cutoff were
missed. These indexes are supposed to be vacuumed by the leader (alongside
any parallel unsafe indexes), but weren't vacuumed at all. Affected indexes
could easily end up with duplicate heap TIDs, once heap TIDs were recycled for
new heap tuples. This had generic symptoms that might be seen with almost any
index corruption involving structural inconsistencies between an index and its
table. To fix, make sure that the parallel VACUUM leader process performs any
required index vacuuming for indexes that happen to be below the size cutoff.
Also document the design of parallel VACUUM with these below-size-cutoff
indexes. It's unclear how many users might be affected by this bug. There
had to be at least three indexes on the table to hit the bug: a smaller index,
plus at least two additional indexes that themselves exceed the size cutoff.
Cases with just one additional index would not run into trouble, since the
parallel VACUUM cost model requires two larger-than-cutoff indexes on the
table to apply any parallel processing. Note also that autovacuum was not
affected, since it never uses parallel processing. Test case based on tests
from a larger patch to test parallel VACUUM by Masahiko Sawada. Many thanks
to Kamigishi Rei for her invaluable help with tracking this problem down.
Author: Peter Geoghegan <pg(at)bowt(dot)ie> Author: Masahiko Sawada
<sawada(dot)mshk(at)gmail(dot)com> Reported-By: Kamigishi Rei <iijima(dot)yun(at)koumakan(dot)jp>
Reported-By: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> Diagnosed-By: Andres
Freund <andres(at)anarazel(dot)de> Bug: #17245 Discussion:
Backpatch: 14-, where the refactoring commit appears.
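The corrected division of labor can be sketched as follows, as a hedged Python illustration with invented names:

```python
# Indexes below min_parallel_index_scan_size, like parallel-unsafe indexes,
# must be vacuumed by the leader process itself; only the rest are eligible
# for parallel workers. The bug was that the below-cutoff group was skipped
# entirely, so those indexes were never vacuumed.

def assign_index_vacuum(indexes, size_cutoff):
    """Split (name, size, parallel_safe) triples into leader/worker sets."""
    leader, workers = [], []
    for name, size, parallel_safe in indexes:
        if not parallel_safe or size < size_cutoff:
            leader.append(name)   # the leader must not overlook these
        else:
            workers.append(name)  # large, parallel-safe: workers may take it
    return leader, workers
```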

- Fix parallel amvacuumcleanup safety bug. Commit b4af70cb inverted the return
value of the function parallel_processing_is_safe(), but missed the
amvacuumcleanup test. Index AMs that don't support parallel cleanup at all
were affected. The practical consequences of this bug were not very serious.
Hash indexes are affected, but since they just return the number of blocks
during hashvacuumcleanup anyway, it can't have had much impact. Author:
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> Discussion:
Backpatch: 14-, where commit b4af70cb appears.

- Add another old commit to git-blame-ignore-revs. Add another historic pgindent
commit that was missed by the initial work done in commit 8e638845.

- Add various assertions to heap pruning code. These assertions document (and
verify) our high level assumptions about how pruning can and cannot affect
existing items from target heap pages. For example, one of the new assertions
verifies that pruning does not set a heap-only tuple to LP_DEAD. Author:
Peter Geoghegan <pg(at)bowt(dot)ie> Reviewed-By: Andres Freund <andres(at)anarazel(dot)de>

- Add hardening to catch invalid TIDs in indexes. Add hardening to the heapam
index tuple deletion path to catch TIDs in index pages that point to a heap
item that index tuples should never point to. The corruption we're trying to
catch here is particularly tricky to detect, since it typically involves
"extra" (corrupt) index tuples, as opposed to the absence of required index
tuples in the index. For example, a heap TID from an index page that turns
out to point to an LP_UNUSED item in the heap page has a good chance of being
caught by one of the new checks. There is a decent chance that the recently
fixed parallel VACUUM bug (see commit 9bacec15) would have been caught had
that particular check been in place for Postgres 14. No backpatch of this
extra hardening for now, though. Author: Peter Geoghegan <pg(at)bowt(dot)ie>
Reviewed-By: Andres Freund <andres(at)anarazel(dot)de> Discussion:

- Update obsolete heap pruning comments. Add new comments that spell out what
VACUUM expects from heap pruning: pruning must never leave behind DEAD tuples
that still have tuple storage. This has at least been the case since commit
8523492d, which established the principle that vacuumlazy.c doesn't have to
deal with DEAD tuples that still have tuple storage directly, except perhaps
by simply retrying pruning (to handle a rare corner case involving concurrent
transaction abort). In passing, update some references to old symbol names
that were missed by the snapshot scalability work (specifically commit

- Update obsolete reference in vacuumlazy.c. Oversight in commit 7ab96cf6.

Peter Eisentraut pushed:

- Fix incorrect format placeholder.

- pgcrypto: Remove non-OpenSSL support. pgcrypto had internal implementations of
some encryption algorithms, as an alternative to calling out to OpenSSL.
These were rarely used, since most production installations are built with
OpenSSL. Moreover, maintaining parallel code paths makes the code more
complex and difficult to maintain. This patch removes these internal
implementations. Now, pgcrypto is only built if OpenSSL support is
configured. Reviewed-by: Daniel Gustafsson <daniel(at)yesql(dot)se> Discussion:

Heikki Linnakangas pushed:

- Fix snapshot reference leak if lo_export fails. If lo_export() fails to open
the target file or to write to it, it leaks the created LargeObjectDesc and
its snapshot in the top-transaction context and resource owner. That's pretty
harmless, it's a small leak after all, but it gives the user a "Snapshot
reference leak" warning. Fix by using a short-lived memory context and no
resource owner for transient LargeObjectDescs that are opened and closed
within one function call. The leak is easiest to reproduce with lo_export() on
a directory that doesn't exist, but in principle the other `lo_*` functions
could also fail. Backpatch to all supported versions. Reported-by: Andrew B
Reviewed-by: Alvaro Herrera Discussion:

- Update alternative expected output file. Previous commit added a test to
'largeobject', but neglected the alternative expected output file
'largeobject_1.source'. Per failure on buildfarm animal 'hamerkop'.

Robert Haas pushed:

- amcheck: Add additional TOAST pointer checks. Expand the checks of toasted
attributes to complain if the rawsize is overlarge. For compressed
attributes, also complain if compression appears to have expanded the
attribute or if the compression method is invalid. Mark Dilger, reviewed by
Justin Pryzby, Alexander Alekseev, Heikki Linnakangas, Greg Stark, and me.

- Introduce 'bbsink' abstraction to modularize base backup code. The base backup
code has accumulated a healthy number of new features over the years, but it's
becoming increasingly difficult to maintain and further enhance that code
because there's no real separation of concerns. For example, the code that
knows the details of how we send data to the client using the
libpq protocol is scattered throughout basebackup.c, rather than being
centralized in one place. To try to improve this situation, introduce a new
'bbsink' object which acts as a recipient for archives generated during the
base backup process and also for the backup manifest. This commit introduces
three types of bbsink: a 'copytblspc' bbsink forwards the backup to the client
using one COPY OUT operation per tablespace and another for the manifest, a
'progress' bbsink performs command progress reporting, and a 'throttle' bbsink
performs rate-limiting. The 'progress' and 'throttle' bbsink types also
forward the data to a successor bbsink; at present, the last bbsink in the
chain will always be of type 'copytblspc'. There are plans to add more types
of 'bbsink' in future commits. This abstraction is a bit leaky in the case of
progress reporting, but this still seems cleaner than what we had before.
Patch by me, reviewed and tested by Andres Freund, Sumanta Mukherjee, Dilip
Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja, Mark Dilger, and Jeevan
Ladhe. Discussion:

- Introduce 'bbstreamer' abstraction to modularize pg_basebackup. pg_basebackup
knows how to do quite a few things with a backup that it gets from the server,
like just write out the files, or compress them first, or even parse the tar
format and inject a modified file into the archive
generated by the server. Unfortunately, this makes pg_basebackup.c a very large
source file, and also somewhat difficult to enhance, because for example the
knowledge that the server is sending us a 'tar' file rather than some other
sort of archive is spread all over the place rather than centralized. In an
effort to improve this situation, this commit invents a new 'bbstreamer'
abstraction. Each archive received from the server is fed to a bbstreamer
which may choose to dispose of it or pass it along to some other bbstreamer.
Chunks may also be "labelled" according to whether they are part of the
payload data of a file in the archive or part of the archive metadata. So,
for example, if we want to take a tar file, modify the
file it contains, and then gzip the result and write it out, we can use a
bbstreamer_tar_parser to parse the tar file received from the server, a
bbstreamer_recovery_injector to modify that file's contents, a
bbstreamer_tar_archiver to replace the tar headers for the file modified in
the previous step with newly-built ones that are correct for the modified
file, and a bbstreamer_gzip_writer to gzip and write the resulting data. Only
the objects with "tar" in the name know anything about the tar archive format,
and in theory we could re-archive using some other format rather than "tar" if
somebody wanted to write the code. These changes do add a substantial amount
of code, but I think the result is a lot more maintainable and extensible.
pg_basebackup.c itself shrinks by roughly a third, with a lot of the
complexity previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been reviewed
and tested at various times by Andres Freund, Sumanta Mukherjee, Dilip Kumar,
Suraj Kharage, Dipesh Pandit, Tushar Ahuja, Mark Dilger, Sergei Kornilov, and
Jeevan Ladhe. Discussion:

- Don't set ThisTimeLineID when there's no reason to do so. In slotfuncs.c,
pg_replication_slot_advance() needs to determine the LSN up to which the slot
should be advanced, but that doesn't require us to update ThisTimeLineID,
because none of the code called from here depends on it. If the replication
slot is logical, pg_logical_replication_slot_advance will call
read_local_xlog_page, which does use ThisTimeLineID, but also takes care of
making sure it's up to date. If the replication slot is physical, the timeline
isn't used for anything at all. In logicalfuncs.c,
pg_logical_slot_get_changes_guts() has the same issue: the only code we're
going to run that cares about timelines is in or downstream of
read_local_xlog_page, which already makes sure that the correct value gets
set. Hence, don't do it here. Patch by me, reviewed and tested by Michael
Paquier, Amul Sul, and Álvaro Herrera. Discussion:

- Remove all use of ThisTimeLineID global variable outside of xlog.c. All such
code deals with this global variable in one of three ways. Sometimes the same
functions use it in more than one of these ways at the same time. First,
sometimes it's an implicit argument to one or more functions being called in
xlog.c or elsewhere, and must be set to the appropriate value before calling
those functions lest they misbehave. In those cases, it is now passed as an
explicit argument instead. Second, sometimes it's used to obtain the current
timeline after the end of recovery, i.e. the timeline to which WAL is being
written and flushed. Such code now calls GetWALInsertionTimeLine() or relies
on the new out parameter added to GetFlushRecPtr(). Third, sometimes it's
used during recovery to store the current replay timeline. That can change, so
such code must generally update the value before each use. It can still do
that, but must now use a local variable instead. The net effect of these
changes is to reduce by a fair amount the amount of code that is directly
accessing this global variable. That's good, because history has shown that we
don't always think clearly about which timeline ID it's supposed to contain at
any given point in time, or indeed, whether it has been or needs to be
initialized at any given point in the code. Patch by me, reviewed and tested
by Michael Paquier, Amul Sul, and Álvaro Herrera. Discussion:

- Change ThisTimeLineID from a global variable to a local variable.
StartupXLOG() still has ThisTimeLineID as a local variable, but the remaining
code in xlog.c now needs to obtain the relevant TimeLineID by some other
means. Mostly, this means that we now pass it as a function parameter to a
bunch of functions where we didn't previously. However, a few cases require
special handling:
  - In functions that might be called by outside callers who wouldn't
    necessarily know what timeline to specify, we get the timeline ID from
    shared memory. XLogCtl->ThisTimeLineID can be used in most cases since
    recovery is known to have completed by the time those functions are
    called. In xlog_redo(), we can use XLogCtl->replayEndTLI.
  - XLogFileClose() needs to know the TLI of the open logfile. Do that with a
    new global variable, openLogTLI. While someone could argue that this is
    just trading one global variable for another, the new one has a far
    narrower purpose and is referenced in just a few places.
  - read_backup_label() now returns the TLI that it obtains by parsing the
    backup_label file. Previously, ReadRecord() could be called to parse the
    checkpoint record without ThisTimeLineID having been initialized. Now the
    timeline is passed down, and I didn't want to pass an uninitialized
    variable; this change lets us avoid that. The old coding didn't seem to
    have any practical consequences that we need to worry about, but this is
    cleaner.
  - In BootstrapXLOG(), it's just a constant.
Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and Álvaro
Herrera. Discussion:

- Remove tests added by bd807be6935929bdefe74d1258ca08048f0aafa3. The buildfarm
is unhappy. It's not obvious why it doesn't like these tests, but let's remove
them until we figure it out. Discussion:

Tomáš Vondra pushed:

- Fix handling of NaN values in BRIN minmax multi. When calculating the
distance between float4/float8 values, we need to be a bit more careful about
NaN values in order not to trigger an assertion failure. We consider NaN
values to be equal (distance 0.0) and at infinite distance from all other
values. On builds without asserts, this issue is mostly harmless - the ranges
may be merged in a less efficient order, but the index is still correct. Per
report from Andreas Seltenreich. Backpatch to 14, where this new BRIN opclass
was introduced. Reported-by: Andreas Seltenreich Discussion:

- Mark mystreamer variable as PG_USED_FOR_ASSERTS_ONLY. Silences a warning
about an unused variable when building without asserts.

- Add bool GiST opclass to btree_gist. Adds a bool opclass to the btree_gist
extension, to allow creating GiST indexes on bool columns. GiST indexes on a
single bool column don't seem particularly useful, but this allows defining
exclusion constraints involving a bool column, for example. Author: Emre
Hasegeli Reviewed-by: Andrey Borodin Discussion:

- Fix gist_bool_ops to use gbtreekey2. Commit 57e3c5160b added a new GiST bool
opclass, but it used gbtreekey4 to store the data, which left two bytes
undefined, as reported by skink, our valgrind animal. There was a bit more
confusion, because the opclass also used gbtreekey8 in the definition. Fix by
defining a new gbtreekey2 struct, and using it in all the places. Discussion:

Alexander Korotkov pushed:

- Reset lastOverflowedXid on standby when needed. Currently, lastOverflowedXid
is never reset. It's just adjusted on new transactions known to be
overflowed. But if there are no overflowed transactions for a long time,
snapshots could be mistakenly marked as suboverflowed due to wraparound. This
commit fixes the issue by resetting lastOverflowedXid together with
KnownAssignedXids when needed. Backpatch to all supported versions.
Reported-by: Stan Hu Discussion:
Author: Kyotaro Horiguchi, Alexander Korotkov Reviewed-by: Stan Hu, Simon
Riggs, Nikolay Samokhvalov, Andrey Borodin, Dmitry Dolgov

Andres Freund pushed:

- windows: Remove use of WIN32_LEAN_AND_MEAN from crashdump.c. Since 8162464a25e
we do so in win32_port.h. But it likely didn't do much before that either,
because at that point windows.h was already included via win32_port.h.
Reported-By: Tom Lane Discussion:
