From: PWN via PostgreSQL Announce <announce-noreply(at)postgresql(dot)org>
To: PostgreSQL Announce <pgsql-announce(at)lists(dot)postgresql(dot)org>
Subject: PostgreSQL Weekly News - January 17, 2021
Date: 2021-01-18 08:29:10
Lists: pgsql-announce

# PostgreSQL Weekly News - January 17, 2021

Person of the week:

# PostgreSQL Product News

pspg 4.0.0, a pager designed for PostgreSQL, released.

DBConvert Studio 2.0, a database migration and synchronization suite that
supports PostgreSQL, released.

# PostgreSQL Jobs for January


# PostgreSQL in the News

Planet PostgreSQL:

PostgreSQL Weekly News is brought to you this week by David Fetter.

Submit news and announcements by Sunday at 3:00pm PST8PDT to david(at)fetter(dot)org.

# Applied Patches

Thomas Munro pushed:

- Provide pg_preadv() and pg_pwritev(). Provide synchronous vectored file I/O
routines. These map to preadv() and pwritev(), with fallback implementations
for systems that don't have them. Also provide a wrapper
pg_pwritev_with_retry() that automatically retries on short writes.
Reviewed-by: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Reviewed-by: Andres Freund
<andres(at)anarazel(dot)de> Discussion:

- Use vectored I/O to fill new WAL segments. Instead of making many block-sized
write() calls to fill a new WAL file with zeroes, make a smaller number of
pwritev() calls (or various emulations). The actual number depends on the
OS's IOV_MAX, which PG_IOV_MAX currently caps at 32. That means we'll write
256kB per call on typical systems. We may want to tune the number later with
more experience. Reviewed-by: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Reviewed-by:
Andres Freund <andres(at)anarazel(dot)de> Discussion:

- Fix function prototypes in dependency.h. Commit 257836a7 accidentally deleted
a couple of redundant-but-conventional "extern" keywords on function
prototypes. Put them back. Reported-by: Alvaro Herrera

- Don't use elog() in src/port/pwrite.c. Nothing broke because of this oversight
yet, but it would fail to link if we tried to use pg_pwrite() in frontend code
on a system that lacks pwrite(). Use an assertion instead. Also pgindent
while here. Discussion:

- Move our p{read,write}v replacements into their own files. macOS's ranlib
issued a warning about an empty pread.o file with the previous arrangement, on
systems new enough to require no replacement functions. Let's go back to
using configure's AC_REPLACE_FUNCS system to build and include each .o in the
library only if it's needed, which requires moving the *v() functions to their
own files. Also move the _with_retry() wrapper to a more permanent home.
Reported-by: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Discussion:

- Minor header cleanup for the new iovec code. Remove redundant function
declaration and improve header comment in pg_iovec.h. Move the new
declaration in fd.h next to a group of more similar functions.

Tom Lane pushed:

- In libpq, always append new error messages to conn->errorMessage. Previously,
we had an undisciplined mish-mash of printfPQExpBuffer and appendPQExpBuffer
calls to report errors within libpq. This commit establishes a uniform rule
that appendPQExpBuffer[Str] should be used. conn->errorMessage is reset only
at the start of an application request, and then accumulates messages till
we're done. We can remove no less than three different ad-hoc mechanisms that
were used to get the effect of concatenation of error messages within a
sequence of operations. Although this makes things quite a bit cleaner
conceptually, the main reason to do it is to make the world safer for the
multiple-target-host feature that was added awhile back. Previously, there
were many cases in which an error occurring during an individual host
connection attempt would wipe out the record of what had happened during
previous attempts. (The reporting is still inadequate, in that it can be hard
to tell which host got the failure, but that seems like a matter for a
separate commit.) Currently, lo_import and lo_export contain exceptions to
the "never use printfPQExpBuffer" rule. If we changed them, we'd risk
reporting an incidental lo_close failure before the actual read or write
failure, which would be confusing, not least because lo_close happened after
the main failure. We could improve this by inventing an internal version of
lo_close that doesn't reset the errorMessage; but we'd also need a version of
PQfn() that does that, and it didn't quite seem worth the trouble for now.

- Allow pg_regress.c wrappers to postprocess test result files. Add an optional
callback to regression_main() that, if provided, is invoked on each test
output file before we try to compare it to the expected-result file. The main
and isolation test programs don't need this (yet). In pg_regress_ecpg, add a
filter that eliminates target-host details from "could not connect" error
reports. This filter doesn't do anything as of this commit, but it will be
needed by the next one. In the long run we might want to provide some more
general, perhaps pattern-based, filtering mechanism for test output. For now,
this will solve the immediate problem. Discussion:

- Uniformly identify the target host in libpq connection failure reports. Prefix
"could not connect to host-or-socket-path:" to all connection failure cases
that occur after the socket() call, and remove the ad-hoc server identity data
that was appended to a few of these messages. This should produce much more
intelligible error reports in multiple-target-host situations, especially for
error cases that are off the beaten track to any degree (because none of those
provided any server identity info). As an example of the change, formerly a
connection attempt with a bad port number such as "psql -p 12345 -h
localhost,/tmp" might produce psql: error: could not connect to server:
Connection refused Is the server running on host "localhost" (::1) and
accepting TCP/IP connections on port 12345? could not connect to
server: Connection refused Is the server running on host "localhost"
( and accepting TCP/IP connections on port 12345? could not
connect to server: No such file or directory Is the server running
locally and accepting connections on Unix domain socket
"/tmp/.s.PGSQL.12345"? Now it looks like psql: error: could not connect to
host "localhost" (::1), port 12345: Connection refused Is the server
running on that host and accepting TCP/IP connections? could not connect to
host "localhost" (, port 12345: Connection refused Is the
server running on that host and accepting TCP/IP connections? could not
connect to socket "/tmp/.s.PGSQL.12345": No such file or directory Is
the server running locally and accepting connections on that socket? This
requires adjusting a couple of regression tests to allow for variation in the
contents of a connection failure message. Discussion:

- Try next host after a "cannot connect now" failure. If a server returns
ERRCODE_CANNOT_CONNECT_NOW, try the next host, if multiple host names have
been provided. This allows dealing gracefully with standby servers that might
not be in hot standby mode yet. In the wake of the preceding commit, it might
be plausible to retry many more error cases than we do now, but I (tgl) am
hesitant to move too aggressively on that --- it's not clear it'd be desirable
for cases such as bad-password, for example. But this case seems safe enough.
Hubert Zhang, reviewed by Takayuki Tsunakawa Discussion:

- Rethink SQLSTATE code for ERRCODE_IDLE_SESSION_TIMEOUT. Move it to class 57
(Operator Intervention), which seems like a better choice given that from the
client's standpoint it behaves a heck of a lot like, e.g.,
ERRCODE_ADMIN_SHUTDOWN. In a green field I'd put ERRCODE_ADMIN_SHUTDOWN in
this class as well, but that code has been around for a few years, so it's
probably too late to change its SQLSTATE code.

- Make pg_dump's table of object-type priorities more maintainable. Wedging a
new object type into this table has historically required manually renumbering
a lot of existing entries. (Although it appears that some people got lazy and
re-used the priority level of an existing object type, even if it wasn't
particularly related.) We can let the compiler do the counting by inventing an
enum type that lists the desired priority levels in order. Now, if you want
to add or remove a priority level, that's a one-liner. This patch is not
purely cosmetic, because I split apart the priorities of DO_COLLATION and
DO_TRANSFORM, as well as those of DO_ACCESS_METHOD and DO_OPERATOR, which look
to me to have been merged out of expediency rather than because it was a good
idea. Shell types continue to be sorted interchangeably with full types, and
opclasses interchangeably with opfamilies.

- Dump ALTER TABLE ... ATTACH PARTITION as a separate ArchiveEntry. Previously,
we emitted the ATTACH PARTITION command as part of the child table's
ArchiveEntry. This was a poor choice since it complicates restoring the
partition as a standalone table; you have to ignore the error from the ATTACH,
which isn't even an option when restoring direct-to-database with pg_restore.
(pg_restore will issue the whole ArchiveEntry as one PQexec, so that any error
rolls back the table creation as well.) Hence, separate it out as its own
ArchiveEntry, as indeed we already did for index ATTACH PARTITION commands.
Justin Pryzby Discussion:
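
As a rough sketch of the new dump shape (table names invented), the CREATE and
the ATTACH are now separate archive entries, so a standalone restore of the
partition can simply skip the ATTACH:

```sql
-- hypothetical pg_dump output fragment
CREATE TABLE parent (id integer) PARTITION BY RANGE (id);
CREATE TABLE part1 (id integer);
-- now emitted as its own ArchiveEntry rather than part of part1's entry:
ALTER TABLE ONLY parent ATTACH PARTITION part1 FOR VALUES FROM (1) TO (100);
```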

- Doc: fix description of privileges needed for ALTER PUBLICATION. Adding a
table to a publication requires ownership of the table (in addition to
ownership of the publication). This was mentioned nowhere.
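
A minimal sketch of the documented requirement (names invented): the role
running ALTER PUBLICATION ... ADD TABLE must own the table as well as the
publication.

```sql
CREATE TABLE my_table (id integer);
CREATE PUBLICATION my_pub;                    -- needs CREATE on the database
ALTER PUBLICATION my_pub ADD TABLE my_table;  -- also needs ownership of my_table
```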

- pg_dump: label INDEX ATTACH ArchiveEntries with an owner. Although a
partitioned index's attachment to its parent doesn't have separate ownership,
the ArchiveEntry for it needs to be marked with an owner anyway, to ensure
that the ALTER command is run by the appropriate role when restoring with
--use-set-session-authorization. Without this, the ALTER will be run by the
role that started the restore session, which will usually work but it's
formally the wrong thing. Back-patch to v11 where this type of ArchiveEntry
was added. In HEAD, add equivalent commentary to the just-added TABLE ATTACH
case, which I'd made do the right thing already. Discussion:

- Doc: clarify behavior of back-half options in pg_dump. Options that change how
the archive data is converted to SQL text are ignored when dumping to archive
formats. The documentation previously said "not meaningful", which is not
helpful. Discussion:

- Disallow a digit as the first character of a variable name in pgbench. The
point of this restriction is to avoid trying to substitute variables into
timestamp literal values, which may contain strings like '12:34'. There is a
good deal more that should be done to reduce pgbench's tendency to substitute
where it shouldn't. But this is sufficient to solve the case complained of by
Jaime Soler, and it's simple enough to back-patch. Back-patch to v11; before
commit 9d36a3866, pgbench had a slightly different definition of what a
variable name is, and anyway it seems unwise to change long-stable branches
for this. Fabien Coelho Discussion:
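
A sketch of the pgbench script case this protects (table and variable names
invented): under the new rule, the ":34" inside the timestamp literal can no
longer be mistaken for a variable reference, because a variable name cannot
begin with a digit.

```sql
-- pgbench script sketch; assumes a table log(ts time, client integer)
\set client_id random(1, 100)
SELECT count(*) FROM log WHERE ts = '12:34' AND client = :client_id;
```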

- Doc, more or less: uncomment tutorial example that was fixed long ago. Reverts
a portion of commit 344190b7e. Apparently, back in the twentieth century we
had some issues with multi-statement SQL functions, but they've worked fine
for a long time. Daniel Westermann Discussion:

- Run reformat-dat-files to declutter the catalog data files. Things had gotten
pretty messy here, apparently mostly but not entirely the fault of the
multirange patch. No functional changes.

- Mark inet_server_addr() and inet_server_port() as parallel-restricted. These
need to be PR because they access the MyProcPort data structure, which doesn't
get copied to parallel workers. The very similar functions inet_client_addr()
and inet_client_port() are already marked PR, but somebody missed these.
Although this is a pre-existing bug, we can't readily fix it in the back
branches since we can't force initdb. Given the small usage of these two
functions, and the even smaller likelihood that they'd get pushed to a
parallel worker anyway, it doesn't seem worth the trouble to suggest that DBAs
should fix it manually. Masahiko Sawada Discussion:
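
For completeness, the manual fix the commit declines to recommend would
presumably look like this (a sketch, to be run by a superuser in a back-branch
database):

```sql
ALTER FUNCTION inet_server_addr() PARALLEL RESTRICTED;
ALTER FUNCTION inet_server_port() PARALLEL RESTRICTED;
```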

- pg_dump: label PUBLICATION TABLE ArchiveEntries with an owner. This is the
same fix as commit 9eabfe300 applied to INDEX ATTACH entries, but for
table-to-publication attachments. As in that case, even though the backend
doesn't record "ownership" of the attachment, we still ought to label it in
the dump archive with the role name that should run the ALTER PUBLICATION
command. The existing behavior causes the ALTER to be done by the original
role that started the restore; that will usually work fine, but there may be
corner cases where it fails. The bulk of the patch is concerned with changing
struct PublicationRelInfo to include a pointer to the associated
PublicationInfo object, so that we can get the owner's name out of that when
the time comes. While at it, I rewrote getPublicationTables() to do just one
query of pg_publication_rel, not one per table. Back-patch to v10 where this
code was introduced. Discussion:

- Improve our heuristic for selecting PG_SYSROOT on macOS. In cases where Xcode
is newer than the underlying macOS version, asking xcodebuild for the SDK path
will produce a pointer to the SDK shipped with Xcode, which may end up
building code that does not work on the underlying macOS version. It appears
that in such cases, xcodebuild's answer also fails to match the default
behavior of Apple's compiler: assuming one has installed Xcode's "command line
tools", there will be an SDK for the OS's own version in
/Library/Developer/CommandLineTools, and the compiler will default to using
that. This is all pretty poorly documented, but experimentation suggests that
"xcrun --show-sdk-path" gives the sysroot path that the compiler is actually
using, at least in some cases. Hence, try that first, but revert to
xcodebuild if xcrun fails (in very old Xcode, it is missing or lacks the
--show-sdk-path switch). Also, "xcrun --show-sdk-path" may give a path that
is valid but lacks any OS version identifier. We don't really want that,
since most of the motivation for wiring -isysroot into the build flags at all
is to ensure that all parts of a PG installation are built against the same
SDK, even when considering extensions built later and/or on a different
machine. Insist on finding "N.N" in the directory name before accepting the
result. (Adding "--sdk macosx" to the xcrun call seems to produce the same
answer as xcodebuild, but usually more quickly because it's cached, so we also
try that as a fallback.) The core reason why we don't want to use Xcode's
default SDK in cases like this is that Apple's technology for introducing new
syscalls does not play nice with Autoconf: for example, configure will think
that preadv/pwritev exist when using a Big Sur SDK, even when building on an
older macOS version where they don't exist. It'd be nice to have a better
solution to that problem, but this patch doesn't attempt to fix that. Per
report from Sergey Shinderuk. Back-patch to all supported versions.

- Add missing array-enlargement logic to test_regex.c. The stanza to report a
"partial" match could overrun the initially allocated output array, so it
needs its own copy of the array-resizing logic that's in the main loop. I
overlooked the need for this in ca8217c10. Per report from Alexander Lakhin.

Amit Kapila pushed:

- Optimize DropRelFileNodeBuffers() for recovery. The recovery path of
DropRelFileNodeBuffers() is optimized so that scanning of the whole buffer
pool can be avoided when the number of blocks to be truncated in a relation is
below a certain threshold. For such cases, we find the buffers by doing
lookups in BufMapping table. This improves the performance by more than 100
times in many cases when several small tables (tested with 1000 relations) are
truncated and where the server is configured with a large value of shared
buffers (greater than or equal to 100GB). This optimization helps cases (a) when
vacuum or autovacuum truncated off any of the empty pages at the end of a
relation, or (b) when the relation is truncated in the same transaction in
which it was created. This commit introduces a new API smgrnblocks_cached
which returns a cached value for the number of blocks in a relation fork. This
helps us to determine the exact size of relation which is required to apply
this optimization. The exact size is required to ensure that we don't leave
any buffer for the relation being dropped as otherwise the background writer
or checkpointer can lead to a PANIC error while flushing buffers corresponding
to files that don't exist. Author: Kirk Jamison based on ideas by Amit Kapila
Reviewed-by: Kyotaro Horiguchi, Takayuki Tsunakawa, and Amit Kapila Tested-By:
Haiying Tang Discussion:
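
A sketch of case (b) above (names invented): replaying the truncation of a
relation created in the same transaction no longer scans the whole buffer pool
when the relation is small enough for the BufMapping lookups.

```sql
BEGIN;
CREATE TABLE small_tbl (a integer);
INSERT INTO small_tbl SELECT generate_series(1, 1000);
TRUNCATE small_tbl;  -- recovery replay now finds the few affected buffers
COMMIT;              -- via BufMapping lookups instead of a full scan
```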

- Fix relation descriptor leak. We missed closing the relation descriptor while
sending changes via the root of partitioned relations during logical
replication. Author: Amit Langote and Mark Zhao Reviewed-by: Amit Kapila and
Ashutosh Bapat Backpatch-through: 13, where it was introduced Discussion:

- Optimize DropRelFileNodesAllBuffers() for recovery. Similar to commit
d6ad34f341, this patch optimizes DropRelFileNodesAllBuffers() by avoiding the
complete buffer pool scan and instead finding the buffers to be invalidated by
doing lookups in the BufMapping table. This optimization helps operations
where the relation files need to be removed like Truncate, Drop, Abort of
Create Table, etc. Author: Kirk Jamison Reviewed-by: Kyotaro Horiguchi,
Takayuki Tsunakawa, and Amit Kapila Tested-By: Haiying Tang Discussion:

- Fix memory leak in SnapBuildSerialize. The memory for the snapshot was leaked
while serializing it to disk during logical decoding. This memory will be
freed only once walsender stops streaming the changes. This can lead to a huge
memory increase when the master logs standby snapshots too frequently, say when the
user is trying to create many replication slots. Reported-by:
funnyxj(dot)fxj(at)alibaba-inc(dot)com Diagnosed-by: funnyxj(dot)fxj(at)alibaba-inc(dot)com Author:
Amit Kapila Backpatch-through: 9.5 Discussion:

- Remove unnecessary pstrdup in fetch_table_list. The result of
TextDatumGetCString is already palloc'ed so we don't need to allocate memory
for it again. We decide not to backpatch it as there doesn't seem to be any
case where it can create a meaningful leak. Author: Zhijie Hou Reviewed-by:
Daniel Gustafsson Discussion:

Álvaro Herrera pushed:

- Fix thinko in comment. This comment has been wrong since its introduction in
commit 2c03216d8311. Author: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>

- Invent struct ReindexIndexInfo. This struct is used by
ReindexRelationConcurrently to keep track of the relations to process. This
saves having to obtain some data repeatedly, and has future uses as well.
Reviewed-by: Dmitry Dolgov <9erthalion6(at)gmail(dot)com> Reviewed-by: Hamid Akhtar
<hamid(dot)akhtar(at)gmail(dot)com> Reviewed-by: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>

- Call out vacuum considerations in create index docs. Backpatch to pg12, which
is as far as it goes without conflicts. Author: James Coleman
<jtc331(at)gmail(dot)com> Reviewed-by: "David G. Johnston"
<david(dot)g(dot)johnston(at)gmail(dot)com> Discussion:

- Prevent drop of tablespaces used by partitioned relations. When a tablespace
is used in a partitioned relation (per commits ca4103025dfe in pg12 for tables
and 33e6c34c3267 in pg11 for indexes), it is possible to drop the tablespace,
potentially causing various problems. One such was reported in bug #16577,
where a rewriting ALTER TABLE causes a server crash. Protect against this by
using pg_shdepend to keep track of tablespaces when used for relations that
don't keep physical files; we now abort a tablespace drop if we see that the
tablespace is referenced from any partitioned relations. Backpatch this to
11, where this problem has been latent all along. We don't try to create
pg_shdepend entries for existing partitioned indexes/tables, but any ones that
are modified going forward will be protected. Note slight behavior change:
when trying to drop a tablespace that contains both regular tables as well as
partitioned ones, you'd previously get ERRCODE_TABLESPACE_NOT_EMPTY and now
you'll get ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST. Arguably, the latter is more
correct.
It is possible to add protecting pg_shdepend entries for existing
tables/indexes, by doing ALTER TABLE ONLY some_partitioned_table SET
TABLESPACE pg_default; ALTER TABLE ONLY some_partitioned_table SET
TABLESPACE original_tablespace; for each partitioned table/index that is not
in the database default tablespace. Because these partitioned objects do not
have storage, no file needs to be actually moved, so it shouldn't take more
time than what's required to acquire locks. This query can be used to search
for such relations: SELECT ... FROM pg_class WHERE relkind IN ('p', 'I') AND
reltablespace <> 0 Reported-by: Alexander Lakhin <exclusion(at)gmail(dot)com>
Author: Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> Reviewed-by: Michael Paquier

- Avoid spurious wait in concurrent reindex. This is like commit c98763bf51bf,
but for REINDEX CONCURRENTLY. To wit: this flag indicates that the current
process is safe to ignore for the purposes of waiting for other snapshots,
when doing CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY. This helps
processes doing either of those things not deadlock, and also avoids spurious
waits. Author: Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> Reviewed-by: Dmitry
Dolgov <9erthalion6(at)gmail(dot)com> Reviewed-by: Hamid Akhtar
<hamid(dot)akhtar(at)gmail(dot)com> Reviewed-by: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>

Michaël Paquier pushed:

- Fix routine name in comment of catcache.c. Author: Bharath Rupireddy

- Rework refactoring of hex and encoding routines. This commit addresses some
issues with c3826f83 that moved the hex decoding routine to src/common/: - The
decoding function lacked overflow checks, so when used for security-related
features it was an open door to out-of-bound writes if not carefully used that
could remain undetected. Like the base64 routines already in src/common/ used
by SCRAM, this routine is reworked to check for overflows by having the size
of the destination buffer passed as argument, with overflows checked before
doing any writes. - The encoding routine was missing. This is moved to
src/common/ and it gains the same overflow checks as the decoding part. On
failure, the hex routines of src/common/ issue an error as per the discussion
done to make them usable by frontend tools, but not by shared libraries. Note
that this is why ECPG is left out of this commit, and it still includes a
duplicated logic doing hex encoding and decoding. While on it, this commit
uses better variable names for the source and destination buffers in the
existing escape and base64 routines in encode.c and it makes them more robust
to overflow detection. The previous core code issued a FATAL after doing
out-of-bound writes if going through the SQL functions, which would be enough
to detect problems when working on changes that impacted this area of the
code. Instead, an error is issued before doing an out-of-bound write. The hex
routines were being directly called for bytea conversions and backup manifests
without such sanity checks. The current calls happen to not have any
problems, but careless uses of such APIs could easily lead to CVE-class bugs.
Author: Bruce Momjian, Michael Paquier Reviewed-by: Sehrope Sarkuni
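
The SQL-level entry points to these routines behave as before; the hardening
is all in the underlying src/common/ code. For reference:

```sql
SELECT encode('\x1234'::bytea, 'hex');  -- '1234'
SELECT decode('1234', 'hex');           -- '\x1234'
```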

- Fix O(N^2) stat() calls when recycling WAL segments. The counter tracking the
last segment number recycled was getting initialized when recycling one single
segment, while it should be used across a full cycle of segments recycled to
prevent useless checks related to entries already recycled. This performance
issue has been introduced by b2a5545, and it was first implemented in
61b86142. No backpatch is done per the lack of field complaints.
Reported-by: Andres Freund, Thomas Munro Author: Michael Paquier Reviewed-By:
Andres Freund Discussion:

- Remove PG_SHA*_DIGEST_STRING_LENGTH from sha2.h. The last reference to those
variables has been removed in aef8948, so this cleans up a bit the code.

Heikki Linnakangas pushed:

- Add functions to 'pageinspect' to inspect GiST indexes. Author: Andrey Borodin
and me Discussion:

- Fix portability issues in the new gist pageinspect test. 1. The raw bytea
representation of the point-type keys used in the test depends on
endianness. Remove the raw key_data column from the test. 2. The items stored
on a non-leftmost gist page depend on how many items fit on the other pages.
This showed up as a failure on 32-bit i386 systems. To fix, only test the
gist_page_items() function on the leftmost leaf page. Per Andrey Borodin
and the buildfarm. Discussion:

- Fix test failure with wal_level=minimal. The newly-added gist pageinspect test
prints the LSNs of GiST pages, expecting them all to be 1 (GistBuildLSN). But
with wal_level=minimal, they got updated by the whole-relation WAL-logging at
commit. Fix by wrapping the problematic tests in the same transaction with the
CREATE INDEX. Per buildfarm failure on thorntail. Discussion:

Magnus Hagander pushed:

- Remove incorrect markup. Seems 737d69ffc3c made a copy/paste or automation
error resulting in two extra right parentheses. Reported-By: Michael Vastola
Backpatch-through: 13 Discussion:

- Add pg_stat_database counters for sessions and session time. This adds
counters for the number of sessions, the different kinds of session
termination, and timers for how much time is spent active vs. idle in a
database, to pg_stat_database. Internally this also renames the parameter
"force" to disconnect. This was the only use case for the parameter before, so
repurposing it to this more narrow use case makes things cleaner than inventing
something new. Author: Laurenz Albe Reviewed-By: Magnus Hagander, Soumyadeep
Chakraborty, Masahiro Ikeda Discussion:
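
A sketch of querying the new counters; the column names below follow the
description above but should be checked against the released catalog
definitions.

```sql
SELECT datname,
       sessions,                   -- total sessions established
       sessions_abandoned,         -- client disconnected unexpectedly
       sessions_fatal,             -- ended by fatal errors
       sessions_killed,            -- ended by operator intervention
       active_time,                -- time spent executing statements
       idle_in_transaction_time    -- time spent idle in a transaction
FROM pg_stat_database;
```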

- Add --no-instructions parameter to initdb. Specifying this parameter removes
the informational messages about how to start the server. This is intended for
use by wrappers in different packaging systems, where those instructions would
most likely be wrong anyway, but the other output from initdb would still be
useful (and thus just redirecting everything to /dev/null would be bad).
Author: Magnus Hagander Reviewed-By: Peter Eisentraut Discussion:

- Add documentation chapter about checksums. Data checksums did not have a
longer discussion in the docs; this adds a short section with an overview.
Extracted from the larger patch for on-line enabling of checksums, which has
many more authors and reviewers. Author: Daniel Gustafsson Reviewed-By:
Magnus Hagander, Michael Banck (and others through the big patch) Discussion:

Fujii Masao pushed:

- Log long wait time on recovery conflict when it's resolved. This is a
follow-up of the work done in commit 0650ff2303. This commit extends
log_recovery_conflict_waits so that a log message is produced also when
recovery conflict has already been resolved after deadlock_timeout passes,
i.e., when the startup process finishes waiting for recovery conflict after
deadlock_timeout. This is useful in investigating how long recovery conflicts
prevented the recovery from applying WAL. Author: Fujii Masao Reviewed-by:
Kyotaro Horiguchi, Bertrand Drouvot Discussion:
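
A sketch of enabling this on a standby (the parameter is reloadable, so no
restart is needed):

```sql
ALTER SYSTEM SET log_recovery_conflict_waits = on;
SELECT pg_reload_conf();
-- the startup process now logs recovery-conflict waits that exceed
-- deadlock_timeout, including, with this commit, waits already resolved
```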

- Ensure that a standby is able to follow a primary on a newer timeline. Commit
709d003fbd refactored WAL-reading code, but accidentally caused
WalSndSegmentOpen() to fail to follow a timeline switch while reading from a
historic timeline. This issue caused a standby to fail to follow a primary on
a newer timeline when WAL archiving is enabled. If there is a timeline switch
within the segment, WalSndSegmentOpen() should read from the WAL segment
belonging to the new timeline. But previously since it failed to follow a
timeline switch, it tried to read the WAL segment with old timeline. When WAL
archiving is enabled, that WAL segment with old timeline doesn't exist because
it's renamed to .partial. This led the primary to try to read a non-existent
WAL segment, which caused replication to fail with the error
"ERROR: requested WAL segment ... has already been removed". This commit
fixes WalSndSegmentOpen() so that it's able to follow a timeline switch, to
ensure that a standby is able to follow a primary on a newer timeline even
when WAL archiving is enabled. This commit also adds the regression test to
check whether a standby is able to follow a primary on a newer timeline when
WAL archiving is enabled. Back-patch to v13 where the bug was introduced.
Reported-by: Kyotaro Horiguchi Author: Kyotaro Horiguchi, tweaked by Fujii
Masao Reviewed-by: Alvaro Herrera, Fujii Masao Discussion:

- Improve tab-completion for CLOSE, DECLARE, FETCH and MOVE. This commit makes
CLOSE, FETCH and MOVE commands tab-complete the list of cursors. Also this
commit makes DECLARE command tab-complete the options. Author: Shinya Kato,
Sawada Masahiko, tweaked by Fujii Masao Reviewed-by: Shinya Kato, Sawada
Masahiko, Fujii Masao Discussion:

- Stabilize timeline switch regression test. Commit fef5b47f6b added the
regression test to check whether a standby is able to follow a primary on a
newer timeline when WAL archiving is enabled. But the buildfarm member
florican reported that this test failed because the requested WAL segment was
removed and replication failed. This is a timing issue. Since the test uses
neither a replication slot nor wal_keep_size, a checkpoint could remove the
WAL segment that's still necessary for replication. This
commit stabilizes the test by setting wal_keep_size. Back-patch to v13 where
the regression test that this commit stabilizes was added. Author: Fujii
Masao Discussion:

- postgres_fdw: Save foreign server OID in connection cache entry. The foreign
server OID stored in the connection cache entry is used as a lookup key to
directly get the server name. Previously, since the connection cache entry did
not have the server OID, postgres_fdw had to first get the server OID from the
user mapping before getting the server name. So if the corresponding user
mapping was dropped, postgres_fdw could raise the error "cache lookup failed
for user mapping" while looking up user mapping and fail to get the server
name even though the server had not been dropped yet. Author: Bharath
Rupireddy Reviewed-by: Fujii Masao Discussion:

- Fix calculation of how much shared memory is required to store a TOC. Commit
ac883ac453 refactored shm_toc_estimate() but changed its calculation of shared
memory size for the TOC incorrectly. Previously this could cause more memory
than necessary to be allocated. Back-patch to v11 where the bug was
introduced. Author:
Takayuki Tsunakawa Discussion:

Peter Geoghegan pushed:

- Pass down "logically unchanged index" hint. Add an executor aminsert() hint
mechanism that informs index AMs that the incoming index tuple (the tuple that
accompanies the hint) is not being inserted by execution of an SQL statement
that logically modifies any of the index's key columns. The hint is received
by indexes when an UPDATE takes place that does not apply an optimization like
heapam's HOT (though only for indexes where all key columns are logically
unchanged). Any index tuple that receives the hint on insert is expected to
be a duplicate of at least one existing older version that is needed for the
same logical row. Related versions will typically be stored on the same index
page, at least within index AMs that apply the hint. Recognizing the
difference between MVCC version churn duplicates and true logical row
duplicates at the index AM level can help with cleanup of garbage index
tuples. Cleanup can intelligently target tuples that are likely to be
garbage, without wasting too many cycles on less promising tuples/pages (index
pages with little or no version churn). This is infrastructure for an
upcoming commit that will teach nbtree to perform bottom-up index deletion.
No index AM actually applies the hint just yet. Author: Peter Geoghegan
<pg(at)bowt(dot)ie> Reviewed-By: Victor Yegorov <vyegorov(at)gmail(dot)com> Discussion:
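
A sketch of the pattern the hint targets (names invented): the UPDATE below
leaves the indexed column untouched, so when HOT cannot be applied (say, the
heap page is full), the new entry in the index is pure version churn and would
arrive with the hint set.

```sql
CREATE TABLE accounts (id integer PRIMARY KEY, balance numeric, note text);
CREATE INDEX accounts_note_idx ON accounts (note);
INSERT INTO accounts VALUES (1, 100, 'vip');
-- 'note' is logically unchanged, yet a non-HOT UPDATE still inserts a
-- duplicate entry into accounts_note_idx for the same logical row
UPDATE accounts SET balance = balance + 1 WHERE id = 1;
```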

- Enhance nbtree index tuple deletion. Teach nbtree and heapam to cooperate in
order to eagerly remove duplicate tuples representing dead MVCC versions.
This is "bottom-up deletion". Each bottom-up deletion pass is triggered
lazily in response to a flood of versions on an nbtree leaf page. This
usually involves a "logically unchanged index" hint (these are produced by the
executor mechanism added by commit 9dc718bd). The immediate goal of bottom-up
index deletion is to avoid "unnecessary" page splits caused entirely by
version duplicates. It naturally has an even more useful effect, though: it
acts as a backstop against accumulating an excessive number of index tuple
versions for any given _logical row_. Bottom-up index deletion complements
what we might now call "top-down index deletion": index vacuuming performed by
VACUUM. Bottom-up index deletion responds to the immediate local needs of
queries, while leaving it up to autovacuum to perform infrequent clean sweeps
of the index. The overall effect is to avoid certain pathological performance
issues related to "version churn" from UPDATEs. The previous tableam
interface used by index AMs to perform tuple deletion (the
table_compute_xid_horizon_for_tuples() function) has been replaced with a new
interface that supports certain new requirements. Many (perhaps all) of the
capabilities added to nbtree by this commit could also be extended to other
index AMs. That is left as work for a later commit. Extend deletion of
LP_DEAD-marked index tuples in nbtree by adding logic to consider extra index
tuples (that are not LP_DEAD-marked) for deletion in passing. This increases
the number of index tuples deleted significantly in many cases. The LP_DEAD
deletion process (which is now called "simple deletion" to clearly distinguish
it from bottom-up deletion) won't usually need to visit any extra table blocks
to check these extra tuples. We have to visit the same table blocks anyway to
generate a latestRemovedXid value (at least in the common case where the index
deletion operation's WAL record needs such a value). Testing has shown that
the "extra tuples" simple deletion enhancement increases the number of index
tuples deleted with almost any workload that has LP_DEAD bits set in leaf
pages. That is, it almost never fails to delete at least a few extra index
tuples. It helps most of all in cases that happen to naturally have a lot of
delete-safe tuples. It's not uncommon for an individual deletion operation to
end up deleting an order of magnitude more index tuples compared to the old
naive approach (e.g., custom instrumentation of the patch shows that this
happens fairly often when the regression tests are run). Add a further
enhancement that augments simple deletion and bottom-up deletion in indexes
that make use of deduplication: Teach nbtree's _bt_delitems_delete() function
to support granular TID deletion in posting list tuples. It is now possible
to delete individual TIDs from posting list tuples provided the TIDs have a
tableam block number of a table block that gets visited as part of the
deletion process (visiting the table block can be triggered directly or
indirectly). Setting the LP_DEAD bit of a posting list tuple is still an
all-or-nothing thing, but that matters much less now that deletion only needs
to start out with the right _general_ idea about which index tuples are
deletable. Bump XLOG_PAGE_MAGIC because xl_btree_delete changed. No bump in
BTREE_VERSION, since there are no changes to the on-disk representation of
nbtree indexes. Indexes built on PostgreSQL 12 or PostgreSQL 13 will
automatically benefit from bottom-up index deletion (i.e. no reindexing
required) following a pg_upgrade. The enhancement to simple deletion is
available with all B-Tree indexes following a pg_upgrade, no matter what
PostgreSQL version the user upgrades from. Author: Peter Geoghegan
<pg(at)bowt(dot)ie> Reviewed-By: Heikki Linnakangas <hlinnaka(at)iki(dot)fi> Reviewed-By:
Victor Yegorov <vyegorov(at)gmail(dot)com> Discussion:

Tomáš Vondra pushed:

- Disallow CREATE STATISTICS on system catalogs. Add a check that CREATE
STATISTICS does not add extended statistics on system catalogs, similarly to
indexes etc. It can be overridden using the allow_system_table_mods GUC. This
bug exists since 7b504eb282c, adding the extended statistics, so backpatch all
the way back to PostgreSQL 10. Author: Tomas Vondra Reported-by: Dean Rasheed
Backpatch-through: 10 Discussion:
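
A sketch of the newly-rejected case (the exact error wording is an
assumption):

```sql
CREATE STATISTICS pgclass_stats ON relname, relnamespace FROM pg_class;
-- ERROR:  permission denied: "pg_class" is a system catalog
SET allow_system_table_mods = on;  -- superuser-only override, as for indexes
```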

- psql \dX: list extended statistics objects. The new command lists extended
statistics objects, possibly with their sizes. All past releases with extended
statistics are supported. Author: Tatsuro Yamada Reviewed-by: Julien Rouhaud,
Alvaro Herrera, Tomas Vondra Discussion:

- Revert "psql \dX: list extended statistics objects". Reverts 891a1d0bca,
because the new psql command \dX only worked for users who can read
pg_statistic_ext_data catalog, and most regular users lack that privilege (the
catalog may contain sensitive user data). Reported-by: Noriyoshi Shinoda

Noah Misch pushed:

- Fix pg_dump for GRANT OPTION among initial privileges. The context is an
object that no longer bears some aclitem that it bore initially. (A user
issued REVOKE or GRANT statements upon the object.) pg_dump is forming SQL to
reproduce the object ACL. Since initdb creates no ACL bearing GRANT OPTION,
reaching this bug requires an extension where the creation script establishes
such an ACL. No PGXN extension does that. If an installation did reach the
bug, pg_dump would have omitted a semicolon, causing a REVOKE and the next SQL
statement to fail. Separately, since the affected code exists to eliminate an
entire aclitem, it wants plain REVOKE, not REVOKE GRANT OPTION FOR.
Back-patch to 9.6, where commit 23f34fa4ba358671adab16773e79c17c92cbc870 first
appeared. Discussion:
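
A sketch of the precondition (names invented): an extension creation script
records an initial ACL carrying GRANT OPTION, and a user later revokes it,
putting pg_dump on the affected code path.

```sql
-- inside a hypothetical extension script; objects created there get
-- their ACLs recorded as initial privileges in pg_init_privs
CREATE TABLE ext_tbl (a integer);
GRANT SELECT ON ext_tbl TO ext_role WITH GRANT OPTION;
-- later, a user removes the aclitem entirely:
REVOKE SELECT ON ext_tbl FROM ext_role;
```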

- Prevent excess SimpleLruTruncate() deletion. Every core SLRU wraps around.
With the exception of pg_notify, the wrap point can fall in the middle of a
page. Account for this in the PagePrecedes callback specification and in
SimpleLruTruncate()'s use of said callback. Update each callback
implementation to fit the new specification. This changes
SerialPagePrecedesLogically() from the style of asyncQueuePagePrecedes() to
the style of CLOGPagePrecedes(). (Whereas pg_clog and pg_serial share a key
space, pg_serial is nothing like pg_notify.) The bug fixed here has the same
symptoms and user followup steps as 592a589a04bd456410b853d86bd05faa9432cbbb.
Back-patch to 9.5 (all supported versions). Reviewed by Andrey Borodin and
(in earlier versions) by Tom Lane. Discussion:

Jeff Davis pushed:

- Documentation fixups for replication protocol. There is no CopyResponse
message; it should be CopyOutResponse. Also, if there is no WAL to stream on a
historical timeline, the server does not immediately send a CommandComplete;
it will send a response tuple first. Discussion:

# Pending Patches

Andrey V. Lepikhov sent in another revision of a patch to remove unneeded
self-joins in a class of places where it is safe to do so.

Tom Lane sent in a patch intended to fix a bug that manifested as multiple
hosts in a connection string failing to fail over in non-hot-standby mode, by
fixing some of the retry and error logic for connecting.

David Fetter sent in another revision of a patch to surface popcount to SQL.

Andrey V. Lepikhov sent in another revision of a patch to add a bulk insert
interface to the FDW API and use same in the PostgreSQL FDW. This should speed
up bulk loads to tables with foreign partitions.

Masahiko Sawada and Bharath Rupireddy traded patches to avoid catalogue accesses
in conversion_error_callback.

Konstantin Knizhnik and Tomáš Vondra traded patches to implement compression for

Ian Barwick and Greg Sabino Mullane traded patches to help psql tab-complete
functions by including the data types of their arguments.

Mark Dilger sent in another revision of a patch to add contrib module
pg_amcheck, a command line interface for running amcheck's verifications against
tables and indexes.

Bharath Rupireddy sent in two more revisions of a patch to make it possible to
use parallel inserts in CTAS.

Anastasia Lubennikova sent in two more revisions of a patch to set
PD_ALL_VISIBLE and visibility map bits in COPY FREEZE.
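
The pattern being optimized, sketched with invented names; COPY ... FREEZE
requires that the table was created or truncated in the current
(sub)transaction:

```sql
BEGIN;
CREATE TABLE bulk_tbl (a integer, b text);
COPY bulk_tbl FROM '/tmp/bulk.csv' WITH (FORMAT csv, FREEZE);
COMMIT;  -- with the patch, pages can be marked all-visible/all-frozen at
         -- load time instead of waiting for a later VACUUM
```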

Masahiko Sawada sent in a patch to implement buffer encryption to make sure the
kms patch would be workable with other components using an encryption key
managed by kmgr.

Simon Riggs sent in another revision of a patch to implement system-versioned
temporal tables.

Ian Barwick sent in a patch to fix has_column_privilege() with attnums and
non-existent columns by confirming the existence of a column even if the user
has the table-level privilege; otherwise the function will happily report that
the user has privilege on a dropped or non-existent column if an invalid
attnum is supplied.

Yugo Nagata sent in another revision of a patch to implement incremental view
maintenance.

Atsushi Torikoshi sent in another revision of a patch to add the plan type
(generic or custom) to pg_stat_statements.

Peter Smith sent in two more revisions of a patch to make it possible to use
background workers for tablesync.

Kyotaro HORIGUCHI sent in two more revisions of a patch to make it possible to
change the persistence (LOGGED/UNLOGGED) of a table without incurring a heap
rewrite.

Atsushi Torikoshi sent in another revision of a patch to make it possible to
collect memory contexts of the specified process via a new function,

John Naylor sent in a patch to remove references to the now-removed
replication_timeout GUC.

Hou Zhijie sent in two more revisions of a patch to add a Nullif case for

Justin Pryzby sent in another revision of a patch to pg_upgrade to add a test to
exercise binary compatibility.

Álvaro Herrera sent in another revision of a patch to set PROC_IN_SAFE_IC during

Tomáš Vondra sent in four more revisions of a patch to add bulk insert for
foreign tables.

Li Japin and Bharath Rupireddy traded patches to fix ALTER PUBLICATION...DROP
TABLE behaviour by arranging it so that when an entry is invalidated in
rel_sync_cache_publication_cb(), the pubactions are set to false and
get_rel_sync_entry() recalculates them.

Takamichi Osumi sent in three more revisions of a patch to add a new wal_level
that disables WAL logging, designed to make bulk loads faster at the cost of
leaving an unrecoverable cluster if it fails midway.

Bruce Momjian sent in three more revisions of a patch to implement key
management.

DRU sent in three more revisions of a patch to add documentation about data page
checksums, and support checksum enable/disable in a running cluster.

Heikki Linnakangas and Andrey Borodin traded patches to add functions to
'pageinspect' to inspect GiST indexes.

Dilip Kumar sent in another revision of a patch to support custom compression
methods for tables.

Yuzuko Hosoya sent in a patch to make it possible to release SPI plans for
referential integrity with DISCARD ALL, which will among other things reduce
the amount of memory used when creating or using foreign keys on tables with
many partitions.

Stephen Frost sent in a patch to introduce an obsolete appendix to link old
terms to new docs.

Stephen Frost sent in another revision of a patch to use pre-fetching for
ANALYZE and bring the details logged for autoanalyze into line with those for
autovacuum.

Michaël Paquier and Aleksey Kondratov traded patches to refactor the utility
statement options.

Peter Eisentraut sent in another revision of a patch to pageinspect that
changes the block number arguments to bigint to avoid possible overflows.

Tomáš Vondra sent in three more revisions of a patch to implement BRIN
multi-range indexes.

Heikki Linnakangas sent in two more revisions of a patch to move a few
ResourceOwnerEnlarge() calls for safety and clarity, and make resowners more
easily extensible by using a single array and hash, rather than one for each
type of object.

Kyotaro HORIGUCHI sent in a patch to fix some misuses of RelationNeedsWAL.

Dilip Kumar sent in another revision of a patch to ensure that
pg_is_wal_replay_paused waits for recovery to pause.

Kyotaro HORIGUCHI sent in another revision of a patch to move the stats
collector's temporary storage from files to shared memory.

Kyotaro HORIGUCHI sent in another revision of a patch to protect syscache from
bloating with negative cache entries by adding a CatCache expiration feature.

Pavel Stěhule sent in another revision of a patch to implement schema variables.

Li Japin sent in a patch to fix a typo in a comment on WalSndPrepareWrite.

Simon Riggs sent in a patch to make it possible to change an index's uniqueness
without validating it, and a way to do that validation separately.

Takayuki Tsunakawa sent in a patch to fix the size calculation for shmem TOC by
changing a couple of incorrect += assignments to =.

Peter Geoghegan sent in a patch to lower vacuum_cost_page_miss's default to 3.

Ian Barwick sent in another revision of a patch to add lock acquisition wait
start time to the pg_lock_status function.

Andy Fan sent in a patch to make cost_sort more accurate.

Masahiko Sawada sent in another revision of a patch to make it possible to do
transactions involving multiple postgres foreign servers.

Fujii Masao and Bharath Rupireddy traded patches to add a postgres_fdw function
to discard cached connections, along with both a postgres_fdw-specific and a
system-wide GUC, keep_connections.

Hou Zhijie sent in a patch to remove a stray apostrophe from a comment in

Álvaro Herrera sent in a patch to have VACUUM ignore processes doing CIC and RC
when computing the Xid horizon of tuples to remove.

Álvaro Herrera sent in a patch to increase the size of pg_commit_ts buffers.

David Zhang sent in a patch to update the tablespace documentation to keep it
consistent with the new table access method option for pgbench.

Iwata Aya sent in another revision of a patch to enable tracing for libpq.

Tomáš Vondra sent in two more revisions of a patch to cover expressions with
extended statistics.

Tom Lane sent in a patch to fix a wrong calculation in pull_varnos().

Thomas Munro sent in another revision of a patch to make it possible to get
pgbench to delay queries till connections are established.
