Re: [HACKERS] Restricting maximum keep segments by repslots

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2019-06-27 14:22:56
Message-ID: 20190627162256.4f4872b8@firost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,

Being interested by this feature, I did a patch review.

This features adds the GUC "max_slot_wal_keep_size". This is the maximum amount
of WAL that can be kept in "pg_wal" by active slots.

If the amount of WAL is superior to this limit, the slot is deactivated and
its status (new filed in pg_replication_slot) is set as "lost".


The patch v13-0003 does not apply on HEAD anymore.

The patch v13-0005 applies using "git am --ignore-space-change"

Other patches applies correctly.

Please, find attached the v14 set of patches rebased on master.


The documentation explains the GUC and related columns in "pg_replication_slot".

It reflects correctly the current behavior of the patch.


The patch implement what it described. It is easy to enable and disable. The
GUC name is describing correctly its purpose.

This feature is useful in some HA scenario where slot are required (eg. no
possible archiving), but where primary availability is more important than

In "pg_replication_slots" view, the new "wal_status" field is misleading.
Consider this sentence and the related behavior from documentation

<literal>keeping</literal> means that some of them are to be removed by the
next checkpoint.

"keeping" appears when the current checkpoint will delete some WAL further than
"current_lsn - max_slot_wal_keep_size", but still required by at least one slot.
As some WAL required by some slots will be deleted quite soon, probably before
anyone can react, "keeping" status is misleading here. We are already in the
red zone.

I would expect this "wal_status" to be:

- streaming: slot lag between 0 and "max_wal_size"
- keeping: slot lag between "max_wal_size" and "max_slot_wal_keep_size". the
slot actually protect some WALs from being deleted
- lost: slot lag superior of max_slot_wal_keep_size. The slot couldn't protect
some WAL from deletion

Documentation follows with:

The last two states are seen only when max_slot_wal_keep_size is

This is true with the current behavior. However, if "keeping" is set as soon as
te slot lag is superior than "max_wal_size", this status could be useful even
with "max_slot_wal_keep_size = -1". As soon as a slot is stacking WALs that
should have been removed by previous checkpoint, it "keeps" them.

Feature tests

I have played with various traffic shaping setup between nodes, to observe how
columns "active", "wal_status" and "remain" behaves in regard to each others

while true; do
sleep 0.3;
psql -p 5432 -AXtc "
select now(), active, restart_lsn, wal_status, pg_size_pretty(remain)
from pg_replication_slots
where slot_name='slot_limit_st'"

The primary is created using:

initdb -U postgres -D slot_limit_pr --wal-segsize=1

cat<<EOF >>slot_limit_pr/postgresql.conf
max_wal_size = 3MB
min_wal_size = 2MB
max_slot_wal_keep_size = 4MB
logging_collector = on
synchronous_commit = off

WAL activity is generated using a simple pgbench workload. Then, during
this activity, packets on loopback are delayed using:

tc qdisc add dev lo root handle 1:0 netem delay 140msec

Here is how the wal_status behave. I removed the timestamps, but the
record order is the original one:

t|1/7B116898|streaming|1872 kB
t|1/7B1A0000|lost|0 bytes
t|1/7B320000|keeping|0 bytes
t|1/7B780000|lost|0 bytes
t|1/7BB00000|keeping|0 bytes
t|1/7BE00000|keeping|0 bytes
t|1/7C100000|lost|0 bytes
t|1/7C400000|keeping|0 bytes
t|1/7C700000|lost|0 bytes
t|1/7CA40000|keeping|0 bytes
t|1/7CDE0000|lost|0 bytes
t|1/7D100000|keeping|0 bytes
t|1/7D400000|keeping|0 bytes
t|1/7D7C0000|keeping|0 bytes
t|1/7DB40000|keeping|0 bytes
t|1/7DE60000|lost|0 bytes
t|1/7E180000|keeping|0 bytes
t|1/7E500000|keeping|0 bytes
t|1/7E860000|lost|0 bytes
t|1/7EB80000|keeping|0 bytes
t|1/80800000|keeping|0 bytes
t|1/80900000|streaming|940 kB
t|1/80A00000|streaming|1964 kB

When increasing the network delay to 145ms, the slot has been lost for real.
Note that it has been shown as lost but active twice (during approx 0.6s) before
being deactivated.

t|1/85700000|streaming|2048 kB
t|1/85800000|keeping|0 bytes
t|1/85940000|lost|0 bytes
t|1/85AC0000|lost|0 bytes
f|1/85C40000|lost|0 bytes

Finally, at least once the following messages appeared in primary logs
**before** the "wal_status" changed from "keeping" to "streaming":

WARNING: some replication slots have lost required WAL segments

So the slot lost one WAL, but the standby has been able to catch-up anyway.

My humble opinion about these results:

* after many different tests, the status "keeping" appears only when "remain"
equals 0. In current implementation, "keeping" really adds no value...
* "remain" should be NULL if "max_slot_wal_keep_size=-1 or if the slot isn't
* the "lost" status should be a definitive status
* it seems related, but maybe the "wal_status" should be set as "lost"
only when the slot has been deactivate ?
* logs should warn about a failing slot as soon as it is effectively
deactivated, not before.

Attachment Content-Type Size
v14-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 10.1 KB
v14-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 12.6 KB
v14-0003-Add-primary_slot_name-to-init_from_backup-in-TAP-tes.patch text/x-patch 1.4 KB
v14-0004-TAP-test-for-the-slot-limit-feature.patch text/x-patch 7.1 KB
v14-0005-Documentation-for-slot-limit-feature.patch text/x-patch 4.5 KB
v14-0006-Check-removal-of-in-reading-segment-file.patch text/x-patch 2.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2019-06-27 14:28:40 Missing PG 12 stable branch
Previous Message Jesper Pedersen 2019-06-27 14:06:46 pg_receivewal documentation