BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: mk(at)071(dot)ovh
Subject: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
Date: 2021-07-13 09:15:17
Message-ID: 17103-004130e8f27782c9@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17103
Logged by: Marcin Krupowicz
Email address: mk(at)071(dot)ovh
PostgreSQL version: 13.3
Operating system: CentOS 7.6
Description:

Hi,

After replication fell behind and the lag exceeded
max_slot_wal_keep_size, WAL segments on the master were not removed. It
seems that Postgres tried to maintain max_slot_wal_keep_size worth of
segments. Please find the details below (a slightly redacted version of what
I wrote here:
https://stackoverflow.com/questions/68314222/replication-lag-exceeding-max-slot-wal-keep-size-wal-segments-not-removed).

-- Summary --
We are using max_slot_wal_keep_size from PostgreSQL 13 to prevent the master
from being killed by lagging replication. It seems that, in our case, WAL
storage wasn't freed up after exceeding this parameter, which resulted in a
replication failure. The WAL which, as I believe, should have been freed up
did not seem to be needed by any other transaction at the time.

-- Configuration --
master & one replica - streaming replication using a slot
~700GB available for pg_wal
max_slot_wal_keep_size = 600GB
min_wal_size = 20GB
max_wal_size = 40GB
default checkpoint_timeout = 5 minutes (no problem with checkpoints)
archiving is on and is catching up well
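
To make the numbers above concrete, here is a minimal Python sketch of the
disk budget they imply. It only does arithmetic on the values listed in this
section; the constant and function names are hypothetical, sizes are treated
as plain GB, and nothing here queries PostgreSQL. Note that max_wal_size is a
soft limit, so the second figure is an approximation at best.

```python
# Sizes in GB, copied from the configuration listed above.
PG_WAL_VOLUME_GB = 700           # approximate space available for pg_wal
MAX_SLOT_WAL_KEEP_SIZE_GB = 600  # max_slot_wal_keep_size
MAX_WAL_SIZE_GB = 40             # max_wal_size (soft limit)


def headroom_gb(volume_gb, retained_gb):
    """Free space left on the volume if `retained_gb` of WAL is kept."""
    return volume_gb - retained_gb


# Worst case while the slot is still valid: WAL retained up to the slot limit.
headroom_at_slot_limit = headroom_gb(PG_WAL_VOLUME_GB, MAX_SLOT_WAL_KEEP_SIZE_GB)

# Expectation stated in this report once the slot is lost:
# WAL trimmed back to roughly max_wal_size.
headroom_expected_after_loss = headroom_gb(PG_WAL_VOLUME_GB, MAX_WAL_SIZE_GB)

print(headroom_at_slot_limit)
print(headroom_expected_after_loss)
```

Under these assumptions the volume should never have less than about 100GB
free because of the slot alone, which is why the later drop to 0 was
unexpected.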

-- What happened --
Under heavy load (large COPY/INSERT transactions, loading hundreds of GB of
data), replication started falling behind. Available space on pg_wal was
shrinking at the same rate as pg_replication_slots.safe_wal_size, as
expected. At some point safe_wal_size went negative and streaming stopped
working. That wasn't a problem, because the replica started recovery from the
WAL archive. I expected that once the slot was lost, WAL would be removed
down to max_wal_size. This did not happen, though. It seems that Postgres
tried to keep something close to max_slot_wal_keep_size (600GB) available, in
case the replica started catching up again. Over that time, there was not a
single transaction that would require this much WAL to be kept. Archiving
wasn't behind either.
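
For anyone trying to reproduce this, the slot state described above can be
watched through the pg_replication_slots view. The following is a rough
Python sketch, not the monitoring we actually ran: the query uses columns
that exist in PostgreSQL 13, but the function name and the labels it returns
("none", "lost", "at_risk", "ok") are hypothetical, and the database
connection needed to run the query is left out.

```python
# Query to run on the primary. The columns are part of the
# pg_replication_slots view in PostgreSQL 13; executing it needs a
# database connection, which this sketch deliberately omits.
SLOT_QUERY = """
SELECT slot_name, active, restart_lsn, wal_status, safe_wal_size
FROM pg_replication_slots
"""


def classify_slot(wal_status, safe_wal_size):
    """Map one row's wal_status / safe_wal_size to a hypothetical label.

    wal_status is one of 'reserved', 'extended', 'unreserved', 'lost',
    or None when the slot reserves no WAL. safe_wal_size is in bytes and
    may be None (for example for lost slots).
    """
    if wal_status is None:
        return "none"
    if wal_status == "lost":
        return "lost"
    # A non-positive safe_wal_size means the slot has used up its budget.
    if wal_status == "unreserved" or (
        safe_wal_size is not None and safe_wal_size <= 0
    ):
        return "at_risk"
    return "ok"
```

In the incident described here, the slot would have moved from "ok" to
"at_risk" when safe_wal_size went negative, and then to "lost" once
streaming stopped.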

The amount of free space on pg_wal was more or less 70GB most of the time;
however, at some point, during heavy autovacuuming, it dipped to 0 :( This is
when PG crashed (and auto-recovered soon after). After it came back up,
there was 11GB left on pg_wal, with no transactions running and no loading.
This lasted for hours. During this time the replica finally caught up from
the archive and resumed replication with no delay. None of the WAL segments
were removed. I manually ran a checkpoint, but it did not clear any WAL. I
finally restarted PostgreSQL, and during the restart pg_wal was finally
cleared.

Again - why did PG not clear the WAL? The WAL segments, even more clearly,
were not needed by any process.

Many thanks,
-- Marcin
