Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: mk(at)071(dot)ovh, pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
Date: 2021-07-14 23:10:26
Message-ID: CAMkU=1wfXqQ9yhD+vZ6+kSV=MiP2B=u=wMGEgqFGUVFT3pT73w@mail.gmail.com
Lists: pgsql-bugs

On Tue, Jul 13, 2021 at 10:12 PM Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
wrote:

> At Tue, 13 Jul 2021 09:15:17 +0000, PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote in
> > We are using max_slot_wal_keep_size from Postgresql 13 to prevent master
> > from being killed by a lagging replication. It seems, that in our case, WAL
> > storage wasn't freed up after exceeding this parameter which resulted in a
> > replication failure. WAL which, as I believe, should have been freed up did
> > not seem to be needed by any other transaction at a time.
>
> Yeah, max_slot_wal_keep_size is the maximum amount of WAL that
> replication slots are guaranteed to be able to retain. It is not a
> limit that guarantees replication slots will never keep WAL files
> beyond it. In addition, WAL removal happens only at the end of a
> checkpoint, so WAL files can grow up to max_slot_wal_keep_size plus
> the checkpoint distance, assuming an even load.
>
> > -- Configuration --
> > master & one replica - streaming replication using a slot
> > ~700GB available for pg_wal
> > max_slot_wal_keep_size = 600GB
> > min_wal_size = 20GB
> > max_wal_size = 40GB
> > default checkpoint_timeout = 5 minutes (no problem with checkpoints)
> > archiving is on and is catching up well
>
> Assuming an even load (or WAL speed) and 0.5 for
> checkpoint_completion_target, 40GB of max_wal_size causes checkpoints
> every 27GB (1706 segments) [1] at the longest (in the case where the
> xlog checkpoint fires before the timeout checkpoint).
>
> Thus with 600GB of max_slot_wal_keep_size, the total size of WAL
> files can reach 627GB, and even that can be exceeded under a sudden
> high load.
>
> [1] checkpoint distance = max_wal_size / (1.0 +
> checkpoint_completion_target)
>
> > -- What happened --
> > Under heavy load (large COPY/INSERT transactions, loading hundreds of GB of
> > data), the replication started falling behind. Available space on pg_wal was
> > being reduced in the same rate as safe_slot
> > pg_replication_slot.safe_wal_size - as expected. At some point safe_wal_size
> > went negative and streaming stopped working. It wasn't a problem, because
> > replica started recovery from WAL archive. I expected that once the slot is
> > lost, WALs will be removed up to max_wal_size. This did not happen though.
> > It seems that Postgres tried to maintain something close to
> > max_slot_wal_keep_size (600GB) available, in case replica starts catching up
> > again. Over the time, there was no single transaction which would require
> > this much WAL to be kept. archiving wasn't behind either.
>
> Useless WAL files will be removed after a checkpoint runs.
>

They should be, but they are not. That is the bug. They just hang
around, checkpoint after checkpoint. Some of them do get cleaned up, to
make up for new ones created during that cycle. It treats
max_slot_wal_keep_size the same way it treats wal_keep_size (but only if a
"lost" slot is hanging around). If you drop the lost slot, only then does
it remove all the accumulated WAL at the next checkpoint.
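
For anyone checking the numbers in the quoted estimate, here is a minimal
sketch of that arithmetic. It assumes the default 16MB WAL segment size, and
the function names are made up for illustration; they are not PostgreSQL
internals. It only reproduces the estimate of the expected upper bound, not
the buggy behavior described above.

```python
# Illustrative only: reproduces the quoted estimate, not PostgreSQL's code.
GB = 1024 ** 3
MB = 1024 ** 2


def checkpoint_distance(max_wal_size, completion_target):
    """Approximate WAL volume between checkpoints under an even load."""
    return max_wal_size / (1.0 + completion_target)


def expected_wal_ceiling(max_slot_wal_keep_size, max_wal_size, completion_target):
    """Slot retention limit plus one checkpoint distance of WAL."""
    return max_slot_wal_keep_size + checkpoint_distance(max_wal_size, completion_target)


# Settings from the bug report; 0.5 is the assumed checkpoint_completion_target.
distance = checkpoint_distance(40 * GB, 0.5)
segments = int(distance // (16 * MB))  # assumes 16MB wal_segment_size
ceiling = expected_wal_ceiling(600 * GB, 40 * GB, 0.5)

print(f"checkpoint distance: {distance / GB:.1f}GB ({segments} segments)")
print(f"expected WAL ceiling: {ceiling / GB:.1f}GB")
```

With the reported settings this comes out at roughly 27GB between checkpoints
and roughly 627GB in total, which matches the figures quoted above.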

Cheers,

Jeff
