KeepLogSeg needs some fixes on behavior

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: KeepLogSeg needs some fixes on behavior
Date: 2023-02-15 02:59:44
Message-ID: 20230215.115944.1764274401034441602.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

This is a derived thread form [1], that discusses some subtle
behaviors of KeepLogSeg.

1: https://www.postgresql.org/message-id/20230213194131.hgzs6ropcvhda5w3@awork3.anarazel.de

At Mon, 13 Feb 2023 11:41:31 -0800, Andres Freund <andres(at)anarazel(dot)de> wrote
> Hi,
>
> On 2023-02-13 15:45:49 +0900, Kyotaro Horiguchi wrote:
> > This seems to have a thin connection to the issue, but.
>
> I was worried that the changes could lead us to removing WAL without
> max_slot_wal_keep_size set.
>
>
> > > It seems decidedly not great to not log at least a debug1 (but probably it
> > > should be LOG) message when KeepLogSeg() decides to limit based on
> > > max_slot_wal_keep_size.
> >
> > It's easy to do that, but that log is highly accompanied by a LOG line
> > "terminating process %d to release replication slot \"%s\"". I don't
> > mind adding it if it is a DEBUGx.
>
> My problem with that is that we might *NOT* see those log messages for some
> reason, but that that's impossible to debug as-is. And even if we see them,
> it's not that easy to figure out by how much we were over
> max_slot_wal_keep_size, because we always report it in the context of a
> specific slot.

Since 551aa6b7b9, InvalidatePossiblyObsoleteSlot() emits the following
detail message in that case for both "terminating" and "invalidating"
messages.

errdetail("The slot's restart_lsn %X/%X exceeds the limit by %llu bytes.",
LSN_FORMAT_ARGS(restart_lsn),
(unsigned long long) (oldestLSN - restart_lsn))

Where oldestLSN is the cutoff LSN by KeepLogSeg().

> Removing WAL that's still needed is a *very* severe operation. Emitting an
> additional line in case it happens isn't a problem.

Totally agreed about the severity. The message above doesn't
explicitly say the source of the cutoff LSN but the only possible
source is max_slot_wal_keep_size. I think that DEBUG1 is appropriate
for the message from KeepLogSeg(), especially given how often we see
it.

> > > It feels wrong to subtract max_slot_wal_keep_size from recptr - that's the end
> > > of the checkpoint record. Given that we, leaving max_slot_wal_keep_size aside,
> > > only actually remove WAL if older than the segment that RedoRecPtr (the
> > > logical start of the checkpoint) is in. If the checkpoint is large we'll end
> > > up removing replication slots even though they potentially would only have
> > > retained one additional WAL segment.
> >
> > I think that it is a controversial part, but that variable is defined
> > the similar way to wal_keep_size. And I think that all max_wal_size,
> > wal_keep_size and max_slot_wal_keep_size being defined with the same
> > base LSN makes things simpler for users (also for developers).
> > Regardless of checkpoint length, if slots get frequently invalidated,
> > the setting should be considered to be too small for the system
> > requirements.
>
> I think it's bad that we define wal_keep_size, max_slot_wal_keep_size that
> way. I don't think bringing max_wal_size into this is useful, as it influences
> different things.

In my faint memory, when wal_keep_segments was switched to
wal_keep_size, in the first cut patch, I translated the latter to the
former by rounding up manner but it was rejected and ended up with the
formula we have now.

Speaking of max_slot_wal_keep_size, I think it depends on how we
interpret the variable. If we see it as the minimum amount to ensure,
then we should round it up. But if we see it as the maximum amount
that can't be exceeded, then we would round it down like we do
now. However, I also think that the "max" prefix could imply something
about the upper limit.

> > > Isn't it problematic to use ConvertToXSegs() to implement
> > > max_slot_wal_keep_size, given that it rounds *down*? Particularly for a large
> > > wal_segment_size that'd afaict lead to being much more aggressive invalidating
> > > slots.
> >
> > I think max_slot_wal_keep_size is, like max_wal_size for checkpoints,
> > a safeguard for slots not to fill-up WAL directory. Thus they both are
> > rounded down. If you have 1GB WAL directory and set wal_segment_size
> > to 4192MB, I don't see it a sane setup. But if segment size is smaller
> > than one hundredth of max_wal_size, that difference won't matter for
> > anyone. But anyway, it's a pain in the a.. that the both of them (and
> > wal_keep_size) don't work in a immediate manner, though..
>
> It doesn't matter a lot for 16MB segments, but with 1GB segments it's a
> different story.
>
> To me the way it's done now is a bug, one that can in extreme circumstances
> lead to data loss.

Where do we lose data? This behavior won't cause primary to lose any
data. Standby is unable to continue replication, but that's just how
it's set up. If we were to round up during the size conversion, we
might run out of storage capacity and end up with a PANIC. If that
were to happen, would it be considered a bug? Whether we round up or
round down, converting to the segment size is necessary, and I think
it's only natural that having too little margin in the settings could
lead to trouble.

> > > Also, why do we do something as expensive as
> > > InvalidateObsoleteReplicationSlots() even when max_slot_wal_keep_size had no
> > > effect?
> >
> > Indeed. Maybe we didn't regard that process as complex at start? I
> > think we can compare the cutoff segno against
> > XLogGetReplicationSlotMinimumLSN() before entering the loop over
> > slots.
>
> That'd be better, but I'd probably go further, and really gate it on
> max_slot_wal_keep_size having had an effect.

I think that would be okay. Should we make KeepLogSeg() return whether
the slot has been invalidated or not?

- static void
+ static bool
KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)

> > Thus I think there's room for the following improvements.
> >
> > - Prevent KeepLogSeg from returning 0.
> >
> > - Add DEBUG log to KeepLogSeg emitted when max_slot_wal_keep_size affects.
> >
> > - Check against minimum slot LSN before actually examining through the
> > slots in Invalidateobsoletereplicationslots.
> >
> > I'm not sure about the second item but the others seem back-patchable.
> >
> > If we need to continue further discussion, will need another
> > thread. Anyway I'll come up with the patch for the above three items.
>
> Yep, probably a good idea to start another thread.
>
> There's also https://www.postgresql.org/message-id/20220223014855.4lsddr464i7mymk2%40alap3.anarazel.de
> that unfortunately nobody replied to.

Ugg. I'll revisit it later..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2023-02-15 03:01:37 RE: Support logical replication of DDLs
Previous Message Amit Kapila 2023-02-15 02:33:57 Re: Perform streaming logical transactions by background workers and parallel apply