Re: persist logical slots to disk during shutdown checkpoint

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: persist logical slots to disk during shutdown checkpoint
Date: 2023-09-08 06:20:37
Message-ID: CAA4eK1+P=U8L8Vjor_bh6W1+mJq-E9zCoODzqez3wPX6uxtrPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 8, 2023 at 10:08 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
>
> + /*
> + * This is used to track the last persisted confirmed_flush LSN value to
> + * detect any divergence in the in-memory and on-disk values for the same.
> + */
>
> "This value tracks is the last LSN where this slot's data has been
> flushed to disk.
>

This makes the comment vague as this sounds like we are saving a slot
corresponding to some LSN which is not the case. If you prefer this
tone then we can instead say: "This value tracks the last
confirmed_flush LSN flushed which is used during a checkpoint shutdown
to decide if a logical slot's data should be forcibly flushed or not."

>
> This is used during a checkpoint shutdown to decide
> if a logical slot's data should be forcibly flushed or not."
>
> Hmm. WAL senders are shut down *after* the checkpointer and *after*
> the shutdown checkpoint is finished (see PM_SHUTDOWN and
> PM_SHUTDOWN_2) because we want the WAL senders to acknowledge the
> checkpoint record before shutting down the primary.
>

As per my understanding, this is not true for logical walsenders. As
per code, while HandleCheckpointerInterrupts(), we call ShutdownXLOG()
which sends a signal to walsender to stop and waits for it to stop.
And only after that, did it write a shutdown checkpoint WAL record.
After getting the InitStopping signal, walsender sets got_STOPPING
flag. Then *logical* walsender ensures that it sends all the pending
WAL and exits. What you have quoted is probably true for physical
walsenders.

>
> In order to limit
> the number of records to work on after a restart, what this patch is
> proposing is an improvement. Perhaps it would be better to document
> that we don't care about the potential concurrent activity of logical
> WAL senders in this case and that the LSN we are saving at is a best
> effort, meaning that last_saved_confirmed_flush is just here to reduce
> the damage on a follow-up restart?
>

Unless I am wrong, there shouldn't be any concurrent activity for
logical walsenders. IIRC, it is a mandatory requirement for logical
walsenders to stop before shutdown checkpointer to avoid panic error.
We do handle logical walsnders differently because they can generate
WAL during decoding.

>
> The comment added in
> CheckPointReplicationSlots() goes in this direction, but perhaps this
> potential concurrent activity should be mentioned?
>

Sure, we can change it if required.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-09-08 06:24:28 Re: Avoid a possible null pointer (src/backend/utils/adt/pg_locale.c)
Previous Message Peter Geoghegan 2023-09-08 06:18:26 Re: Eager page freeze criteria clarification