Re: persist logical slots to disk during shutdown checkpoint

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: persist logical slots to disk during shutdown checkpoint
Date: 2023-09-11 06:38:19
Message-ID: ZP6123z3n8LKM+vv@paquier.xyz
Lists: pgsql-hackers

On Fri, Sep 08, 2023 at 11:50:37AM +0530, Amit Kapila wrote:
> On Fri, Sep 8, 2023 at 10:08 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>>
>>
>> + /*
>> + * This is used to track the last persisted confirmed_flush LSN value to
>> + * detect any divergence in the in-memory and on-disk values for the same.
>> + */
>>
>> "This value tracks the last LSN where this slot's data has been
>> flushed to disk.
>>
>
> This makes the comment vague as this sounds like we are saving a slot
> corresponding to some LSN which is not the case. If you prefer this
> tone then we can instead say: "This value tracks the last
> confirmed_flush LSN flushed which is used during a checkpoint shutdown
> to decide if a logical slot's data should be forcibly flushed or not."

Okay, that looks like an improvement over the term "persisted".
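To make the agreed semantics concrete, here is a minimal standalone sketch of the decision the comment describes; the names (SlotSketch, slot_needs_flush_at_shutdown) and the simplified types are made up for illustration and are not the actual PostgreSQL definitions:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * Illustrative, simplified slot state.  The real structure lives in
 * src/include/replication/slot.h; only the fields relevant to the
 * discussed comment are modeled here.
 */
typedef struct
{
    bool        dirty;                      /* unsaved changes pending? */
    XLogRecPtr  confirmed_flush;            /* in-memory value */
    XLogRecPtr  last_saved_confirmed_flush; /* value last flushed to disk */
} SlotSketch;

/*
 * During the shutdown checkpoint, a logical slot's data should be
 * forcibly flushed if it is already dirty, or if its in-memory
 * confirmed_flush has diverged from the value last written to disk.
 */
static bool
slot_needs_flush_at_shutdown(const SlotSketch *s)
{
    return s->dirty ||
        s->confirmed_flush != s->last_saved_confirmed_flush;
}
```

This is only meant to pin down the wording: "diverged" means any difference between the in-memory and on-disk values, not just an advance.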

>> This is used during a checkpoint shutdown to decide
>> if a logical slot's data should be forcibly flushed or not."
>>
>> Hmm. WAL senders are shut down *after* the checkpointer and *after*
>> the shutdown checkpoint is finished (see PM_SHUTDOWN and
>> PM_SHUTDOWN_2) because we want the WAL senders to acknowledge the
>> checkpoint record before shutting down the primary.
>>
>
> As per my understanding, this is not true for logical walsenders. As
> per the code, in HandleCheckpointerInterrupts(), we call ShutdownXLOG(),
> which sends a signal to the walsenders to stop and waits for them to
> stop. Only after that does it write the shutdown checkpoint WAL record.
> After getting the InitStopping signal, walsender sets got_STOPPING
> flag. Then *logical* walsender ensures that it sends all the pending
> WAL and exits. What you have quoted is probably true for physical
> walsenders.

Hm, this reminds me about this area.. This comes down to the handling
of WalSndCaughtUp in the send_data callback for logical or physical
WAL senders. It is switched to true for logical WAL senders much
earlier than for physical WAL senders, i.e. before the shutdown
checkpoint begins in the latter case. What was itching me a bit is
that the postmaster logic could be made more solid. Logical and
physical WAL senders are both marked as BACKEND_TYPE_WALSND, but we
don't actually check that the WAL senders remaining at the end of
PM_SHUTDOWN_2 are *not* connected to a database. This would require a
new BACKEND_TYPE_*, perhaps, or perhaps we're fine with the current
state because we'll catch problems in the checkpointer anyway if any
WAL is generated while the shutdown checkpoint is running. Just
something I had in mind, unrelated to this patch.

>> In order to limit
>> the number of records to work on after a restart, what this patch is
>> proposing is an improvement. Perhaps it would be better to document
>> that we don't care about the potential concurrent activity of logical
>> WAL senders in this case and that the LSN we are saving is a best
>> effort, meaning that last_saved_confirmed_flush is just here to reduce
>> the damage on a follow-up restart?
>
> Unless I am wrong, there shouldn't be any concurrent activity for
> logical walsenders. IIRC, it is a mandatory requirement for logical
> walsenders to stop before shutdown checkpointer to avoid panic error.
> We do handle logical walsenders differently because they can generate
> WAL during decoding.

Yeah. See above.

>> The comment added in
>> CheckPointReplicationSlots() goes in this direction, but perhaps this
>> potential concurrent activity should be mentioned?
>
> Sure, we can change it if required.

+ * We can flush dirty replication slots at regular intervals by any
+ * background process like bgwriter but checkpoint is a convenient location.

I still don't quite see a need to mention the bgwriter at all here..
That's just unrelated.

The comment block in CheckPointReplicationSlots() from v10 uses
"persist", but you mean "flush", I guess..
--
Michael
