Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-07-17 12:49:44
Message-ID: CAA4eK1KRDcsyFBkwwv4obMup8Q0HzTU6+YfP8Kk2izoNvSvmkA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 30, 2023 at 7:29 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> I have analyzed this further and concluded that there is no difference
> between a manual and a shutdown checkpoint.
>
> The difference was whether or not the CHECKPOINT record had been decoded.
> The overall workflow of this test was:
>
> 1. do INSERT
> (2. do CHECKPOINT)
> (3. decode CHECKPOINT record)
> 4. receive feedback message from standby
> 5. do shutdown CHECKPOINT
>
> At step 3, the walsender decoded that WAL record and set candidate_xmin_lsn. The stack trace was:
> standby_decode()->SnapBuildProcessRunningXacts()->LogicalIncreaseXminForSlot().
>
> At step 4, the slot's confirmed_flush was updated, but ReplicationSlotSave()
> was executed only when slot->candidate_xmin_lsn held a valid LSN. If steps 2
> and 3 are skipped, the dirty flag is not set and the change remains only in memory.
>
> Finally, the shutdown CHECKPOINT was executed at step 5. If steps 2 and 3 are
> skipped and Julien's patch is not applied, the updated value is discarded. This
> is what I observed. The patch forces the logical slot to be saved at the shutdown
> checkpoint, so confirmed_flush is written to disk at step 5.
>
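A toy C model of the five steps above may help make the failure mode concrete. It is a sketch under stated assumptions, not PostgreSQL code: `ToySlot`, `decode_checkpoint()`, `receive_feedback()`, `save_slot()`, `shutdown_checkpoint()`, `run_scenario()` and the LSN values are all hypothetical. In PostgreSQL the corresponding logic lives roughly in `LogicalConfirmReceivedLocation()` and `CheckPointReplicationSlots()` and differs in detail. The model encodes only the behaviour described in the quoted analysis: standby feedback persists the slot only when a candidate from a decoded CHECKPOINT is pending, and the shutdown checkpoint writes the slot only if it is dirty or, with the patch, if saving logical slots is forced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Hypothetical, heavily simplified stand-in for a logical slot.
 * Only the fields discussed above are modelled.
 */
typedef struct ToySlot
{
    XLogRecPtr confirmed_flush;    /* in-memory value */
    XLogRecPtr candidate_xmin_lsn; /* set when a CHECKPOINT is decoded */
    bool       dirty;              /* must be written at next checkpoint */
    XLogRecPtr on_disk_confirmed;  /* what survives a restart */
} ToySlot;

/* Write the in-memory value "to disk" and clear the dirty flag. */
static void
save_slot(ToySlot *slot)
{
    slot->on_disk_confirmed = slot->confirmed_flush;
    slot->dirty = false;
}

/* Steps 2 and 3: decoding the CHECKPOINT record sets a candidate. */
static void
decode_checkpoint(ToySlot *slot, XLogRecPtr lsn)
{
    slot->candidate_xmin_lsn = lsn;
}

/*
 * Step 4: standby feedback advances confirmed_flush.  The slot is marked
 * dirty and saved only if a pending candidate has been reached; otherwise
 * the new value stays in memory only.
 */
static void
receive_feedback(ToySlot *slot, XLogRecPtr flush)
{
    /* toy-model assumption: feedback never moves backwards */
    assert(flush >= slot->confirmed_flush);
    slot->confirmed_flush = flush;

    if (slot->candidate_xmin_lsn != InvalidXLogRecPtr &&
        slot->candidate_xmin_lsn <= flush)
    {
        slot->candidate_xmin_lsn = InvalidXLogRecPtr;
        slot->dirty = true;
        save_slot(slot);
    }
}

/* Step 5: the shutdown checkpoint writes the slot only if needed. */
static void
shutdown_checkpoint(ToySlot *slot, bool force_save_logical)
{
    if (slot->dirty || force_save_logical)
        save_slot(slot);
}

/* Run steps 1-5 and return the confirmed_flush seen after a restart. */
static XLogRecPtr
run_scenario(bool manual_checkpoint, bool force_save_logical)
{
    ToySlot slot = {0};

    slot.confirmed_flush = 100;   /* hypothetical starting LSN */
    slot.on_disk_confirmed = 100;

    /* step 1: the INSERT just generates WAL; nothing to model here */
    if (manual_checkpoint)
        decode_checkpoint(&slot, 250);             /* steps 2 and 3 */
    receive_feedback(&slot, 300);                  /* step 4 */
    shutdown_checkpoint(&slot, force_save_logical); /* step 5 */

    return slot.on_disk_confirmed;
}
```

With these numbers, `run_scenario(true, false)` and `run_scenario(false, true)` both leave 300 on disk, while `run_scenario(false, false)` leaves the stale value 100, which corresponds to the lost update described above.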

I see your point, but there are comments in walsender.c which indicate
that we also wait for step 5 to be replicated. See [1] and the comments
atop walsender.c. If this is true, then we don't need a special check
like the one you have in patch 0003, or at least it doesn't seem to be
required in all cases.

[1] -
/*
* When SIGUSR2 arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait for
* them to be replicated to the standby, and exit. ...
*/
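The shutdown behaviour quoted in [1] can be sketched as a simple wait on standby feedback. This is a hypothetical model, not walsender.c: `feedback_needed_before_exit()`, `shutdown_ckpt_end` and the feedback array are invented for illustration. It assumes the walsender has already sent WAL up to and including the shutdown checkpoint record, ending at `shutdown_ckpt_end`, and only needs the standby's reported flush position to reach that point before exiting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * Hypothetical model of the wait after SIGUSR2: feedback[] holds the
 * successive flush positions reported by the standby.  Returns how many
 * feedback messages were consumed before exit is allowed, or -1 if the
 * standby never caught up to the end of the shutdown checkpoint record.
 */
static int
feedback_needed_before_exit(XLogRecPtr shutdown_ckpt_end,
                            const XLogRecPtr *feedback, size_t n)
{
    assert(feedback != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        /* the standby has flushed everything up to the shutdown record */
        if (feedback[i] >= shutdown_ckpt_end)
            return (int) i + 1;
    }
    return -1;
}
```

For example, with feedback positions {100, 180, 250} and `shutdown_ckpt_end` = 250 the function returns 3; with a target of 300 the standby never catches up and it returns -1.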

--
With Regards,
Amit Kapila.
