RE: [PoC] pg_upgrade: allow to upgrade publisher node

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-06-30 13:58:45
Message-ID: TYAPR01MB5866D5847512065BD9A64D1BF52AA@TYAPR01MB5866.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Amit,

Thank you for giving comments!

> > > Sorry for the delay, I didn't had time to come back to it until this afternoon.
> >
> > No issues, everyone is busy:-).
> >
> > > I don't think that your analysis is correct. Slots are guaranteed to be
> > > stopped after all the normal backends have been stopped, exactly to avoid
> such
> > > extraneous records.
> > >
> > > What is happening here is that the slot's confirmed_flush_lsn is properly
> > > updated in memory and ends up being the same as the current LSN before the
> > > shutdown. But as it's a logical slot and those records aren't decoded, the
> > > slot isn't marked as dirty and therefore isn't saved to disk. You don't see
> > > that behavior when doing a manual checkpoint before (per your script
> comment),
> > > as in that case the checkpoint also tries to save the slot to disk but then
> > > finds a slot that was marked as dirty and therefore saves it.
> > >
>
> Here, why the behavior is different for manual and non-manual checkpoint?

I have analyzed this further and concluded that there is no difference between a
manual and a shutdown checkpoint.

The difference was whether the CHECKPOINT record had been decoded or not.
The overall workflow of this test was:

1. do INSERT
(2. do CHECKPOINT)
(3. decode CHECKPOINT record)
4. receive feedback message from standby
5. do shutdown CHECKPOINT

At step 3, the walsender decoded that WAL and set candidate_xmin_lsn. The stack trace was:
standby_decode()->SnapBuildProcessRunningXacts()->LogicalIncreaseXminForSlot().

At step 4, the confirmed_flush of the slot was updated, but ReplicationSlotSave()
is executed only when slot->candidate_xmin_lsn holds a valid LSN. If steps 2 and
3 are skipped, the dirty flag is not set and the change remains only in memory.
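
For reference, a simplified sketch of LogicalConfirmReceivedLocation()
(paraphrased from logical.c, not the exact code). The slot is marked dirty and
saved only on the path where a candidate xmin/restart LSN is pending; otherwise
only the in-memory confirmed_flush is advanced:

void
LogicalConfirmReceivedLocation(XLogRecPtr lsn)
{
	if (MyReplicationSlot->candidate_xmin_lsn != InvalidXLogRecPtr ||
		MyReplicationSlot->candidate_restart_valid != InvalidXLogRecPtr)
	{
		bool	updated_xmin = false;
		bool	updated_restart = false;

		/* ... advance confirmed_flush and, once the confirmed LSN has
		 * passed the candidate LSNs, move catalog_xmin / restart_lsn
		 * forward ... */

		if (updated_xmin || updated_restart)
		{
			/* Only here is the change persisted. */
			ReplicationSlotMarkDirty();
			ReplicationSlotSave();
		}
	}
	else
	{
		/* No pending candidates: bump confirmed_flush in memory only.
		 * The slot stays clean, so a later checkpoint does not write it. */
		MyReplicationSlot->data.confirmed_flush = lsn;
	}
}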

Finally, the CHECKPOINT was executed at step 5. If steps 2 and 3 are skipped and
the patch from Julien is not applied, the updated value is discarded. This is
what I observed. The patch forces logical slots to be saved at the shutdown
checkpoint, so the confirmed_flush is written to disk at step 5.
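
Just to illustrate the idea (this is not the actual patch, only a rough sketch,
and it assumes an is_shutdown flag is passed down to
CheckPointReplicationSlots()): logical slots could be forcibly marked dirty at
the shutdown checkpoint so that the in-memory confirmed_flush is flushed:

/* Rough illustration only, not the actual patch: force logical slots to
 * be written out during the shutdown checkpoint. */
if (is_shutdown && SlotIsLogical(s))
{
	SpinLockAcquire(&s->mutex);
	s->just_dirtied = true;
	s->dirty = true;
	SpinLockRelease(&s->mutex);
}
/* SaveSlotToPath() then persists the slot because it is dirty. */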

> Can you please explain what led to updating the confirmed_flush in
> memory but not in the disk?

The code-level workflow is described above. The slot info is written to disk only
after the CHECKPOINT record has been decoded. I'm not sure about the original
motivation, but I suspect it was to reduce the number of writes to disk.

> BTW, have we ensured that discarding the
> additional records are already sent to the subscriber, if so, why for
> those records confirmed_flush LSN is not progressed?

In this case, the apply worker requests an LSN greater than confirmed_flush via
START_REPLICATION. Therefore, according to CreateDecodingContext(), the
walsender starts sending from the appropriate record, doesn't it? I don't think
anything is discarded on the subscriber side.
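
The relevant logic in CreateDecodingContext() is roughly the following
(paraphrased, not the exact code):

/* Paraphrased from CreateDecodingContext() (logical.c) */
if (start_lsn == InvalidXLogRecPtr)
{
	/* Client asked to continue from the last confirmed position. */
	start_lsn = slot->data.confirmed_flush;
}
else if (start_lsn < slot->data.confirmed_flush)
{
	/* Asking for already-confirmed WAL: forwarded to confirmed_flush. */
	start_lsn = slot->data.confirmed_flush;
}
/* In our case start_lsn > confirmed_flush, so the walsender streams
 * changes from the LSN the subscriber requested; nothing the subscriber
 * still needs is skipped. */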

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
