RE: [PoC] pg_upgrade: allow to upgrade publisher node

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Julien Rouhaud' <rjuju123(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-04-14 10:30:27
Message-ID: TYAPR01MB5866D67857191A7F75A40916F5999@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Julien,

> Sorry for the delay, I didn't had time to come back to it until this afternoon.

No issues, everyone is busy:-).

> I don't think that your analysis is correct. Slots are guaranteed to be
> stopped after all the normal backends have been stopped, exactly to avoid such
> extraneous records.
>
> What is happening here is that the slot's confirmed_flush_lsn is properly
> updated in memory and ends up being the same as the current LSN before the
> shutdown. But as it's a logical slot and those records aren't decoded, the
> slot isn't marked as dirty and therefore isn't saved to disk. You don't see
> that behavior when doing a manual checkpoint before (per your script comment),
> as in that case the checkpoint also tries to save the slot to disk but then
> finds a slot that was marked as dirty and therefore saves it.
>
> In your script's scenario, when you restart the server the previous slot data
> is restored and the confirmed_flush_lsn goes backward, which explains those
> extraneous records.

So you meant to say that the key point was that some records which are not sent
to subscriber do not mark slots as dirty, hence the updated confirmed_flush was
not written into slot file. Is it right? LogicalConfirmReceivedLocation() is called
by walsender when the process gets reply from worker process, so your analysis
seems correct.

> It's probably totally harmless to throw away that value for now (and probably
> also doesn't lead to crazy amount of work after restart, I really don't know
> much about the logical slot code), but clearly becomes problematic with your
> usecase. One easy way to fix this is to teach the checkpoint code to force
> saving the logical slots to disk even if they're not marked as dirty during a
> shutdown checkpoint, as done in the attached v1 patch (renamed as .txt to not
> interfere with the cfbot). With this patch applied I reliably only see a final
> shutdown checkpoint record with your scenario.
>
> Now such a change will make shutdown a bit more expensive when using logical
> replication, even if in 99% of cases you will not need to save the
> confirmed_flush_lsn value, so I don't know if that's acceptable or not.

In any case we these records must be advanced. IIUC, currently such records are
read after rebooting but ingored, and this patch just skips them. I have not measured,
but there is a possibility that is not additional overhead, just a trade-off.

Currently I did not come up with another solution, so I have included your patch.
Please see 0002.

Additionally, I added a checking functions in 0003.
According to pg_resetwal and other functions, the length of CHECKPOINT_SHUTDOWN
record seems (SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint)).
Therefore, the function ensures that the difference between current insert position
and confirmed_flush_lsn is less than (above + page header).

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
v7-0001-pg_upgrade-Add-include-logical-replication-slots-.patch application/octet-stream 24.1 KB
v7-0002-Always-persist-to-disk-logical-slots-during-a-shu.patch application/octet-stream 4.8 KB
v7-0003-pg_upgrade-Add-check-function-for-include-logical.patch application/octet-stream 8.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2023-04-14 10:59:27 OOM in hash join
Previous Message Jim Jones 2023-04-14 10:05:25 Re: Tab completion for AT TIME ZONE