RE: [PoC] pg_upgrade: allow to upgrade publisher node

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-07-21 07:30:14
Message-ID: TYCPR01MB587049C4F11BF7EB4083D895F53FA@TYCPR01MB5870.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear hackers,

> Based on the above, we are considering that we delay the timing of shutdown for
> logical walsenders. The preliminary workflow is:
>
> 1. When a logical walsender receives the signal from the checkpointer, it
> consumes all remaining WAL records, changes its state to
> WALSNDSTATE_STOPPING, and stops doing anything further.
> 2. Then the checkpointer does the shutdown checkpoint
> 3. After that the postmaster sends a signal to the walsenders, same as in the
> current implementation.
> 4. Finally, the logical walsenders process the shutdown checkpoint record and
> update the confirmed_lsn after receiving acknowledgement from the subscriber.
> Note that logical walsenders do not have to send the shutdown checkpoint
> record to the subscriber; a subsequent keepalive will advance the
> confirmed_lsn.
> 5. Once all tasks are done, they exit.
>
> This mechanism ensures that the confirmed_lsn of active slots matches the
> current WAL location of the old publisher, so the 0003 patch becomes much
> simpler: we no longer have to calculate an acceptable difference.
>
> One thing we must consider is that no WAL must be generated while decoding
> the shutdown checkpoint record, as that would cause a PANIC. IIUC the record
> leads to SnapBuildSerializationPoint(), which just serializes the snapbuild
> or restores from it, so the change may be acceptable. Thoughts?

I've implemented the ideas from my previous proposal, PSA another patch set.
Patch 0001 introduces the WALSNDSTATE_STOPPING state for logical walsenders. The
workflow remains largely the same as described in my previous post, with the
following additions:

* A flag has been added to track whether all WAL has been flushed. The logical
walsender can exit only after the flag is set, which ensures that all WAL is
flushed before the walsender terminates.
* Cumulative statistics are now forcibly written before changing the state.
Whereas the previous approach reported stats upon process exit, the current
approach must report them earlier due to the checkpointer's termination timing.
See the comments in CheckpointerMain() and atop pgstat_before_server_shutdown().
* At process exit, slots are now saved to disk.

Patch 0002 adds the --include-logical-replication-slots option to pg_upgrade,
unchanged from the previous patch set.

Patch 0003 adds a check function, which has become simpler.
The previous version calculated an "acceptable" difference between confirmed_lsn
and the current WAL position. This was necessary because the shutdown record
could not be sent to subscribers, creating a gap between the two values.
However, that approach had drawbacks, such as needing adjustment whenever
record sizes changed.

Now that the record can be sent to subscribers, that workaround is no longer
needed, at least in the context of logical replication. Consistency is now
maintained by the logical walsenders, so slots consumed only via the backend
could not be. We must consider what should be...

What do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
0001-Send-shutdown-checkpoint-record-to-subscriber.patch application/octet-stream 3.0 KB
0002-pg_upgrade-Add-include-logical-replication-slots-opt.patch application/octet-stream 34.0 KB
0003-pg_upgrade-Add-check-function-for-include-logical-re.patch application/octet-stream 11.8 KB
