Re: Exit walsender before confirming remote flush in logical replication

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Subject: Re: Exit walsender before confirming remote flush in logical replication
Date: 2022-12-22 11:59:34
Message-ID: CAExHW5v6Q4SFsqku2V3UHyv_pbdaX0Lt-uzb_L72CJaY4r6wvg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 22, 2022 at 11:16 AM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear hackers,
> (I added Amit as CC because we discussed in another thread)
>
> This is a fork thread from time-delayed logical replication [1].
> While discussing, we thought that we could extend the condition of walsender shutdown[2][3].
>
> Currently, a walsender delays acting on a shutdown request until it has
> confirmed that all sent data has been flushed on the remote side. This condition
> was added in 985bd7[4] to support clean switchover. Suppose there is a
> primary-secondary physical replication system and we perform the following
> steps. If any changes arrive during step 2 but the walsender has not confirmed
> the remote flush, the reboot in step 3 may fail.
>
> 1. Stop the primary server.
> 2. Promote the secondary to be the new primary.
> 3. Reboot the (old) primary as the new secondary.
>
> In the case of logical replication, however, we cannot support the use case of
> switching the publisher and subscriber roles. Suppose the same scenario as above,
> with additional transactions committed during step 2. To catch up with those
> changes, the subscriber must receive the WAL related to those transactions, but
> that is not possible because a subscriber cannot request WAL from a specific
> position. In that case, we must first truncate all data on the new subscriber
> and then create a new subscription with copy_data = true.
>
> Therefore, I think that we can ignore the condition for shutting down the
> walsender in logical replication.
>
> This change may be useful for time-delayed logical replication. The walsender
> delays the shutdown until all changes are applied on the subscriber, even when
> the apply itself is delayed. As a result, the publisher cannot be stopped if a
> large delay is specified.

I think the current behaviour is an artifact of using the same WAL
sender code for both logical and physical replication.

I agree with you that the logical WAL sender need not wait for all the
WAL to be replayed downstream.
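
To make the distinction concrete, here is a rough sketch of the exit
decision being discussed. It is not the walsender code and not the proposed
patch: the type and function names (wal_pos, sender_state,
can_exit_on_shutdown) are hypothetical, and it assumes a logical sender
skips the remote-flush wait entirely, which may be simpler than what the
patch actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position; not PostgreSQL's XLogRecPtr. */
typedef uint64_t wal_pos;

typedef enum { REPL_PHYSICAL, REPL_LOGICAL } repl_kind;

/* Hypothetical, simplified view of what a sender knows at shutdown time. */
typedef struct {
    repl_kind kind;
    wal_pos sent;          /* last position sent to the remote side */
    wal_pos remote_flush;  /* last position the remote side reported as flushed */
} sender_state;

/* Returns true if the sender may exit once a shutdown has been requested. */
static bool can_exit_on_shutdown(const sender_state *s)
{
    assert(s != NULL);

    /* Assumption: a logical sender does not wait for the remote flush. */
    if (s->kind == REPL_LOGICAL)
        return true;

    /* Physical: wait until the remote side confirms everything sent. */
    return s->remote_flush == s->sent;
}
```

With this shape, a physical sender whose standby is behind keeps waiting,
while a logical sender with the same lag is allowed to exit.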

I have not reviewed the patch though.

--
Best Wishes,
Ashutosh Bapat
