RE: Exit walsender before confirming remote flush in logical replication

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Exit walsender before confirming remote flush in logical replication
Date: 2023-01-16 11:09:24
Message-ID: TYAPR01MB58665435C61D6CDBA67CA644F5C19@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Horiguchi-san,

> If I'm grabbing the discussion here correctly, in my memory, it is
> because: physical replication needs all records that have written on
> primary are written on standby for switchover to succeed. It is
> annoying that normal shutdown occasionally leads to switchover
> failure. Thus WalSndDone explicitly waits for remote flush/write
> regardless of the setting of synchronous_commit.

AFAIK the condition (sentPtr == replicatedPtr) seemed to be introduced for the purpose[1].
You meant to say that the conditon (!pq_is_send_pending()) has same motivation, right?

> Thus apply delay
> doesn't affect shutdown (AFAICS), and that is sufficient since all the
> records will be applied at the next startup.

I was not clear the word "next startup", but I agreed that we can shut down the
walsender in case of recovery_min_apply_delay > 0 and synchronous_commit = remote_apply.
The startup process will be not terminated even if the primary crashes, so I
think the process will apply the transaction sooner or later.

> In logical replication apply preceeds write and flush so we have no
> indication whether a record is "replicated" to standby by other than
> apply LSN. On the other hand, logical recplication doesn't have a
> business with switchover so that assurarance is useless. Thus I think
> we can (practically) ignore apply_lsn at shutdown. It seems subtly
> irregular, though.

Another consideration is that the condition (!pq_is_send_pending()) ensures that
there are no pending messages, including other packets. Currently we force walsenders
to clean up all messages before shutting down, even if it is a keepalive one.
I cannot have any problems caused by this, but I can keep the condition in case of
logical replication.

I updated the patch accordingly. Also, I found that the previous version
did not work well in case of streamed transactions. When a streamed transaction
is committed on publisher but the application is delayed on subscriber, the
process sometimes waits until there is no pending write. This is done in
ProcessPendingWrites(). I added another termination path in the function.

[1]: https://github.com/postgres/postgres/commit/985bd7d49726c9f178558491d31a570d47340459

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
v4-0001-Exit-walsender-before-confirming-remote-flush-in-.patch application/octet-stream 4.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-01-16 11:12:10 Re: pgsql: Doc: add XML ID attributes to <sectN> and <varlistentry> tags.
Previous Message Hayato Kuroda (Fujitsu) 2023-01-16 11:08:30 RE: Exit walsender before confirming remote flush in logical replication