RE: Time delayed LR (WAS Re: logical replication restrictions)

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>, "m(dot)melihmutlu(at)gmail(dot)com" <m(dot)melihmutlu(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "marcos(at)f10(dot)com(dot)br" <marcos(at)f10(dot)com(dot)br>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "smithpb2250(at)gmail(dot)com" <smithpb2250(at)gmail(dot)com>
Subject: RE: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2022-12-15 08:12:52
Message-ID: TYAPR01MB58661BA3BF38E9798E59AE14F5E19@TYAPR01MB5866.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Horiguchi-san, Amit,

> > Yes, that would be ideal. But do you know why that is a must?
>
> I believe a graceful shutdown (fast and smart) of a replication set is expected to
> be in sync. Of course we can change the policy to allow walsender to stop before
> confirming all WAL have been applied. However walsender doesn't have an idea
> of whether the peer is intentionally delaying or not.

This mechanism was introduced by commit 985bd7[1] to support a "clean"
switchover. I think it is needed for physical replication, but it is not clear
that it is needed for the logical case.

When the postmaster is stopped in fast or smart mode, we expect that all
modifications have been received by the secondary. This requirement does not
seem to have changed since the initial commit.

Before 985bd7, the walsender exited just after sending the final WAL, which meant
that the last packet sometimes did not reach the secondary. So there was a
possibility of failing to restart the old primary as a new secondary, because
the new primary did not have the last WAL record. To avoid this, the walsender
started waiting for the flush before exiting.
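
As a rough illustration of the switchover problem, here is a toy model in
Python. It is not PostgreSQL code: the function name and the integer LSNs are
hypothetical, and it only assumes that the old primary can follow the new
primary when it holds no WAL that the new primary never received.

```python
def can_rejoin_as_secondary(old_primary_end_lsn: int,
                            received_lsn_at_promotion: int) -> bool:
    """Toy check (hypothetical helper, not a PostgreSQL API).

    old_primary_end_lsn: last WAL position written by the stopped primary.
    received_lsn_at_promotion: last WAL position the secondary had received
        when it was promoted to be the new primary.
    """
    # If the old primary has WAL beyond what the new primary received,
    # the two histories have diverged and a plain restart as a secondary
    # is not possible in this model.
    return old_primary_end_lsn <= received_lsn_at_promotion


# Walsender exited before the last record arrived: cannot rejoin.
lost_last_packet = can_rejoin_as_secondary(1000, 990)

# Walsender waited until everything was flushed: can rejoin.
fully_synced = can_rejoin_as_secondary(1000, 1000)
```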

But in the case of logical replication, I'm not sure whether this limitation is
really needed. I think it may be OK for the walsender to exit without waiting
when applies are being delayed, because we don't have to consider the above
issue for logical replication.
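
To make the idea concrete, the policy I have in mind could be sketched roughly
as below. This is only a model in Python, not the actual walsender code; the
`Peer` class, `may_exit_on_shutdown()` and the `wait_for_logical` switch are
all hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Hypothetical view of what the walsender knows about its peer."""
    flush_lsn: int     # last WAL position the peer reported as flushed
    is_logical: bool   # True if the peer is a logical replication subscriber


def may_exit_on_shutdown(sent_lsn: int, peer: Peer,
                         wait_for_logical: bool = True) -> bool:
    """Return True if the modelled walsender may exit during a fast or
    smart shutdown. wait_for_logical=True models the current behavior;
    False models the relaxation discussed above."""
    if peer.is_logical and not wait_for_logical:
        # Relaxed policy (assumption): a logical peer may be delaying
        # applies on purpose, and the switchover issue does not apply,
        # so do not wait for it.
        return True
    # Otherwise wait until the peer has flushed everything that was sent.
    return peer.flush_lsn >= sent_lsn
```

With `wait_for_logical=False`, a physical peer is still waited for, and only a
logical peer that is behind no longer blocks the shutdown.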

[1]: https://github.com/postgres/postgres/commit/985bd7d49726c9f178558491d31a570d47340459

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
