Re: Allow async standbys wait for sync replication

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Allow async standbys wait for sync replication
Date: 2022-03-02 04:17:09
Message-ID: CALj2ACULc_2TbC44fdFbpPCJ-AmvixdLY9z=LgMfz07QXx9-bg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 2, 2022 at 2:57 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Tue, Mar 01, 2022 at 11:09:57PM +0530, Bharath Rupireddy wrote:
> > On Tue, Mar 1, 2022 at 10:35 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> >> Yes, perhaps the synchronous replication framework will need to alert WAL
> >> senders when the syncrep LSN advances so that the WAL is sent to the async
> >> standbys. I'm glossing over the details, but I think that should be the
> >> general direction.
> >
> > It's doable. But we can't avoid async walsenders waiting for the flush
> > LSN even if we take the SyncRepReleaseWaiters() approach right? I'm
> > not sure (at this moment) what's the biggest advantage of this
> > approach i.e. (1) backends waking up walsenders after flush lsn is
> > updated vs (2) walsenders keep looking for the new flush lsn.
>
> I think there are a couple of advantages. For one, spinning is probably
> not the best from a resource perspective.

Just to be on the same page: by spinning, do you mean the async
walsender waiting for the sync flush LSN in a for-loop with
WaitLatch()?

> There is no guarantee that the
> desired SendRqstPtr will ever be synchronously replicated, in which case
> the WAL sender would spin forever.

The async walsenders will not wait for exactly the SendRqstPtr LSN to
become the flush LSN. Say SendRqstPtr is 100 and the current sync
FlushLSN is 95; they will have to wait until FlushLSN catches up with
SendRqstPtr, i.e. until SendRqstPtr <= FlushLSN. I can't think of a
scenario (right now) in which the sync FlushLSN never advances. If
there's such a scenario, shouldn't it be treated as a sync replication bug?

> Also, this approach might fit in better
> with the existing synchronous replication framework. When a WAL sender
> realizes that it can't send up to the current "flush" LSN because it's not
> synchronously replicated, it will request to be alerted when it is.

I think you are referring to the way a backend calls SyncRepWaitForLSN
and waits until one of the walsenders sets its syncRepState to
SYNC_REP_WAIT_COMPLETE in SyncRepWakeQueue. Note that SyncRepWaitForLSN
is blocking, i.e. the backend spins/waits in a for (;;) loop until its
syncRepState becomes SYNC_REP_WAIT_COMPLETE. The backend doesn't do
any other work; it just waits. So, spinning isn't avoided completely.

Unless I'm missing something, the existing sync rep queue
(SyncRepQueue) mechanism doesn't avoid spinning, either in the
requestors (backends) in SyncRepWaitForLSN or in the walsenders in
SyncRepWakeQueue.
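For reference, the queue-and-release pattern referred to above can be modeled roughly as follows. This is a simplified, single-threaded sketch: `Waiter`, `queue_insert()` and `queue_wake()` are hypothetical names, and the real SyncRepQueue lives in shared memory and involves locking and latches, all of which this model ignores. Waiters are kept sorted by the LSN they wait for, and a release pass marks complete every waiter whose LSN is at or below the newly flushed LSN, in the spirit of SyncRepWakeQueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's LSN type */

/* Hypothetical model of a backend waiting in SyncRepWaitForLSN(). */
typedef struct Waiter
{
    XLogRecPtr      wait_lsn;   /* LSN the backend waits to see flushed */
    bool            complete;   /* analogous to SYNC_REP_WAIT_COMPLETE */
    struct Waiter  *next;       /* queue kept sorted by wait_lsn */
} Waiter;

/* Insert w so the queue stays ordered by ascending wait_lsn (FIFO on ties). */
static void
queue_insert(Waiter **head, Waiter *w)
{
    Waiter    **pos = head;

    assert(w != NULL);
    while (*pos != NULL && (*pos)->wait_lsn <= w->wait_lsn)
        pos = &(*pos)->next;
    w->complete = false;
    w->next = *pos;
    *pos = w;
}

/* Release every waiter with wait_lsn <= flush_lsn; return how many. */
static int
queue_wake(Waiter **head, XLogRecPtr flush_lsn)
{
    int         released = 0;

    while (*head != NULL && (*head)->wait_lsn <= flush_lsn)
    {
        Waiter     *w = *head;

        *head = w->next;
        w->next = NULL;
        w->complete = true;     /* a real backend would also be woken via its latch */
        released++;
    }
    return released;
}
```

Because the queue is sorted, a release pass stops at the first waiter whose LSN is still ahead of the flush LSN, so it never scans waiters that cannot yet be released.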

> In the
> meantime, it can send up to the latest syncrep LSN so that the async
> standby is as up-to-date as possible.

Just to be clear, the following scenarios can exist.
Firstly, SendRqstPtr is the LSN up to which a walsender can send WAL; it's not the

scenario 1:
async SendRqstPtr is 100, sync FlushLSN is 95: async standbys will
wait until FlushLSN moves ahead; once SendRqstPtr <= FlushLSN, the
walsender sends out the WAL.

scenario 2:
async SendRqstPtr is 105, sync FlushLSN is 110: async standbys will
not wait; the walsender just sends out the WAL up to SendRqstPtr, i.e. LSN 105.

scenario 3 (same as scenario 2, but SendRqstPtr and FlushLSN are the same):
async SendRqstPtr is 105, sync FlushLSN is 105: async standbys will
not wait; the walsender just sends out the WAL up to SendRqstPtr, i.e. LSN 105.
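The three scenarios reduce to a single comparison. The sketch below encodes them under the assumption that LSNs are plain integers; `async_send_limit()` is a hypothetical helper, not an existing PostgreSQL function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's LSN type */

/*
 * Hypothetical decision for an async walsender.  Returns true and stores
 * the send limit in *send_upto when WAL may be sent; returns false when
 * the walsender must wait for the sync FlushLSN to advance.
 */
static bool
async_send_limit(XLogRecPtr send_rqst_ptr, XLogRecPtr sync_flush_lsn,
                 XLogRecPtr *send_upto)
{
    assert(send_upto != NULL);
    if (send_rqst_ptr <= sync_flush_lsn)
    {
        *send_upto = send_rqst_ptr;     /* scenarios 2 and 3 */
        return true;
    }
    return false;                       /* scenario 1: wait */
}
```

Nathan's suggestion of sending up to the latest syncrep LSN in the meantime would differ only in scenario 1: instead of returning false, it would send up to `sync_flush_lsn` (95 in the example) and then wait for the rest.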

This way, the async standbys are always as up-to-date as possible with
the sync FlushLSN.

Are you referring to any other scenarios?

Regards,
Bharath Rupireddy.
