Re: Allow async standbys wait for sync replication

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: nathandbossart(at)gmail(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, satyanarlapuram(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Allow async standbys wait for sync replication
Date: 2022-03-01 07:34:31
Message-ID: 20220301.163431.1826638724406024793.horikyota.ntt@gmail.com
Lists: pgsql-hackers

(Now I understand what "async" means here..)

At Mon, 28 Feb 2022 22:05:28 -0800, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote in
> On Tue, Mar 01, 2022 at 11:10:09AM +0530, Bharath Rupireddy wrote:
> > On Tue, Mar 1, 2022 at 12:27 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> >> My feedback is specifically about this behavior. I don't think we should
> >> spin in XLogSend*() waiting for an LSN to be synchronously replicated. I
> >> think we should just choose the SendRqstPtr based on what is currently
> >> synchronously replicated.
> >
> > Do you mean something like the following?
> >
> > /* Main loop of walsender process that streams the WAL over Copy messages. */
> > static void
> > WalSndLoop(WalSndSendDataCallback send_data)
> > {
> >     /*
> >      * Loop until we reach the end of this timeline or the client requests to
> >      * stop streaming.
> >      */
> >     for (;;)
> >     {
> >         if (am_async_walsender && there_are_sync_standbys)
> >         {
> >             XLogRecPtr SendRqstLSN;
> >             XLogRecPtr SyncFlushLSN;
> >
> >             SendRqstLSN = GetFlushRecPtr(NULL);
> >             LWLockAcquire(SyncRepLock, LW_SHARED);
> >             SyncFlushLSN = walsndctl->lsn[SYNC_REP_WAIT_FLUSH];
> >             LWLockRelease(SyncRepLock);
> >
> >             if (SendRqstLSN > SyncFlushLSN)
> >                 continue;
> >         }

The current trend is toward energy savings. We never add a "wait for
some fixed time, then exit if the condition holds, otherwise repeat"
loop for this kind of purpose, where there's no guarantee that the
loop exits quickly. Concretely, we ought to rely on condition
variables for that.
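
To make the condition-variable point concrete, here is a minimal
standalone sketch. It uses POSIX threads as a stand-in, not the
in-tree API (the backend counterparts would be along the lines of
ConditionVariableSleep() and ConditionVariableBroadcast()). All names
below (SyncFlushState, advance_sync_flush, wait_for_sync_flush,
demo_wait) are hypothetical and only illustrate the shape: the waiter
sleeps until it is woken, instead of re-checking in a loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* stand-in for the backend type */

/* Hypothetical shared state: the LSN flushed by the sync standbys. */
typedef struct SyncFlushState
{
	pthread_mutex_t lock;
	pthread_cond_t	advanced;	/* broadcast whenever sync_flush_lsn moves */
	XLogRecPtr	sync_flush_lsn;
} SyncFlushState;

static SyncFlushState sync_state = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

/* Called by whoever advances the synchronously flushed LSN. */
void
advance_sync_flush(XLogRecPtr lsn)
{
	pthread_mutex_lock(&sync_state.lock);
	if (lsn > sync_state.sync_flush_lsn)
	{
		sync_state.sync_flush_lsn = lsn;
		/* Wake every sleeper; each one re-checks its own target. */
		pthread_cond_broadcast(&sync_state.advanced);
	}
	pthread_mutex_unlock(&sync_state.lock);
}

/*
 * Sleep (no polling, no fixed timeout) until the sync flush LSN reaches
 * target.  Returns the LSN observed on wakeup.
 */
XLogRecPtr
wait_for_sync_flush(XLogRecPtr target)
{
	XLogRecPtr	reached;

	pthread_mutex_lock(&sync_state.lock);
	/* The while loop only guards against spurious wakeups. */
	while (sync_state.sync_flush_lsn < target)
		pthread_cond_wait(&sync_state.advanced, &sync_state.lock);
	reached = sync_state.sync_flush_lsn;
	pthread_mutex_unlock(&sync_state.lock);
	return reached;
}

static void *
advancer_thread(void *arg)
{
	advance_sync_flush(*(XLogRecPtr *) arg);
	return NULL;
}

/* Demo: one thread advances the LSN while the caller waits for it. */
XLogRecPtr
demo_wait(XLogRecPtr target)
{
	pthread_t	tid;
	XLogRecPtr	reached;
	int			rc;

	rc = pthread_create(&tid, NULL, advancer_thread, &target);
	assert(rc == 0);
	reached = wait_for_sync_flush(target);
	pthread_join(tid, NULL);
	return reached;
}
```

The waiter consumes no CPU while the condition is false, and its
wakeup latency does not depend on any sleep interval.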

> Not quite. Instead of "continue", I would set SendRqstLSN to SyncFlushLSN
> so that the WAL sender only sends up to the current synchronously

I'm not sure, but doesn't that make the walsender falsely believe it
has caught up to the bleeding edge of WAL?
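
A small standalone sketch of that concern, under assumptions: if the
send limit is clamped to the synchronously flushed LSN, "caught up"
would have to be judged against the local flush position, not against
the clamped limit. The struct and function names here (SendDecision,
choose_send_limit, held_back) are hypothetical and not taken from
walsender.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* stand-in for the backend type */

/* Hypothetical result of deciding how far an async walsender may send. */
typedef struct SendDecision
{
	XLogRecPtr	send_upto;	/* send WAL up to (not beyond) this LSN */
	bool		caught_up;	/* true only if no local WAL remains unsent */
	bool		held_back;	/* true if local WAL exists beyond the sync LSN */
} SendDecision;

SendDecision
choose_send_limit(XLogRecPtr sent, XLogRecPtr local_flush,
				  XLogRecPtr sync_flush)
{
	SendDecision d;
	XLogRecPtr	limit;

	/* We can never have sent more than was flushed locally. */
	assert(sent <= local_flush);

	/* Clamp the request to what is already synchronously replicated. */
	limit = (local_flush < sync_flush) ? local_flush : sync_flush;

	/* Never move the limit behind what was already sent. */
	d.send_upto = (limit > sent) ? limit : sent;

	/*
	 * Compare against local_flush, not against the clamped limit.
	 * Comparing sent with send_upto would report "caught up" even
	 * though more WAL is waiting behind the sync standbys.
	 */
	d.caught_up = (sent >= local_flush);
	d.held_back = (local_flush > sync_flush);
	return d;
}
```

With sent = 100, local_flush = 200 and sync_flush = 100, the sketch
sends nothing new but still reports caught_up = false, which is the
distinction I'm worried would be lost.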

> replicated LSN. TBH there are probably other things that need to be
> considered (e.g., how do we ensure that the WAL sender sends the rest once
> it is replicated?), but I still think we should avoid spinning in the WAL
> sender waiting for WAL to be replicated.

It seems to me it would be something similar to
SyncRepReleaseWaiters(). Or it might be possible to consolidate this
feature into that function; I'm not sure, though.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
