Re: Allow async standbys wait for sync replication

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Allow async standbys wait for sync replication
Date: 2022-03-01 17:39:57
Message-ID: CALj2ACUdNd=vVTOdz=UMuWxTVJgzsCUhOwJ6mMsXK150Ec-QOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 1, 2022 at 10:35 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Tue, Mar 01, 2022 at 04:34:31PM +0900, Kyotaro Horiguchi wrote:
> > At Mon, 28 Feb 2022 22:05:28 -0800, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote in
> >> replicated LSN. TBH there are probably other things that need to be
> >> considered (e.g., how do we ensure that the WAL sender sends the rest once
> >> it is replicated?), but I still think we should avoid spinning in the WAL
> >> sender waiting for WAL to be replicated.
> >
> > It seems to me it would be something similar to
> > SyncRepReleaseWaiters(). Or it could be possible to consolidate this
> > feature into the function, I'm not sure, though.
>
> Yes, perhaps the synchronous replication framework will need to alert WAL
> senders when the syncrep LSN advances so that the WAL is sent to the async
> standbys. I'm glossing over the details, but I think that should be the
> general direction.

It's doable. But we can't avoid async walsenders waiting for the flush
LSN even if we take the SyncRepReleaseWaiters() approach right? I'm
not sure (at this moment) what's the biggest advantage of this
approach i.e. (1) backends waking up walsenders after flush lsn is
updated vs (2) walsenders keep looking for the new flush lsn.

> >> My feedback is specifically about this behavior. I don't think we should
> >> spin in XLogSend*() waiting for an LSN to be synchronously replicated. I
> >> think we should just choose the SendRqstPtr based on what is currently
> >> synchronously replicated.
> >
> > Do you mean something like the following?
> >
> > /* Main loop of walsender process that streams the WAL over Copy messages. */
> > static void
> > WalSndLoop(WalSndSendDataCallback send_data)
> > {
> > /*
> > * Loop until we reach the end of this timeline or the client requests to
> > * stop streaming.
> > */
> > for (;;)
> > {
> > if (am_async_walsender && there_are_sync_standbys)
> > {
> > XLogRecPtr SendRqstLSN;
> > XLogRecPtr SyncFlushLSN;
> >
> > SendRqstLSN = GetFlushRecPtr(NULL);
> > LWLockAcquire(SyncRepLock, LW_SHARED);
> > SyncFlushLSN = walsndctl->lsn[SYNC_REP_WAIT_FLUSH];
> > LWLockRelease(SyncRepLock);
> >
> > if (SendRqstLSN > SyncFlushLSN)
> > continue;
> > }
>
> Not quite. Instead of "continue", I would set SendRqstLSN to SyncFlushLSN
> so that the WAL sender only sends up to the current synchronously
> replicated LSN. TBH there are probably other things that need to be
> considered (e.g., how do we ensure that the WAL sender sends the rest once
> it is replicated?), but I still think we should avoid spinning in the WAL
> sender waiting for WAL to be replicated.

I did some more analysis on the above point: we can let
XLogSendPhysical know up to which it can send WAL (SendRqstLSN). But,
XLogSendLogical reads the WAL using XLogReadRecord mechanism with
read_local_xlog_page page_read callback to which we can't really say
SendRqstLSN. May be we have to do something like below:

XLogSendPhysical:
/* Figure out how far we can safely send the WAL. */
if (am_async_walsender && there_are_sync_standbys)
{
LWLockAcquire(SyncRepLock, LW_SHARED);
SendRqstPtr = WalSndCtl->lsn[SYNC_REP_WAIT_FLUSH];
LWLockRelease(SyncRepLock);
}
/* Existing code path to determine SendRqstPtr */
else if (sendTimeLineIsHistoric)
{
}
else if (am_cascading_walsender)
{
}
else
{
/*
* Streaming the current timeline on a primary.
}

XLogSendLogical:
if (am_async_walsender && there_are_sync_standbys)
{
XLogRecPtr SendRqstLSN;
XLogRecPtr SyncFlushLSN;

SendRqstLSN = GetFlushRecPtr(NULL);
LWLockAcquire(SyncRepLock, LW_SHARED);
SyncFlushLSN = WalSndCtl->lsn[SYNC_REP_WAIT_FLUSH];
LWLockRelease(SyncRepLock);

if (SendRqstLSN > SyncFlushLSN)
return;
}

On Tue, Mar 1, 2022 at 7:35 AM Hsu, John <hsuchen(at)amazon(dot)com> wrote:
> > I too observed this once or twice. It looks like the walsender isn't
> > detecting postmaster death in for (;;) with WalSndWait. Not sure if
> > this is expected or true with other wait-loops in walsender code. Any
> > more thoughts here?
>
> Unfortunately I haven't had a chance to dig into it more although iirc I hit it fairly often.

I think I got what the issue is. Below does the trick.
if (got_STOPPING)
proc_exit(0);

* If the server is shut down, checkpointer sends us
* PROCSIG_WALSND_INIT_STOPPING after all regular backends have exited.

I will take care of this in the next patch once the approach we take
for this feature gets finalized.

Regards,
Bharath Rupireddy.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2022-03-01 17:46:23 Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad
Previous Message Nathan Bossart 2022-03-01 17:32:08 Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file