Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)

From: "Hsu, John" <hsuchen(at)amazon(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
Date: 2022-03-01 02:04:59
Message-ID: e87ddfa6-18a2-4093-737d-e031b94b1a7e@amazon.com
Lists: pgsql-hackers

> The async walsender looks at flush LSN from
> walsndctl->lsn[SYNC_REP_WAIT_FLUSH]; after it comes up and decides to
> send the WAL up to it. If there are no sync replicas after it comes
> up (users can make sync standbys async without postmaster restart
> because synchronous_standby_names is effective with SIGHUP), then it
> doesn't wait at all and continues to send WAL. I don't see any problem
> with it. Am I missing something here?

Assuming I understand the code correctly, we have:

>        SendRqstPtr = GetFlushRecPtr(NULL);

In this contrived example, let's say walsndctl->lsn[SYNC_REP_WAIT_FLUSH]
is always 60s behind GetFlushRecPtr() and, for whatever reason, the
walsender terminates and reconnects if it hasn't replicated anything in
30s. If GetFlushRecPtr() keeps advancing and is always 60s ahead of the
sync LSNs, then we would never stream anything: each reconnect picks a
fresh SendRqstPtr that the sync standbys are still 60s short of, even
though the sync LSN has advanced past what was safe to stream previously.
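
To make the contrived example concrete, here is a rough toy model of it.
This is only a sketch under my own assumptions, not the patch's code:
LSNs are counted in "seconds of WAL", the sync standbys lag the local
flush position by a fixed amount, and every name below (ModelLsn,
simulate_fixed_target, simulate_capped_target, ...) is hypothetical.
The second function shows one possible alternative (capping the send
target at the sync flush LSN), which is my illustration and not
something proposed in this thread.

```c
#include <stdint.h>

/* Hypothetical model type: an LSN measured in "seconds of WAL". */
typedef uint64_t ModelLsn;

/* Local flush position at time t: always `lag` units ahead of the sync LSN. */
ModelLsn primary_flush_lsn(uint64_t t, uint64_t lag) { return t + lag; }

/* Position the sync standbys have flushed at time t. */
ModelLsn sync_flush_lsn(uint64_t t) { return t; }

/*
 * Strategy from the contrived example: pick the local flush LSN as the
 * target, send nothing until the sync LSN reaches it, and reconnect
 * (picking a new target) after `timeout` seconds without sending.
 * Returns the last LSN streamed to the async standby.
 */
ModelLsn simulate_fixed_target(uint64_t duration, uint64_t lag, uint64_t timeout)
{
    ModelLsn sent = 0;
    uint64_t last_activity = 0;
    ModelLsn target = primary_flush_lsn(0, lag);

    for (uint64_t t = 1; t <= duration; t++)
    {
        if (sync_flush_lsn(t) >= target)
        {
            /* Sync standbys caught up to the target: stream up to it. */
            sent = target;
            last_activity = t;
            target = primary_flush_lsn(t, lag);
        }
        else if (t - last_activity >= timeout)
        {
            /* Nothing replicated for `timeout` seconds: reconnect, new target. */
            last_activity = t;
            target = primary_flush_lsn(t, lag);
        }
    }
    return sent;
}

/*
 * Alternative (assumption, for comparison only): never wait on a fixed
 * target, just stream up to min(local flush LSN, sync flush LSN).
 */
ModelLsn simulate_capped_target(uint64_t duration, uint64_t lag)
{
    ModelLsn sent = 0;

    for (uint64_t t = 1; t <= duration; t++)
    {
        ModelLsn local = primary_flush_lsn(t, lag);
        ModelLsn sync = sync_flush_lsn(t);
        ModelLsn safe = (sync < local) ? sync : local;

        if (safe > sent)
            sent = safe;
    }
    return sent;
}
```

In this model, simulate_fixed_target(600, 60, 30) streams nothing, because
the 30s timeout always fires before the 60s lag is covered, whereas a
timeout longer than the lag (e.g. 120) or the capped variant does make
progress.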

> I will correct it. "async standby WAL sender with request LSN %X/%X is
> waiting as sync standbys are ahead with flush LSN %X/%X",
> LSN_FORMAT_ARGS(sendRqstP), LSN_FORMAT_ARGS(flushLSN). I will think
> more about having better wording of these messages, any suggestions
> here?

"async standby WAL sender with request LSN %X/%X is waiting for sync
standbys at LSN %X/%X to advance past it"

Not sure if that's really clearer...
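
For reference, this is roughly how the suggested wording would render.
It is a standalone sketch, not the patch: XLogRecPtr and LSN_FORMAT_ARGS
are redefined here as stand-ins so it compiles outside the PostgreSQL
tree, format_wait_message is a hypothetical helper, and the LSN values
are made up.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-ins so the sketch builds outside the PostgreSQL source tree. */
typedef uint64_t XLogRecPtr;
#define LSN_FORMAT_ARGS(lsn) ((uint32_t) ((lsn) >> 32)), ((uint32_t) (lsn))

/*
 * Hypothetical helper: format the suggested message into buf.
 * Returns the value of snprintf (the length that would be written).
 */
int format_wait_message(char *buf, size_t buflen,
                        XLogRecPtr sendRqstPtr, XLogRecPtr flushLSN)
{
    return snprintf(buf, buflen,
                    "async standby WAL sender with request LSN %X/%X is "
                    "waiting for sync standbys at LSN %X/%X to advance past it",
                    LSN_FORMAT_ARGS(sendRqstPtr), LSN_FORMAT_ARGS(flushLSN));
}
```

Each LSN prints as two hex halves (high 32 bits, then low 32 bits), so the
request LSN comes first and the sync standbys' flush LSN second.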

> I too observed this once or twice. It looks like the walsender isn't
> detecting postmaster death in for (;;) with WalSndWait. Not sure if
> this is expected or true with other wait-loops in walsender code. Any
> more thoughts here?

Unfortunately I haven't had a chance to dig into it more, although IIRC
I hit it fairly often.

Thanks,
John H
