Re: Allow async standbys wait for sync replication

From: "Hsu, John" <hsuchen(at)amazon(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Allow async standbys wait for sync replication
Date: 2022-03-09 01:08:49
Message-ID: 8fc4fb4c-d429-4c17-b239-13a560bcd966@amazon.com
Lists: pgsql-hackers

Hi,

On 3/5/22 10:57 PM, Bharath Rupireddy wrote:
> On Sun, Mar 6, 2022 at 1:57 AM Andres Freund<andres(at)anarazel(dot)de> wrote:
>> Hi,
>>
>> On 2022-03-05 14:14:54 +0530, Bharath Rupireddy wrote:
>>> I understand. Even if we use the SyncRepWaitForLSN approach, the async
>>> walsenders will have to do nothing in WalSndLoop() until the sync
>>> walsender wakes them up via SyncRepWakeQueue.
>> I still think we should flat out reject this approach. The proper way to
>> implement this feature is to change the protocol so that WAL can be sent to
>> replicas with an additional LSN informing them up to where WAL can be
>> flushed. That way WAL is already sent when the sync replicas have acknowledged
>> receipt and just an updated "flush/apply up to here" LSN has to be sent.
> I was having this thought back of my mind. Please help me understand these:
> 1) How will the async standbys ignore the WAL received but
> not-yet-flushed by them in case the sync standbys don't acknowledge
> flush LSN back to the primary for whatever reasons?
> 2) When we say the async standbys will receive the WAL, will they just
> keep the received WAL in the shared memory but not apply or will they
> just write but not apply the WAL and flush the WAL to the pg_wal
> directory on the disk or will they write to some other temp wal
> directory until they receive go-ahead LSN from the primary?
> 3) Won't the network transfer cost be wasted in case the sync standbys
> don't acknowledge flush LSN back to the primary for whatever reasons?
>
> The proposed idea in this thread (async standbys waiting for flush LSN
> from sync standbys before sending the WAL), although it makes async
> standby slower in receiving the WAL, it doesn't have the above
> problems and is simpler to implement IMO. Since this feature is going
> to be optional with a GUC, users can enable it based on the needs.
>
I think another downside of that approach (sending WAL ahead of the sync
acknowledgment) is that if the async replica had a lot of changes that were
still unacknowledged and it were restarted for whatever reason, we might need
to recreate the replica or run pg_rewind on it again, which seems to be what
we're trying to avoid.

It also pushes complexity to the client side for consumers that stream
changes from logical slots, something the current proposal seems to prevent.
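
To make the concern concrete, here is a minimal sketch of the bookkeeping an
async standby would need under the protocol-change idea, where each message
carries both the WAL end position and a "flush/apply up to here" LSN. This is
not PostgreSQL code: the class, field, and method names are all hypothetical,
and LSNs are modeled as plain integers.

```python
from dataclasses import dataclass


@dataclass
class AsyncStandby:
    """Hypothetical model of an async standby; not PostgreSQL code."""

    received_lsn: int = 0  # highest WAL position received from the primary
    allowed_lsn: int = 0   # highest position the primary said may be flushed
    flushed_lsn: int = 0   # highest position actually flushed locally

    def on_wal(self, end_lsn: int, allowed_lsn: int) -> None:
        # Both positions only ever move forward.
        self.received_lsn = max(self.received_lsn, end_lsn)
        self.allowed_lsn = max(self.allowed_lsn, allowed_lsn)
        # Flush only what has been both received and allowed.
        self.flushed_lsn = min(self.received_lsn, self.allowed_lsn)

    def pending(self) -> int:
        # WAL received but not yet acknowledged by the sync standbys.
        return self.received_lsn - self.flushed_lsn


standby = AsyncStandby()
standby.on_wal(end_lsn=100, allowed_lsn=40)   # sync standbys lag behind
held_back = standby.pending()
standby.on_wal(end_lsn=100, allowed_lsn=100)  # go-ahead LSN catches up
```

The span counted by `pending()` is the WAL in question: where it is kept and
what happens to it across a restart is what the approach would have to define.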