Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers

From: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers
Date: 2022-01-06 07:59:32
Message-ID: CAHg+QDc-3bPT52yqB-_J-_8bc6DzpxVVzvQMA99XV_0tqOR9wg@mail.gmail.com
Lists: pgsql-hackers

Consider a cluster formation where we have a Primary (P), a Sync Replica (S1),
and multiple async replicas for disaster recovery and read scaling (within
the region and outside the region). In this setup, S1 is the preferred
failover target in the event of a primary failure. When a transaction is
committed on the primary, it is not acknowledged to the client until the
primary gets an acknowledgment from the sync standby that the WAL is
flushed to disk (assume the synchronous_commit configuration is remote
flush, i.e. on). However, the walsenders on the primary that correspond to
the async replicas don't wait for that flush acknowledgment from the sync
standby and send the WAL to the async standbys (and to any logical
replication/decoding clients) right away. So it is possible for the async
replicas and logical clients to be ahead of the sync replica. If a failover
is initiated in such a scenario, to bring the formation into a healthy
state we have to either

1. run pg_rewind on the async replicas so that they can reconnect with
the new primary, or
2. collect the latest WAL across the replicas and feed it to the standby.

Both of these operations are involved, error prone, and can cause multiple
minutes of downtime if done manually. In addition, there is a window where
the async replicas can show data that was neither acknowledged to the
client nor committed on the sync standby. Logical clients, if they are
ahead, may need to reseed the data as there is no easy rewind option for
them.
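
To make the failover problem concrete, here is a minimal sketch (not
PostgreSQL code) that compares LSNs after a promotion and lists the nodes
that ended up ahead of the new primary. The node names and LSN values are
made up for illustration, and the helper names parse_lsn and
nodes_ahead_of are hypothetical.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/3000148' to a 64-bit integer.

    The part before the slash is the high 32 bits and the part after it
    is the low 32 bits, both in hexadecimal.
    """
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def nodes_ahead_of(new_primary_lsn: str, node_lsns: dict[str, str]) -> list[str]:
    """Return the names of nodes whose LSN is past the new primary's LSN.

    These are the nodes that received WAL the promoted sync standby never
    had, so they cannot simply reconnect and continue streaming.
    """
    limit = parse_lsn(new_primary_lsn)
    return sorted(
        name for name, lsn in node_lsns.items() if parse_lsn(lsn) > limit
    )


# Hypothetical state at failover time: S1 is promoted at 0/3000060.
node_lsns = {
    "async_1": "0/3000148",   # received quorum uncommitted WAL
    "async_2": "0/3000060",   # exactly in step with S1
    "logical_1": "0/3000100", # logical client, also ahead
}
ahead = nodes_ahead_of("0/3000060", node_lsns)
print(ahead)
```

In this made-up state, async_1 would need pg_rewind (or option 2 above),
and logical_1 would need to be reseeded.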

I would like to propose a GUC, send_Wal_after_quorum_committed. When it is
set to ON, the walsenders that correspond to async standbys and logical
replication workers wait until an LSN is quorum committed on the primary
before sending it to the standby. This not only simplifies the post-failover
steps but also avoids unnecessary downtime for the async replicas. Thoughts?
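
As a rough illustration of the intended behaviour, the sketch below models
how far a walsender for an async or logical client would be allowed to
send. It is a toy model, not a patch: PrimaryState, send_limit and the
wait_for_quorum flag are hypothetical names standing in for the proposed
GUC, and LSNs are plain integers.

```python
from dataclasses import dataclass


@dataclass
class PrimaryState:
    local_flush_lsn: int    # WAL flushed to disk locally on the primary
    quorum_commit_lsn: int  # highest LSN acknowledged by the sync quorum


def send_limit(state: PrimaryState, wait_for_quorum: bool) -> int:
    """Return the highest LSN a walsender may send to an async client.

    With wait_for_quorum off (the current behaviour), everything flushed
    locally can be sent. With it on, sending is capped at the quorum
    committed LSN, so the client can never get ahead of the sync standbys.
    """
    if wait_for_quorum:
        return min(state.local_flush_lsn, state.quorum_commit_lsn)
    return state.local_flush_lsn


# The primary has flushed up to 500, but the sync standby acked only 300.
state = PrimaryState(local_flush_lsn=500, quorum_commit_lsn=300)
print(send_limit(state, wait_for_quorum=False))  # current behaviour
print(send_limit(state, wait_for_quorum=True))   # proposed behaviour
```

With the flag on, the range between 300 and 500 stays on the primary until
the sync standby acknowledges it, which is exactly the WAL that would
otherwise have to be rewound after a failover to S1.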

Thanks,
Satya

On Sun, Dec 5, 2021 at 8:35 PM Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> Hi,
>
> It looks like the logical replication subscribers are receiving the
> quorum uncommitted transactions even before the synchronous (sync)
> standbys. Most of the time this is okay, but it can be a problem if the
> primary goes down/crashes (while the primary is in SyncRepWaitForLSN)
> before the quorum commit is achieved (i.e. before the sync standbys
> receive the committed txns from the primary) and the failover is to
> happen on to the sync standby. The subscriber would have received the
> quorum uncommitted txns whereas the sync standbys didn't. After the
> failover, the new primary (the old sync standby) would be behind the
> subscriber i.e. the subscriber will be seeing the data that the new
> primary can't. Is there a way the subscriber can get back to be in
> sync with the new primary? In other words, can we reverse the effects
> of the quorum uncommitted txns on the subscriber? The naive way is to do
> it manually, but that doesn't seem elegant.
>
> We have performed a small experiment to observe the above behaviour
> with 1 primary, 1 sync standby and 1 subscriber:
> 1) Have a wait loop in SyncRepWaitForLSN (a temporary hack to
> illustrate the standby receiving the txn a bit late or failing to
> receive it)
> 2) Insert data into a table on the primary
> 3) The primary waits, i.e. the insert query hangs (because of the wait
> loop hack) before the local txn is sent to the sync standby,
> whereas the subscriber receives the inserted data.
> 4) If the primary crashes/goes down and is unable to come up, and the
> failover happens to the sync standby (which didn't receive the data that
> got inserted on the primary), the subscriber would see data that
> the sync standby can't.
>
> This looks to be a problem. A possible solution is to let the
> subscribers receive the txns only after the primary achieves quorum
> commit (gets out of the SyncRepWaitForLSN or after all sync standbys
> received the txns). The logical replication walsenders can wait until
> the quorum commit is obtained and then can send the WAL. A new GUC can
> be introduced to control this, default being the current behaviour.
>
> Thoughts?
>
> Thanks Satya (cc-ed) for the use-case and off-list discussion.
>
> Regards,
> Bharath Rupireddy.
>
