Re: Avoiding data loss with synchronous replication

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoiding data loss with synchronous replication
Date: 2021-07-23 11:32:08
Lists: pgsql-hackers

Hi Nathan!

Thanks for your interest in the topic. I think in the thread [0] we almost agreed on the general design.
The only remaining question is that we want to treat pg_ctl stop and kill SIGTERM differently from pg_terminate_backend().

> On 23 July 2021, at 02:17, Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
> Hi hackers,
> As previously discussed [0], canceling synchronous replication waits
> can have the unfortunate side effect of making transactions visible on
> a primary server before they are replicated. A failover at this time
> would cause such transactions to be lost. The proposed solution in
> the previous thread [0] involved blocking such cancellations, but many
> had concerns about that approach (e.g., backends could be
> unresponsive, server restarts were still affected by this problem). I
> would like to propose something more like what Fujii-san suggested [1]
> that would avoid blocking cancellations while still preventing data
> loss. I believe this is a key missing piece of the synchronous
> replication functionality in PostgreSQL.
> AFAICT there are a variety of ways that the aforementioned problem may
> occur:
> 1. Server restarts: As noted in the docs [2], "waiting transactions
> will be marked fully committed once the primary database
> recovers." I think there are a few options for handling this,
> but the simplest would be to simply failover anytime the primary
> server shut down. My proposal may offer other ways of helping
> with this.
I think a simple check that no other primary exists would suffice.
Currently this is entirely the concern of the HA tool.

> 2. Backend crashes: If a backend crashes, the postmaster process
> will restart everything, leading to the same problem described in
> 1. However, this behavior can be prevented with the
> restart_after_crash parameter [3].
> 3. Client disconnections: During waits for synchronous replication,
> interrupt processing is turned off, so disconnected clients
> actually don't seem to cause a problem. The server will still
> wait for synchronous replication to complete prior to making the
> transaction visible on the primary.

> 4. Query cancellations and backend terminations: This appears to be
> the only gap where there is no way to avoid potential data loss,
> and it is the main target of my proposal.
> Instead of blocking query cancellations and backend terminations, I
> think we should allow them to proceed, but we should keep the
> transactions marked in-progress so they do not yet become visible to
> sessions on the primary. Once replication has caught up to the
> necessary point, the transactions can be marked completed, and
> they would finally become visible.
> The main advantages of this approach are 1) it still allows for
> canceling waits for synchronous replication
You can cancel the wait for synchronous replication with
ALTER SYSTEM SET synchronous_standby_names TO 'new quorum';
SELECT pg_reload_conf();

All backends waiting for sync rep will proceed with the new quorum.
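
As a rough sketch of what a concrete value might look like: assuming two standbys whose application names are standby1 and standby2 (hypothetical names, not taken from this thread), a quorum that needs an acknowledgement from either one could be set as follows. The last query is only a way to check the result.

```sql
-- standby1 and standby2 are hypothetical application_name values.
-- 'ANY 1 (...)' asks for an acknowledgement from any one listed standby.
ALTER SYSTEM SET synchronous_standby_names TO 'ANY 1 (standby1, standby2)';

-- Reload the configuration so the new setting takes effect without a restart.
SELECT pg_reload_conf();

-- Check how each connected standby is currently classified.
SELECT application_name, sync_state FROM pg_stat_replication;
```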

> and 2) it provides an
> opportunity to view and manage waits for synchronous replication
> outside of the standard cancellation/termination functionality. The
> tooling for 2 could even allow a session to begin waiting for
> synchronous replication again if it "inadvertently interrupted a
> replication wait..." [4]. I think the main disadvantage of this
> approach is that transactions committed by a session may not be
> immediately visible to the session when the command returns after
> canceling the wait for synchronous replication. Instead, the
> transactions would become visible in the future once the change is
> replicated. This may cause problems for an application if it doesn't
> handle this scenario carefully.
> What are folks' opinions on this idea? Is this something that is
> worth prototyping?

In effect, you propose converting the transaction to 2PC if we get a CANCEL during the sync rep wait:
transferring locks and other state somewhere, acquiring a new VXid for our backend, sending CommandComplete while the command is not in fact complete, etc.
I think this is overly complex for the reasons given.

The ultimate purpose of synchronous replication is to make a client wait when it is necessary to wait. If the client wishes to execute more commands, it can open a new connection or set synchronous_commit to the desired level in the first place. Canceling a locally committed transaction will not be possible anyway.
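
For illustration, setting synchronous_commit up front could look like the sketch below. It assumes the client has decided that a local flush is enough for this work; the choice of 'local' is only an example, and other levels may fit better.

```sql
-- Assumption: this session does not need to wait for any standby.
-- 'local' waits only for the local WAL flush.
SET synchronous_commit TO 'local';

-- Alternatively, relax the setting for a single transaction only.
BEGIN;
SET LOCAL synchronous_commit TO 'local';
-- ... the statements of the transaction go here ...
COMMIT;
```

With SET LOCAL, the setting reverts at the end of the transaction, so later commits in the same session wait for synchronous replication as before.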


Best regards, Andrey Borodin.

