From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | "Bossart, Nathan" <bossartn(at)amazon(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Avoiding data loss with synchronous replication |
Date: | 2021-07-23 10:58:03 |
Message-ID: | CAA4eK1L2p4NLyhidETqOphcZMv14mTqs6NCO2YpAk470zkFfwQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Jul 23, 2021 at 2:48 AM Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
>
> Hi hackers,
>
> As previously discussed [0], canceling synchronous replication waits
> can have the unfortunate side effect of making transactions visible on
> a primary server before they are replicated. A failover at this time
> would cause such transactions to be lost. The proposed solution in
> the previous thread [0] involved blocking such cancellations, but many
> had concerns about that approach (e.g., backends could be
> unresponsive, server restarts were still affected by this problem). I
> would like to propose something more like what Fujii-san suggested [1]
> that would avoid blocking cancellations while still preventing data
> loss. I believe this is a key missing piece of the synchronous
> replication functionality in PostgreSQL.
>
> AFAICT there are a variety of ways that the aforementioned problem may
> occur:
> 1. Server restarts: As noted in the docs [2], "waiting transactions
> will be marked fully committed once the primary database
> recovers." I think there are a few options for handling this,
> but the simplest would be to fail over any time the primary
> server shuts down. My proposal may offer other ways of helping
> with this.
> 2. Backend crashes: If a backend crashes, the postmaster process
> will restart everything, leading to the same problem described in
> 1. However, this behavior can be prevented with the
> restart_after_crash parameter [3].
> 3. Client disconnections: During waits for synchronous replication,
> interrupt processing is turned off, so disconnected clients
> actually don't seem to cause a problem. The server will still
> wait for synchronous replication to complete prior to making the
> transaction visible on the primary.
> 4. Query cancellations and backend terminations: This appears to be
> the only gap where there is no way to avoid potential data loss,
> and it is the main target of my proposal.
>
> Instead of blocking query cancellations and backend terminations, I
> think we should allow them to proceed, but we should keep the
> transactions marked in-progress so they do not yet become visible to
> sessions on the primary.
>
One naive question: what if the primary hits an error while changing
the status from in-progress to committed? In that case, won't the
transaction be visible on the standby but not on the primary?

> Once replication has caught up to
> the necessary point, the transactions can be marked completed, and
> they would finally become visible.
>
If the session that issued the commit is terminated, will this work be
done by some background process?
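
To make the two questions above concrete, here is a toy Python model of
the proposal as described in the quoted text. It is only a sketch:
`Primary`, `cancel_wait`, `finalize`, and the integer LSN bookkeeping are
hypothetical names for illustration and do not correspond to PostgreSQL
internals. It assumes some step outside the committing session (here,
`finalize`) flips the status once the standby has confirmed the commit LSN.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "in-progress"
    COMMITTED = "committed"


@dataclass
class Primary:
    """Toy primary: per-transaction status plus pending sync-rep waits."""

    status: dict = field(default_factory=dict)   # xid -> Status
    pending: dict = field(default_factory=dict)  # xid -> commit LSN not yet replicated

    def commit(self, xid, commit_lsn):
        # The commit is durable locally, but the transaction stays
        # in-progress until the standby confirms commit_lsn.
        self.status[xid] = Status.IN_PROGRESS
        self.pending[xid] = commit_lsn

    def cancel_wait(self, xid):
        # The session stops waiting (cancel or terminate). Under the
        # proposal the status is left untouched, so nothing becomes visible.
        return self.status[xid]

    def finalize(self, standby_flush_lsn):
        # Hypothetical background step: mark every transaction whose commit
        # LSN has been replicated as committed, whether or not the session
        # that issued the commit still exists.
        done = sorted(x for x, lsn in self.pending.items()
                      if lsn <= standby_flush_lsn)
        for xid in done:
            self.status[xid] = Status.COMMITTED
            del self.pending[xid]
        return done

    def visible(self, xid):
        return self.status.get(xid) is Status.COMMITTED


primary = Primary()
primary.commit(xid=100, commit_lsn=500)
primary.cancel_wait(100)                 # session gives up waiting
primary.finalize(standby_flush_lsn=400)  # standby has not caught up yet
still_hidden = not primary.visible(100)
primary.finalize(standby_flush_lsn=500)  # standby has caught up
now_visible = primary.visible(100)
```

In this model, the first question corresponds to `finalize` failing after
the standby has already received the commit, and the second to which
process is responsible for calling `finalize` once the session is gone.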
--
With Regards,
Amit Kapila.