
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication

From:	Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc:	SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
Subject:	Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
Date:	2022-04-26 06:26:59
Message-ID:	9290b55b6ae2b04e002ca9dadadd1cca09461482.camel@cybertec.at
Lists:	pgsql-hackers

On Mon, 2022-04-25 at 19:51 +0530, Bharath Rupireddy wrote:
> With synchronous replication typically all the transactions (txns)
> first locally get committed, then streamed to the sync standbys and
> the backend that generated the transaction will wait for ack from sync
> standbys. While waiting for ack, it may happen that the query or the
> txn gets canceled (QueryCancelPending is true) or the waiting backend
> is asked to exit (ProcDiePending is true). In either of these cases,
> the wait for ack gets canceled and leaves the txn in an inconsistent
> state [...]
>
> Here's a proposal (mentioned previously by Satya [1]) to avoid the
> above problems:
> 1) Wait a configurable amount of time before the backends cancel the
> sync replication wait, i.e. delay processing of QueryCancelPending and
> ProcDiePending. This introduces a new timeout GUC,
> synchronous_replication_naptime_before_cancel; when set, it lets the
> backends wait for the ack before canceling the synchronous
> replication, so that the transaction can be available on the sync
> standbys as well.
> 2) Wait for sync standbys to catch up upon restart after the crash or
> in the next txn after the old locally committed txn was canceled.
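The failure window and the proposed grace period can be illustrated with a toy C model. This is a sketch under stated assumptions, not the patch: the function and enum names are hypothetical, time is simulated in integer ticks, and only item 1) is modeled. A naptime of 0 corresponds to the current behavior described in the quoted problem statement; a positive naptime plays the role of the proposed synchronous_replication_naptime_before_cancel.

```c
#include <assert.h>

/*
 * Toy model of a backend that has already committed locally and is now
 * waiting for the synchronous standby's ack. All names are hypothetical;
 * this is not PostgreSQL's actual wait code. Time is simulated in ticks.
 */
typedef enum {
    WAIT_ACKED,          /* standby confirmed: committed on both nodes */
    WAIT_CANCELED_LOCAL, /* gave up: committed locally, maybe not on the standby */
    WAIT_STILL_WAITING   /* neither ack nor cancel within the simulated horizon */
} WaitResult;

#define SIM_HORIZON 100000 /* upper bound on simulated ticks */

/*
 * ack_tick:    tick at which the standby's ack arrives, or -1 if it never does
 * cancel_tick: tick at which a cancel or terminate request arrives, or -1
 * naptime:     extra ticks to keep waiting after a cancel request
 *              (0 models today's behavior, > 0 models the proposed grace period)
 */
WaitResult wait_for_sync_ack(int ack_tick, int cancel_tick, int naptime)
{
    assert(naptime >= 0);
    for (int now = 0; now <= SIM_HORIZON; now++) {
        /* The ack is checked first, so an ack arriving in the same tick wins. */
        if (ack_tick >= 0 && now >= ack_tick)
            return WAIT_ACKED;
        /* A pending cancel is honored only once the grace period has elapsed. */
        if (cancel_tick >= 0 && now >= cancel_tick + naptime)
            return WAIT_CANCELED_LOCAL;
    }
    return WAIT_STILL_WAITING;
}
```

For example, with a cancel request at tick 10 and the ack arriving at tick 50, wait_for_sync_ack(50, 10, 0) returns WAIT_CANCELED_LOCAL while wait_for_sync_ack(50, 10, 100) returns WAIT_ACKED. If the standby never acks (ack_tick = -1), the grace period only postpones WAIT_CANCELED_LOCAL; a timeout of this kind cannot close that case.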

While this may mitigate the problem, I don't think it will handle
all the cases that could cause a transaction to end up committed locally
but not on the synchronous standby. I think that only using the full
power of two-phase commit can make this bulletproof.
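For comparison, here is a toy C model of the general two-phase commit protocol. The names are hypothetical, and it does not show how two-phase commit would be wired into synchronous replication; in PostgreSQL the participant steps correspond to PREPARE TRANSACTION followed by COMMIT PREPARED or ROLLBACK PREPARED. The property it illustrates is that the commit decision is made only after every participant has prepared, so no participant commits while another is unable to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy two-phase commit coordinator; all names are hypothetical. */
typedef enum { PART_ACTIVE, PART_PREPARED, PART_COMMITTED, PART_ABORTED } PartState;

typedef struct {
    bool can_prepare; /* false models a participant that fails or is unreachable in phase 1 */
    PartState state;
} Participant;

/* Returns true if the transaction committed everywhere, false if it aborted everywhere. */
bool two_phase_commit(Participant *parts, size_t n)
{
    assert(parts != NULL || n == 0);

    /* Phase 1: ask every participant to prepare (durably, so it can commit later). */
    bool all_prepared = true;
    for (size_t i = 0; i < n; i++) {
        if (parts[i].can_prepare) {
            parts[i].state = PART_PREPARED;
        } else {
            all_prepared = false;
            break; /* one refusal is enough to abort */
        }
    }

    /* Phase 2: one uniform decision for all participants. */
    for (size_t i = 0; i < n; i++)
        parts[i].state = all_prepared ? PART_COMMITTED : PART_ABORTED;

    return all_prepared;
}
```

The model omits failures during phase 2. A real implementation must persist the decision and resolve prepared transactions left behind by a crash, which is exactly what PREPARE TRANSACTION's durability makes possible.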

Is it worth adding complexity for something that is not a complete solution?

Yours,
Laurenz Albe

In response to

An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication at 2022-04-25 14:21:03 from Bharath Rupireddy

Responses

Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication at 2022-05-09 09:20:21 from Bharath Rupireddy
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication at 2022-08-05 02:49:16 from Kyotaro Horiguchi
