Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
Subject: Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
Date: 2022-08-04 08:31:40
Message-ID: CALj2ACUiyE1ui4Daqw1cw-tLTSqkiB1bYZ6rWb5BE70-C65Cog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 4, 2022 at 1:42 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Mon, Jul 25, 2022 at 4:20 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> >
> > > 25 июля 2022 г., в 14:29, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> написал(а):
> > >
> > > Hm, after thinking for a while, I tend to agree with the above
> > > approach - meaning, query cancel interrupt processing can completely
> > > be disabled in SyncRepWaitForLSN() and process proc die interrupt
> > > immediately, this approach requires no GUC as opposed to the proposed
> > > v1 patch upthread.
> > GUC was proposed here[0] to maintain compatibility with previous behaviour. But I think that having no GUC here is fine too. If we do not allow cancelation of unreplicated backends, of course.
> >
> > >>
> > >> And yes, we need additional complexity - but in some other place. Transaction can also be locally committed in presence of a server crash. But this another difficult problem. Crashed server must not allow data queries until LSN of timeline end is successfully replicated to synchronous_standby_names.
> > >
> > > Hm, that needs to be done anyways. How about doing as proposed
> > > initially upthread [1]? Also, quoting the idea here [2].
> > >
> > > Thoughts?
> > >
> > > [1] https://www.postgresql.org/message-id/CALj2ACUrOB59QaE6=jF2cFAyv1MR7fzD8tr4YM5+OwEYG1SNzA@mail.gmail.com
> > > [2] 2) Wait for sync standbys to catch up upon restart after the crash or
> > > in the next txn after the old locally committed txn was canceled. One
> > > way to achieve this is to let the backend, that's making the first
> > > connection, wait for sync standbys to catch up in ClientAuthentication
> > > right after successful authentication. However, I'm not sure this is
> > > the best way to do it at this point.
> >
> >
> > I think ideally startup process should not allow read only connections in CheckRecoveryConsistency() until WAL is not replicated to quorum al least up until new timeline LSN.
>
> We can't do it in CheckRecoveryConsistency() unless I'm missing
> something. Because, the walsenders (required for sending the remaining
> WAL to sync standbys to achieve quorum) can only be started after the
> server reaches a consistent state, after all walsenders are
> specialized backends.

Continuing on the above thought (I inadvertently clicked the send
button previously): A simple approach would be to check for quorum in
PostgresMain() before entering the query loop for (;;) for
non-walsender cases. A disadvantage of this would be that all the
backends will be waiting here in the worst case if it takes time for
achieving the sync quorum after restart - roughly we can do the
following in PostgresMain(), of course we need locking mechanism so
that all the backends whoever reaches here will wait for the same lsn:

if (sync_replicaion_defined == true &&
shmem->wait_for_sync_repl_upon_restart == true)
{
SyncRepWaitForLSN(pg_current_wal_flush_lsn(), false);
shmem->wait_for_sync_repl_upon_restart = false;
}

Thoughts?

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-08-04 08:38:00 Re: Fix obsoleted comments for function prototypes
Previous Message Kyotaro Horiguchi 2022-08-04 08:23:48 Re: collate not support Unicode Variation Selector