Re: Disallow cancellation of waiting for synchronous replication

From: Maksim Milyutin <milyutinma(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Disallow cancellation of waiting for synchronous replication
Date: 2019-12-25 09:34:22
Message-ID: f3ffc220-e601-cc43-3784-f9bba66dc382@gmail.com
Lists: pgsql-hackers

On 21.12.2019 00:19, Tom Lane wrote:

>> There is still a problem when backend is not canceled, but terminated [2].
> Exactly. If you don't have a fix that handles that case, you don't have
> anything. In fact, you've arguably made things worse, by increasing the
> temptation to terminate or "kill -9" the nonresponsive session.

I assume that in Andrey's patch proposal, the case where terminating a
backend brings down the whole PostgreSQL instance has to be handled by
external HA agents. Such an agent, running as the parent process of the
postmaster, could intercept the termination and make an appropriate
decision, e.g., restart the PostgreSQL node in a state closed to
external users (via pg_hba.conf manipulation) until all sync replicas
have synchronized the changes from the master. The Stolon HA tool
implements this strategy [1]. This logic (waiting until all replicas
declared in synchronous_standby_names have replicated all WAL from the
master) could be implemented inside the PostgreSQL kernel, after the
recovery process starts and before the database is opened to users, and
this can be done separately later.
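
To make the "wait until all sync replicas catch up" check concrete, here
is a minimal sketch of the decision an external agent might make before
reopening the node. It is only an illustration under assumptions: the
function names (parse_lsn, can_open_to_users) and the way flush
positions are passed in are hypothetical, it is not how Stolon or
PostgreSQL implement this, and it waits for every listed standby rather
than honoring any quorum setting.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a 64-bit integer.

    The part before the slash is the high 32 bits, the part after it
    the low 32 bits, both in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def can_open_to_users(master_flush_lsn: str,
                      replica_flush_lsns: dict[str, str],
                      sync_names: list[str]) -> bool:
    """Return True only if every named sync replica has flushed WAL
    up to (or past) the master's flush position.

    replica_flush_lsns maps a standby name to its last reported flush
    LSN (hypothetical input; a real agent would have to collect it).
    A standby that has not reported anything counts as not caught up.
    """
    target = parse_lsn(master_flush_lsn)
    for name in sync_names:
        reported = replica_flush_lsns.get(name)
        if reported is None or parse_lsn(reported) < target:
            return False
    return True
```

With this check the agent would keep the node closed (e.g., by the
pg_hba.conf manipulation mentioned above) while can_open_to_users()
returns False, and open it once it returns True.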

Another approach is to implement two-phase commit over the master and
sync replicas (as Oracle did in old versions [2]), where the risk of
ending up with locally committed data on instance restart or query
cancellation is minimal (after the final commit phase has started). But
this approach has a latency penalty, and it is complex to resolve
partial (prepared but not committed) transactions automatically when the
coordinator (in this case the master node) fails. It would be nice if
this approach were implemented later as an option of synchronous commit.
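
As a rough illustration of the two-phase idea, the toy model below has
the master act as coordinator: nothing is committed anywhere until every
node has prepared. All names here (Node, State, two_phase_commit) are
hypothetical and exist only for this sketch. It does not model
coordinator failure, which is exactly the hard case mentioned above.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


class Node:
    """Toy participant; can_prepare simulates a node that refuses to prepare."""

    def __init__(self, name: str, can_prepare: bool = True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = State.IDLE

    def prepare(self) -> bool:
        if self.can_prepare:
            self.state = State.PREPARED
        return self.can_prepare

    def commit(self) -> None:
        self.state = State.COMMITTED

    def abort(self) -> None:
        self.state = State.ABORTED


def two_phase_commit(master: Node, replicas: list[Node]) -> bool:
    """Phase 1: prepare on all nodes. Phase 2: commit on all nodes.

    If any node fails to prepare, the already prepared nodes are
    aborted and no node ends up committed.
    """
    nodes = [master, *replicas]
    prepared = []
    for node in nodes:
        if not node.prepare():
            for done in prepared:
                done.abort()
            return False
        prepared.append(node)
    for node in nodes:
        node.commit()
    return True
```

The extra prepare round trip is where the latency penalty comes from,
and a node left in the PREPARED state after a coordinator crash is the
partial transaction that would need resolving.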

[1] https://github.com/sorintlab/stolon/blob/master/doc/syncrepl.md#handling-postgresql-sync-repl-limits-under-such-circumstances

[2] https://docs.oracle.com/cd/B28359_01/server.111/b28326/repmaster.htm#i33607

--
Best regards,
Maksim Milyutin
