Re: Disallow cancellation of waiting for synchronous replication

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Marco Slot <marco(at)citusdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, dim(at)tapoueh(dot)org, milyutinma(at)gmail(dot)com
Subject: Re: Disallow cancellation of waiting for synchronous replication
Date: 2021-04-23 10:19:49
Message-ID: 54AA38F5-4C96-4D1A-8C9A-BC41B211AF8D@yandex-team.ru
Lists: pgsql-hackers

Hi Aleksander!

Thanks for looking into this.

> On 23 Apr 2021, at 14:30, Aleksander Alekseev <aleksander(at)timescale(dot)com> wrote:
>
> Hi hackers,
>
>>>> After using a patch for a while it became obvious that PANICing during termination is not a good idea. Even when we wait for synchronous replication. It generates undesired coredumps.
>>>> I think in presence of SIGTERM it's reasonable to say that we cannot protect user anymore.
>>>> PFA v3.
>
> This patch, although solving a concrete and important problem, looks
> more like a quick workaround than an appropriate solution. Or is it
> just me?
>
> Ideally, the transaction should be committed only after getting a
> reply from the standby.
Getting a reply from the standby is part of the commit. A commit is complete only when the WAL has reached the standby. The commit was, certainly, initiated before we got a reply from the standby. We cannot commit only after we commit.

> If the user cancels the transaction, it
> doesn't get committed anywhere.
The problem is that the user tries to cancel a transaction after they have asked for commit. We never promised to roll back a committed transaction.
When the user asks for commit, we insert a commit record into the WAL, and then wait until it is acknowledged by a quorum of standbys and by local storage.
We cannot discard this record on the standbys. Otherwise, at some point we would have to discard discard records. Or discard discard discard records.
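
To make the acknowledgement condition concrete, here is a minimal sketch in Python. It is a toy model, not PostgreSQL code: the function name, its parameters, and the use of plain integers for WAL positions are all assumptions made for illustration.

```python
def commit_acknowledged(commit_lsn, local_flush_lsn, standby_flush_lsns, quorum):
    """Toy check (hypothetical helper): is the commit record durable everywhere it must be?

    commit_lsn         -- WAL position of the commit record (plain int here)
    local_flush_lsn    -- how far the primary has flushed its own WAL
    standby_flush_lsns -- flush positions reported by each standby
    quorum             -- number of standbys that must confirm
    """
    # Local storage must have the commit record first.
    if local_flush_lsn < commit_lsn:
        return False
    # Count the standbys whose flushed WAL already covers the commit record.
    acked = sum(1 for lsn in standby_flush_lsns if lsn >= commit_lsn)
    return acked >= quorum
```

Note that nothing in this check can take the commit record back out of the WAL. A cancel request can only change whether the backend keeps waiting for the condition to become true.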

> This is what people into distributed
> systems would expect unless stated otherwise, at least.
I think our transaction semantics are stated clearly in the documentation.

> Although I
> realize how complicated it is to implement, especially considering all
> the possible corner cases (netsplit right after getting a reply, etc).
> Maybe we could come up with a less than ideal, but still sound and
> easy-to-understand model, which, as soon as you learned it, doesn't
> bring unexpected surprises to the user.
The model proposed by my patch is as follows:
transaction effects should not be observable on the primary until the requirements of synchronous_commit are satisfied.

E.g. even if the user cancels a transaction that is already committed locally, we should not release the locks held by this transaction.
What unexpected surprises do you see in this model?
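
The model can be sketched as a small state machine. This is a toy illustration in Python, not the patch itself: the class, its methods, and the returned strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTransaction:
    """Hypothetical model of one transaction on the primary."""
    locks: set = field(default_factory=set)
    commit_record_written: bool = False   # commit record is in the local WAL
    sync_acknowledged: bool = False       # synchronous_commit requirements met

    def request_commit(self):
        # From this point on the transaction can no longer be rolled back.
        self.commit_record_written = True

    def cancel(self):
        if not self.commit_record_written:
            # Ordinary cancel before commit: roll back and release locks.
            self.locks.clear()
            return "rolled back"
        if not self.sync_acknowledged:
            # Cancel while waiting for standbys: keep the locks, keep waiting.
            return "still waiting"
        return "already committed"

    def acknowledge(self):
        # Standbys confirmed the commit record; effects may become visible.
        if not self.commit_record_written:
            raise RuntimeError("nothing to acknowledge")
        self.sync_acknowledged = True
        self.locks.clear()

    def effects_visible(self):
        return self.commit_record_written and self.sync_acknowledged
```

In this sketch a cancel that arrives between request_commit() and acknowledge() changes nothing: the locks stay held and effects_visible() stays false until the acknowledgement arrives.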

Thanks for reviewing!

Best regards, Andrey Borodin.
