Re: Disallow cancellation of waiting for synchronous replication

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Marco Slot <marco(at)citusdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, dim(at)tapoueh(dot)org, milyutinma(at)gmail(dot)com
Subject: Re: Disallow cancellation of waiting for synchronous replication
Date: 2021-03-11 16:28:26
Message-ID: D09F0F0D-D045-4739-910F-4462AD0E0758@yandex-team.ru
Lists: pgsql-hackers

Thanks for looking into this!

> On 11 March 2021, at 19:15, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
>
>
> On 2020/12/09 18:07, Andrey Borodin wrote:
>>> On 9 June 2020, at 23:32, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>>>
>>>
>> After using a patch for a while it became obvious that PANICing during termination is not a good idea. Even when we wait for synchronous replication. It generates undesired coredumps.
>> I think in presence of SIGTERM it's reasonable to say that we cannot protect user anymore.
>> PFA v3.
>
> I don't think that preventing a backend from being canceled during waiting for
> sync rep actually addresses your issue. As mentioned upthread, there are
> other cases that can cause the issue, for example, restart of the server while
> backends are waiting for sync rep.
Well, the patch fully addresses _my_ issue :) My issue is that the guarantees of synchronous replication are broken by sending a "cancel" message, which most drivers send automatically.
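
To make the failure mode concrete, here is a minimal toy model of it. It is not PostgreSQL code: the `Node` and `Primary` classes, the `commit` method, and its `cancel_during_wait` and `allow_cancel` flags are all hypothetical names invented for this illustration. It only assumes that a commit becomes locally visible before the sync rep wait begins.

```python
# Toy model only; every name here is hypothetical, not a PostgreSQL API.
from dataclasses import dataclass, field


@dataclass
class Node:
    # Keys committed in this node's WAL, in commit order.
    wal: list = field(default_factory=list)


@dataclass
class Primary(Node):
    standby: Node = field(default_factory=Node)
    link_up: bool = True

    def commit(self, key, cancel_during_wait=False, allow_cancel=True):
        """Commit locally, then wait for the standby; return what the client sees."""
        self.wal.append(key)  # local commit happens before the sync rep wait
        if self.link_up:
            self.standby.wal.append(key)
            return "committed"
        # Standby unreachable: the backend is now waiting for sync rep.
        if cancel_during_wait and allow_cancel:
            # The wait is abandoned and control returns to the client.
            return "committed locally, not replicated"
        return "still waiting"


primary = Primary()
primary.commit("a")
primary.link_up = False  # network partition between primary and standby

status = primary.commit("b", cancel_during_wait=True)
# "b" is visible on the primary, but a failover to the standby would lose it.
lost_on_failover = [k for k in primary.wal if k not in primary.standby.wal]

# With cancellation disallowed, the client never gets an acknowledgement.
blocked = primary.commit("c", cancel_during_wait=True, allow_cancel=False)
```

In this model the cancelled commit of "b" is acknowledged to the client even though the standby never received it, while the commit of "c" keeps waiting and is never acknowledged.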

The patch does not need to address the issue of server restart: it is the job of the HA tool to prevent the database service from starting once a new primary has been elected.

The only case the patch does not handle is a sudden backend crash, after which Postgres will recover without a restart. I think it is a very small problem compared to "cancel": one needs not only a failover but also a SIGSEGV in the backend to encounter it. Anyway, we can address this issue by adding one more GUC that prevents PostmasterStateMachine() from invoking crash recovery when (FatalError && pmState == PM_NO_CHILDREN).

> As far as I understand your idea, what we should do is to make new transaction
> wait until WAL has been replicated to the standby up to the latest WAL record
> committed locally before starting? We don't need to prevent the cancellation
> during sync rep wait.
> If we do that, new transaction cannot see any changes by another transaction
> that was canceled during sync rep, until all the committed WAL records are
> replicated. Doesn't this address your issue?
Preventing any new transaction from starting during the sync replication wait is not really an option. It would double the latency cost of synchronous replication for writing transactions (wait for RTT on start, wait for RTT on commit). It would also impose the same cost on reading transactions, which did not pay it before.
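
The arithmetic behind that claim can be sketched as follows. This is a simplified worst-case model, not a measurement: it assumes the wait at transaction start costs one full round trip, and the function name `txn_latency_ms` and the 1 ms / 5 ms figures are made up for illustration.

```python
def txn_latency_ms(work_ms, rtt_ms, writes, wait_on_start):
    """Latency of one transaction under a simple additive model (an assumption)."""
    # Proposed behaviour: wait for replication to catch up before starting.
    start_wait = rtt_ms if wait_on_start else 0.0
    # Existing behaviour: only writing transactions wait for sync rep on commit.
    commit_wait = rtt_ms if writes else 0.0
    return start_wait + work_ms + commit_wait


WORK_MS = 1.0  # illustrative time spent executing the transaction
RTT_MS = 5.0   # illustrative round trip to the synchronous standby

write_now = txn_latency_ms(WORK_MS, RTT_MS, writes=True, wait_on_start=False)
write_proposed = txn_latency_ms(WORK_MS, RTT_MS, writes=True, wait_on_start=True)
read_now = txn_latency_ms(WORK_MS, RTT_MS, writes=False, wait_on_start=False)
read_proposed = txn_latency_ms(WORK_MS, RTT_MS, writes=False, wait_on_start=True)
```

With these numbers a writing transaction goes from 6 ms to 11 ms (the replication wait doubles from one RTT to two), and a reading transaction goes from 1 ms to 6 ms.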

> I think that this idea works in
> not only cancellation case but also other cases.
>
> If we want to control this new wait in application level, we can implement
> something like pg_wait_for_syncrep(pg_lsn) function. This function waits
> until WAL is replicated to the standby up to the specified lsn. For example,
> we can execute pg_wait_for_syncrep(pg_current_wal_lsn()) in the application
> whenever we need that consistent point.
We want this for every transaction running with synchronous_commit > local. We should not ask users to run one more "make transaction durable" statement. COMMIT is that statement.

> Other idea is to add new GUC. If this GUC is enabled, transaction waits for
> all the committed records to be replicated whenever it takes new snapshot
> (probably transaction needs to wait not only when starting but also taking
> new snapshot). This prevents the transaction from seeing any data that
> have not been replicated yet.
If we block new snapshots after a local commit until replication succeeds, we in fact linearize reads from standbys. The cost will be immense. The whole idea of MVCC is that writers do not block readers.

Thanks for the ideas!

Best regards, Andrey Borodin.
