Re: Disallow cancellation of waiting for synchronous replication

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Marco Slot <marco(at)citusdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, dim(at)tapoueh(dot)org, milyutinma(at)gmail(dot)com
Subject: Re: Disallow cancellation of waiting for synchronous replication
Date: 2021-03-11 14:15:46
Message-ID: 4f8d54c9-6f18-23d5-c4de-9d6656d3a408@oss.nttdata.com
Lists: pgsql-hackers

On 2020/12/09 18:07, Andrey Borodin wrote:
>
>
>> On 9 June 2020, at 23:32, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>>
>>
>
> After using the patch for a while, it became obvious that PANICing during termination is not a good idea, even when we are waiting for synchronous replication: it generates unwanted core dumps.
> I think that in the presence of SIGTERM it's reasonable to say that we can no longer protect the user.
>
> PFA v3.

I don't think that preventing a backend from being canceled while it waits for
sync rep actually addresses your issue. As mentioned upthread, there are
other cases that can cause the same issue, for example, a server restart while
backends are waiting for sync rep.

As far as I understand your idea, what we should do is make a new transaction
wait, before it starts, until WAL up to the latest locally committed record
has been replicated to the standby? If so, we don't need to prevent
cancellation during the sync rep wait.

If we do that, a new transaction cannot see any changes made by another
transaction whose sync rep wait was canceled until all the committed WAL
records have been replicated. Doesn't this address your issue? I think this
idea works not only in the cancellation case but also in the other cases.

If we want to control this new wait at the application level, we could
implement something like a pg_wait_for_syncrep(pg_lsn) function, which waits
until WAL has been replicated to the standby up to the specified LSN. For
example, the application could execute pg_wait_for_syncrep(pg_current_wal_lsn())
whenever it needs that consistency point.
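To make the proposed semantics concrete, here is a minimal toy model in Python, not PostgreSQL code. pg_wait_for_syncrep() is only a proposal here, not an existing PostgreSQL function, and every name below (SyncRepModel, commit_locally, standby_ack, wait_for_syncrep) is hypothetical. LSNs are modeled as plain integers, and the standby's acknowledgement is simulated by a method call.

```python
import threading
from typing import Optional


class SyncRepModel:
    """Toy model of a primary tracking WAL confirmed by a synchronous standby.

    Hypothetical illustration only: LSNs are plain integers here, whereas
    PostgreSQL uses pg_lsn values.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._current_wal_lsn = 0    # end of WAL written locally
        self._standby_flush_lsn = 0  # end of WAL the standby has confirmed

    @property
    def current_wal_lsn(self) -> int:
        """Stands in for pg_current_wal_lsn() in this model."""
        with self._cond:
            return self._current_wal_lsn

    @property
    def standby_flush_lsn(self) -> int:
        with self._cond:
            return self._standby_flush_lsn

    def commit_locally(self, record_len: int) -> int:
        """Write a commit record locally without waiting for the standby
        (as if the sync rep wait had been canceled); return its end LSN."""
        with self._cond:
            self._current_wal_lsn += record_len
            return self._current_wal_lsn

    def standby_ack(self, flush_lsn: int) -> None:
        """Record the standby's reported flush position and wake waiters."""
        with self._cond:
            if flush_lsn > self._standby_flush_lsn:
                self._standby_flush_lsn = flush_lsn
                self._cond.notify_all()

    def wait_for_syncrep(self, target_lsn: int,
                         timeout: Optional[float] = None) -> bool:
        """Hypothetical pg_wait_for_syncrep(): block until the standby has
        confirmed WAL up to target_lsn. Returns False if the timeout expires."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._standby_flush_lsn >= target_lsn, timeout)
```

An application would call `m.wait_for_syncrep(m.current_wal_lsn)` at the point where it needs the consistency guarantee, mirroring `pg_wait_for_syncrep(pg_current_wal_lsn())`. Only callers that ask for the guarantee pay for the wait, which is what distinguishes this option from the GUC-based one.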

Another idea is to add a new GUC. If it is enabled, a transaction waits for
all committed records to be replicated whenever it takes a new snapshot
(the transaction probably needs to wait not only when it starts but also
whenever it takes a new snapshot). This prevents the transaction from
seeing any data that has not yet been replicated.
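A rough Python sketch of the GUC variant follows, again a toy model rather than PostgreSQL code. The `wait_on_snapshot` flag stands in for the proposed GUC, which has no name in this thread, and `Primary`, `commit`, `standby_ack`, and `take_snapshot` are hypothetical. One design choice here is an assumption: the snapshot and its target LSN are captured together, and the snapshot is handed out only after the standby confirms that LSN. This means commits arriving during the wait cannot postpone it indefinitely.

```python
import threading
from typing import Dict, Optional


class Primary:
    """Toy primary whose snapshots can be gated on standby confirmation."""

    def __init__(self, wait_on_snapshot: bool) -> None:
        self.wait_on_snapshot = wait_on_snapshot  # stands in for the proposed GUC
        self._cond = threading.Condition()
        self._data: Dict[str, str] = {}
        self._last_commit_lsn = 0
        self._standby_flush_lsn = 0

    def commit(self, key: str, value: str) -> int:
        """Commit locally without waiting for the standby (as if the sync rep
        wait had been canceled) and return the commit's LSN."""
        with self._cond:
            self._data[key] = value
            self._last_commit_lsn += 1
            return self._last_commit_lsn

    def standby_ack(self, flush_lsn: int) -> None:
        """Record the standby's flush position and wake waiting snapshots."""
        with self._cond:
            if flush_lsn > self._standby_flush_lsn:
                self._standby_flush_lsn = flush_lsn
                self._cond.notify_all()

    def take_snapshot(self, timeout: Optional[float] = None) -> Optional[Dict[str, str]]:
        """Return a copy of the committed data. With the flag on, the copy is
        released only once every commit it contains is confirmed by the
        standby; returns None if the timeout expires first."""
        with self._cond:
            snapshot = dict(self._data)
            target = self._last_commit_lsn  # newest commit visible in snapshot
            if self.wait_on_snapshot:
                if not self._cond.wait_for(
                        lambda: self._standby_flush_lsn >= target, timeout):
                    return None
            return snapshot
```

With the flag off, `take_snapshot()` returns immediately and can expose a commit the standby never received. With it on, that commit stays invisible until the standby confirms it.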

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
