Re: Avoiding data loss with synchronous replication

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoiding data loss with synchronous replication
Date: 2021-07-24 10:52:09
Message-ID: 1880F781-8459-4430-870D-1455988DEAB0@yandex-team.ru
Lists: pgsql-hackers

> On 23 July 2021, at 22:54, Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
>
> On 7/23/21, 4:33 AM, "Andrey Borodin" <x4mmm(at)yandex-team(dot)ru> wrote:
>> Thanks for your interest in the topic. I think in the thread [0] we almost agreed on the general design.
>> The only remaining question is that we want to treat pg_ctl stop and kill SIGTERM differently from pg_terminate_backend().
>
> I didn't get the idea that there was a tremendous amount of support
> for the approach to block canceling waits for synchronous replication.
> FWIW this was my initial approach as well, but I've been trying to
> think of alternatives.
>
> If we can gather support for some variation of the block-cancels
> approach, I think that would be preferred over my proposal from a
> complexity standpoint.

Let's clearly enumerate the problems of blocking.
It's been mentioned that the backend is not responsive when cancelation is blocked. On the contrary, it's very responsive.

postgres=# alter system set synchronous_standby_names to 'bogus';
ALTER SYSTEM
postgres=# alter system set synchronous_commit_cancelation TO off ;
ALTER SYSTEM
postgres=# select pg_reload_conf();
2021-07-24 15:35:03.054 +05 [10452] LOG:  received SIGHUP, reloading configuration files
 l 
---
 t
(1 row)
postgres=# begin;
BEGIN
postgres=*# insert into t1 values(0);
INSERT 0 1
postgres=*# commit ;
^CCancel request sent
WARNING:  canceling wait for synchronous replication requested, but cancelation is not allowed
DETAIL:  The COMMIT record has already flushed to WAL locally and might not have been replicated to the standby. We must wait here.
^CCancel request sent
WARNING:  canceling wait for synchronous replication requested, but cancelation is not allowed
DETAIL:  The COMMIT record has already flushed to WAL locally and might not have been replicated to the standby. We must wait here.

The message says clearly what's wrong. If that's still not enough, let's add a hint about synchronous standby names.

Are there any other problems with blocking cancels?


> Robert's idea to provide a way to understand
> the intent of the cancellation/termination request [0] could improve
> matters.  Perhaps adding an argument to pg_cancel/terminate_backend()
> and using different signals to indicate that we want to cancel the
> wait would be something that folks could get on board with.

The semantics of cancelation assume correct query interruption. That is no longer possible once we have committed locally: there cannot be any correct cancelation. And I don't think it is worth adding an incorrect cancelation.


Interestingly, converting the transaction to 2PC is a neat idea for the case when the backend is terminated. It provides more guarantees that the transaction will commit correctly even after a restart. But we may run short of max_prepared_xacts slots...
Anyway, backend termination bothers me a lot less than cancelation: drivers do not terminate queries on their own, but they do cancel queries by default.


Thanks!

Best regards, Andrey Borodin.
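
For readers who want the behaviour from the psql session above in compact form, here is a toy Python model of the decision a backend makes when a cancel request arrives while it waits for the synchronous standby. It is only a sketch under stated assumptions, not code from the patch: the class name, method names, state strings and message texts are all hypothetical, and `cancel_allowed` merely stands in for the synchronous_commit_cancelation setting shown in the session.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SyncRepWaiter:
    """Toy model of a backend that has flushed COMMIT locally and now waits
    for a synchronous standby. All names here are hypothetical."""

    cancel_allowed: bool            # stands in for synchronous_commit_cancelation
    standby_acked: bool = False     # True once the standby confirms the COMMIT record
    messages: List[str] = field(default_factory=list)

    def handle_cancel(self) -> str:
        """React to a query-cancel request; return the resulting state."""
        if self.standby_acked:
            # Replication already confirmed: the commit is complete, nothing to cancel.
            return "committed"
        if self.cancel_allowed:
            # Wait is abandoned: the client sees COMMIT although no standby has the data.
            self.messages.append("WARNING: wait canceled, commit is local only")
            return "committed_locally_only"
        # Cancel refused: report why and keep waiting.
        self.messages.append("WARNING: cancelation is not allowed, still waiting")
        return "waiting"

    def handle_standby_ack(self) -> str:
        """The standby confirmed the COMMIT record; the wait ends normally."""
        self.standby_acked = True
        return "committed"
```

With `cancel_allowed=False`, repeated calls to `handle_cancel()` each record a warning and return "waiting", which mirrors the two ^C attempts in the session. With `cancel_allowed=True`, the first cancel ends the wait in the "committed_locally_only" state, which is the data-loss window this thread is about.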
