Re: [BUG] Logical replica crash if there was an error in a function.

From: "Anton A(dot) Melnikov" <aamelnikov(at)inbox(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Maxim Orlov <orlovmg(at)gmail(dot)com>
Subject: Re: [BUG] Logical replica crash if there was an error in a function.
Date: 2022-11-15 01:39:53
Message-ID: 1e20f11a-e285-e766-1df6-f6c420abda1a@inbox.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello!

On 02.11.2022 21:02, Tom Lane wrote:
> So I'm now good with the idea of just not failing. I don't like
> the patch as presented though. First, the cfbot is quite rightly
> complaining about the "uninitialized variable" warning it draws.
> Second, I don't see a good reason to tie the change to logical
> replication in any way. Let's just change the Assert to an if(),
> as attached.

Fully agreed that is the most optimal solution for that case. Thanks!
Surely it's very rare one but there was a real segfault at production server.
Someone came up with the idea to modify function like public.test_selector()
in repcmd.sql (see above) on the fly with adding to it :last_modification:
field from current time and some other parameters with the help of replace()
inside the creation of the rebuild_test() procedure.

On 03.11.2022 18:29, Tom Lane wrote:
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
>> LGTM. I don't know if it is a good idea to omit the test case for this
>> scenario. If required, we can reuse the test case from Sawada-San's
>> patch in the email [1].
>
> I don't think the cost of that test case is justified by the tiny
> probability that it'd ever catch anything. If we were just adding a
> query or two to an existing scenario, that could be okay; but spinning
> up and syncing a whole new primary and standby database is *expensive*
> when you multiply it by the number of times developers and buildfarm
> animals are going to run the tests in the future.
>
> There's also the little issue that I'm not sure it would actually
> detect a problem if we had one. The case is going to fail, and
> what we want to know is just how messily it fails, and I think the
> TAP infrastructure isn't very sensitive to that ... especially
> if the test isn't even checking for specific error messages.
Seems it is possible to do a test without these remarks. The attached
test uses existing nodes and checks the specific error message. Additionally
i've tried to reduce overall number of nodes previously
used in this test in a similar way.

Would be glad for comments and remarks.

With best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
v1-0001-Test-for-func-error-in-logrep-worker.patch text/x-patch 4.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-11-15 01:59:42 Re: [BUG] Logical replica crash if there was an error in a function.
Previous Message Andres Freund 2022-11-15 01:25:31 Re: Assertion failure in SnapBuildInitialSnapshot()