Re: Optionally automatically disable logical replication subscriptions on error

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optionally automatically disable logical replication subscriptions on error
Date: 2021-06-18 04:47:36
Message-ID: CAA4eK1KkhsNAW3=XxOdVRD9+RcQG5PRRFBFzAMibMG0YFFmAmg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 18, 2021 at 1:48 AM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>
> Hackers,
>
> Logical replication apply workers for a subscription can easily get stuck in an infinite loop of attempting to apply a change, triggering an error (such as a constraint violation), exiting with an error written to the subscription worker log, and restarting.
>
> As things currently stand, only superusers can create subscriptions. Ongoing work to delegate superuser tasks to non-superusers creates the potential for even more errors to be triggered, specifically, errors where the apply worker does not have permission to make changes to the target table.
>
> The attached patch makes it possible to create a subscription using a new subscription_parameter, "disable_on_error", such that rather than going into an infinite loop, the apply worker will catch errors and automatically disable the subscription, breaking the loop. The new parameter defaults to false. When false, the PG_TRY/PG_CATCH overhead is avoided, so for subscriptions which do not use the feature, there shouldn't be any change. Users can manually clear the error after fixing the underlying issue with an ALTER SUBSCRIPTION .. ENABLE command.
>

I see this idea has merits and it will help users to repair failing
subscriptions. A few points from a quick look at the patch: (a) The
patch seems to assume that the error can be raised only by the apply
worker, but I think a constraint violation can also happen in one of
the table sync workers. (b) What happens if an error occurs while you
are updating the error information in the catalog table? I think that
instead of seeing the actual apply-time error, the user might see some
other error for which it won't be clear what the appropriate action is.
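
To make the behaviour under discussion concrete, here is a toy model in
plain C of the restart loop and of how the proposed "disable_on_error"
option would break it. This is only a sketch and is not PostgreSQL code:
the Subscription struct, apply_change() and run_launcher() are
hypothetical names invented for illustration, the failing change is
hard-coded, and the max_restarts bound exists only so that the model
terminates.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a subscription's state (not the real catalog). */
typedef struct Subscription
{
    bool enabled;           /* is the subscription currently enabled? */
    bool disable_on_error;  /* the option proposed in the patch */
    int  attempts;          /* how many times a worker tried to apply */
} Subscription;

/* Pretend to apply a change that always fails, e.g. a constraint violation. */
bool apply_change(Subscription *sub)
{
    sub->attempts++;
    return false;           /* false means "the apply raised an error" */
}

/*
 * Model of the worker being restarted after each error. Returns the number
 * of restarts. Without disable_on_error the loop ends only because of the
 * artificial max_restarts bound; with it, the first error disables the
 * subscription and the loop stops.
 */
int run_launcher(Subscription *sub, int max_restarts)
{
    int restarts = 0;

    while (sub->enabled && restarts < max_restarts)
    {
        if (apply_change(sub))
            break;                  /* change applied, nothing to retry */

        if (sub->disable_on_error)
            sub->enabled = false;   /* break the loop until re-enabled */
        else
            restarts++;             /* worker exits and is restarted */
    }
    return restarts;
}
```

With a bound of 5, a subscription that has disable_on_error unset makes
5 attempts and stays enabled, while one that has it set makes a single
attempt and ends up disabled. The model deliberately ignores both points
above: it has no table sync workers, and the step that records the
failure cannot itself fail.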

We are also discussing another action, namely skipping the apply of the
transaction on an error [1]. I think it is better to evaluate both
proposals together, as one seems to be an extension of the other. Adding
Sawada-San, as he is working on the other proposal.

[1] - https://www.postgresql.org/message-id/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK%3D30xJfUVihNZDA%40mail.gmail.com

--
With Regards,
Amit Kapila.
