Optionally automatically disable logical replication subscriptions on error

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Optionally automatically disable logical replication subscriptions on error
Date: 2021-06-17 20:18:38
Message-ID: DB35438F-9356-4841-89A0-412709EBD3AB@enterprisedb.com
Lists: pgsql-hackers

Hackers,

Logical replication apply workers for a subscription can easily get stuck in an infinite loop: the worker attempts to apply a change, triggers an error (such as a constraint violation), exits with the error written to the subscription worker log, restarts, and then attempts the same change again.
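The loop can be illustrated with a small Python model. This is a toy sketch, not PostgreSQL code: `Subscription`, `apply_change`, `run_apply_worker`, and `launcher` are hypothetical names, each remote change is treated as its own transaction, and the launcher stops after `max_restarts` attempts only so that the example terminates, whereas the real system keeps restarting the worker.

```python
# Toy model of the apply-worker restart loop. All names are hypothetical;
# this is not PostgreSQL code.

class ConstraintViolation(Exception):
    """Stands in for an ERROR such as a duplicate-key violation."""


class Subscription:
    def __init__(self, changes):
        self.changes = changes  # remote changes, in commit order
        self.position = 0       # index of the next change to apply


def apply_change(table, key):
    # The subscriber-side table enforces uniqueness on its keys.
    if key in table:
        raise ConstraintViolation(f"duplicate key value {key!r}")
    table.add(key)


def run_apply_worker(sub, table):
    # The position only advances after a change is applied, so a failing
    # change is retried from the same place on the next start.
    while sub.position < len(sub.changes):
        apply_change(table, sub.changes[sub.position])
        sub.position += 1


def launcher(sub, table, max_restarts):
    # Restart the worker each time it exits with an error. The cap exists
    # only so this model terminates. Returns the errors that were "logged".
    log = []
    for _ in range(max_restarts):
        try:
            run_apply_worker(sub, table)
            break
        except ConstraintViolation as err:
            log.append(str(err))  # the log is the only evidence of the problem
    return log


table = {2}                    # a conflicting row already exists locally
sub = Subscription([1, 2, 3])
log = launcher(sub, table, max_restarts=5)
print(sub.position, len(log))  # 1 5
```

Because the committed position never moves past the conflicting change, every restart fails at the same place, and the only record of why is the log.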

As things currently stand, only superusers can create subscriptions. Ongoing work to delegate superuser tasks to non-superusers creates the potential for even more errors, specifically errors where the apply worker lacks permission to modify the target table.

The attached patch adds a new subscription_parameter, "disable_on_error". When it is set, rather than going into an infinite loop, the apply worker catches errors and automatically disables the subscription, breaking the loop. The new parameter defaults to false. When it is false, the PG_TRY/PG_CATCH overhead is avoided, so subscriptions that do not use the feature should see no change. After fixing the underlying issue, users can manually re-enable the subscription with an ALTER SUBSCRIPTION .. ENABLE command.
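The proposed control flow can be sketched as a toy Python model. It is not the patch's code: `Subscription`, `apply_change`, `run_apply_worker`, and `alter_subscription_enable` are hypothetical names, Python's `try`/`except` stands in for `PG_TRY`/`PG_CATCH`, and the error trap is entered only when `disable_on_error` is set, so with the default of false the error propagates as before. The model only marks the subscription disabled; how the patch reports the error is not modeled.

```python
# Toy model of the disable_on_error idea. All names are hypothetical;
# this is not PostgreSQL code.

class ConstraintViolation(Exception):
    """Stands in for an ERROR such as a duplicate-key violation."""


class Subscription:
    def __init__(self, changes, disable_on_error=False):
        self.changes = changes           # remote changes, in commit order
        self.position = 0                # index of the next change to apply
        self.enabled = True
        self.disable_on_error = disable_on_error


def apply_change(table, key):
    if key in table:
        raise ConstraintViolation(f"duplicate key value {key!r}")
    table.add(key)


def apply_loop(sub, table):
    while sub.position < len(sub.changes):
        apply_change(table, sub.changes[sub.position])
        sub.position += 1


def run_apply_worker(sub, table):
    """Returns the error message if the subscription was disabled, else None."""
    if not sub.enabled:
        return None
    if not sub.disable_on_error:
        # Default: no error trap, so an error ends the worker as before.
        apply_loop(sub, table)
        return None
    try:                                 # plays the role of PG_TRY
        apply_loop(sub, table)
    except ConstraintViolation as err:   # plays the role of PG_CATCH
        sub.enabled = False              # break the loop instead of retrying
        return str(err)
    return None


def alter_subscription_enable(sub):
    # Models ALTER SUBSCRIPTION .. ENABLE after the cause has been fixed.
    sub.enabled = True


table = {2}                              # conflicting row on the subscriber
sub = Subscription([1, 2, 3], disable_on_error=True)
err = run_apply_worker(sub, table)
print(sub.enabled, sub.position, err)    # False 1 duplicate key value 2

table.discard(2)                         # fix the underlying problem
alter_subscription_enable(sub)
run_apply_worker(sub, table)
print(sub.enabled, sub.position)         # True 3
```

Keeping the trap out of the default path mirrors the stated goal that subscriptions not using the option behave exactly as they do today.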

In addition to helping on production systems, this makes it simpler to write TAP tests involving error conditions. My original motivation for writing this patch was frustration that TAP tests needed to parse the apply worker log file to determine whether permission failures were occurring and what they were. It was also obnoxiously easy for a test to get stuck waiting for a permanently stuck subscription to catch up. This helps with both issues.

I don't think this is quite ready for commit, but I'd like feedback on whether folks like this idea or want to suggest design changes.

Attachment: v1-0001-Optionally-disabling-subscriptions-on-error.patch (application/octet-stream, 27.1 KB)
