Re: Optionally automatically disable logical replication subscriptions on error

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, "Smith, Peter" <peters(at)fast(dot)au(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optionally automatically disable logical replication subscriptions on error
Date: 2021-06-18 19:36:28
Message-ID: B522A0FA-0B5C-4A66-9B73-928CAA86FB88@enterprisedb.com

> On Jun 17, 2021, at 9:47 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> (a) The patch
> seem to be assuming that the error can happen only by the apply worker
> but I think the constraint violation can happen via one of the table
> sync workers as well

You are right. Peter mentioned the same thing, and it is clearly so. I am working to repair this fault in v2 of the patch.

> (b) What happens if the error happens when you
> are updating the error information in the catalog table.

I think that is an entirely different kind of error. The patch attempts to catch errors caused by the user, not by core functionality of the system failing. If there is a fault that prevents the catalogs from being updated, it is unclear what the patch can do about that.

> I think
> instead of seeing the actual apply time error, the user might see some
> other for which it won't be clear what is an appropriate action.

Good point.

Before trying to do much of anything with the caught error, the v2 patch logs the error. If the subsequent efforts to disable the subscription fail, at least the logs should contain the initial failure message. The v1 patch emitted its log message much further down, and that message was really only intended for debugging the patch itself, which left many opportunities for something else to throw before the log was written.
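
As a rough illustration of that ordering (this is not the patch's C code), the sketch below logs the caught error first and only then tries to disable the subscription. Every name in it (`run_worker`, `apply_changes`, `disable_subscription`) is a hypothetical stand-in, not a PostgreSQL function.

```python
import logging

logger = logging.getLogger("apply_worker_sketch")


def run_worker(apply_changes, disable_subscription):
    """Hypothetical worker step: log first, then try to disable.

    Both arguments are caller-supplied callables invented for this sketch.
    Returns "ok", "disabled", or "disable_failed".
    """
    try:
        apply_changes()
        return "ok"
    except Exception as apply_err:
        # Record the original failure before doing anything that could itself fail.
        logger.error("apply failed: %s", apply_err)
        try:
            disable_subscription()
        except Exception as disable_err:
            # The original error is already in the log at this point.
            logger.error("could not disable subscription: %s", disable_err)
            return "disable_failed"
        return "disabled"
```

Even when the disable step itself throws, the first log record is still the original apply-time failure, which is the property the v2 ordering is after.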

> We are also discussing another action like skipping the apply of the
> transaction on an error [1]. I think it is better to evaluate both the
> proposals as one seems to be an extension of another.

Thanks for the link.

I think they are two separate options. For some users and data patterns, subscriber-side skipping of specific problematic commits will be fine. For other usage patterns, skipping earlier commits will result in more and more data integrity problems (foreign key references, etc.), such that the failures will snowball, with skipping becoming the norm rather than the exception. Users with those usage patterns would likely prefer the subscription to be disabled automatically until manual intervention can clean up the problem.
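
To make the snowballing concern concrete, here is a toy model. It is purely illustrative and assumes referential checks are enforced on the subscriber. The function `replay_with_skipping` and the commit tuples are invented for this sketch. One commit fails for some user-caused reason and is skipped, and every later commit that references the skipped row then fails as well.

```python
def replay_with_skipping(commits, initially_failing):
    """Replay (kind, key, parent_key) commits, skipping any that fail.

    Toy rules: a commit whose index is in `initially_failing` fails outright;
    a "child" commit fails if its parent row was never applied.
    Returns the indexes of all skipped commits.
    """
    parents = set()
    skipped = []
    for idx, (kind, key, parent_key) in enumerate(commits):
        if idx in initially_failing:
            skipped.append(idx)
            continue
        if kind == "child" and parent_key not in parents:
            # Simulated foreign key violation caused by an earlier skip.
            skipped.append(idx)
            continue
        if kind == "parent":
            parents.add(key)
    return skipped


commits = [
    ("parent", 1, None),
    ("child", 10, 1),
    ("child", 11, 1),
    ("parent", 2, None),
    ("child", 20, 2),
]
skipped = replay_with_skipping(commits, initially_failing={0})
print(skipped)  # [0, 1, 2]: one original failure turns into three skips
```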


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
