Re: Optionally automatically disable logical replication subscriptions on error

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Smith, Peter" <peters(at)fast(dot)au(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optionally automatically disable logical replication subscriptions on error
Date: 2021-06-21 02:17:05
Message-ID: CAD21AoB+QH4or-QDiakALZp6pkXiAaGxfGcsvNYJoDHk9kknZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jun 19, 2021 at 11:44 PM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>
>
>
> > On Jun 19, 2021, at 3:17 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > Right, but there are things that could be common from the design
> > perspective.
>
> I went to reconcile my patch with that from [1] only to discover there is no patch on that thread. Is there one in progress that I can see?

I will submit the patch.

>
> I don't mind trying to reconcile this patch with what you're discussing in [1], but I am a bit skeptical about [1] becoming a reality and I don't want to entirely hitch this patch to that effort. This can be committed with or without any solution to the idea in [1]. The original motivation for this patch was that the TAP tests don't have a great way to deal with a subscription getting into a fail-retry infinite loop, which makes it harder for me to make progress on [2]. That doesn't absolve me of the responsibility of making this patch a good one, but it does motivate me to keep it simple.

There was a discussion that the skipping transaction patch would also
need to have a feature that tells users the details of the last
failure transaction such as its XID, timestamp, action etc. In that
sense, those two patches might need the common infrastructure that the
apply workers leave the error details somewhere so that the users can
see it.

> > For example, why is it mandatory to update this conflict
> > ( error) information in the system catalog instead of displaying it
> > via some stats view?
>
> The catalog must be updated to disable the subscription, so placing the error information in the same row doesn't require any independent touching of the catalogs. Likewise, the catalog must be updated to re-enable the subscription, so clearing the error from that same row doesn't require any independent touching of the catalogs.
>
> The error information does not *need* to be included in the catalog, but placing the information in any location that won't survive server restart leaves the user no information about why the subscription got disabled after a restart (or crash + restart) happens.
>
> Furthermore, since v2 removed the "disabled_by_error" field in favor of just using subenabled + suberrmsg to determine if the subscription was automatically disabled, not having the information in the catalog would make it ambiguous whether the subscription was manually or automatically disabled.

Is it really useful to write only error message to the system catalog?
Even if we see the error message like "duplicate key value violates
unique constraint “test_tab_pkey”” on the system catalog, we will end
up needing to check the server log for details to properly resolve the
conflict. If the user wants to know whether the subscription is
disabled manually or automatically, the error message on the system
catalog might not necessarily be necessary.

> For the problem in [1], having the xid is more important than it is in my patch, because the user is expected in [1] to use the xid as a handle. But that seems like an odd interface to me. Imagine that a transaction on the publisher side inserted a batch of data, and only a subset of that data conflicts on the subscriber side. What advantage is there in skipping the entire transaction? Wouldn't the user rather skip just the problematic rows? I understand that on the subscriber side it is difficult to do so, but if you are going to implement this sort of thing, it makes more sense to allow the user to filter out data that is problematic rather than filtering out xids that are problematic, and the filter shouldn't just be an in-or-out filter, but rather a mapping function that can redirect the data someplace else or rewrite it before inserting or change the pre-existing conflicting data prior to applying the problematic data or whatever. That's a huge effort, of course, but if the idea in [1] goes in that direction, I don't want my patch to have already added an xid field which ultimately nobody wants.
>

The feature discussed in that thread is meant to be a repair tool for
the subscription in emergency cases when something that should not
have happened happened. I guess that resolving row (or column) level
conflict should be done in another way, for example, by defining
policies for each type of conflict.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2021-06-21 02:26:23 Re: Optionally automatically disable logical replication subscriptions on error
Previous Message Thomas Munro 2021-06-21 01:23:34 Re: seawasp failing, maybe in glibc allocator