Re: Skipping logical replication transactions on subscriber side

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-09-02 20:44:58
Message-ID: F7A797EB-6E0D-43F9-9AE7-10776DD383D3@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Aug 30, 2021, at 12:06 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I've attached rebased patches.

Here are some review comments:

For the v12-0002 patch:

The documentation changes for ALTER SUBSCRIPTION .. RESET look strange to me. You grouped SET and RESET together, much like sql-altertable.html has them grouped, but I don't think it flows naturally here, as the two commands do not support the same set of parameters. It might look better if you documented these separately. It might also be good to order the parameters the same, so that the differences can more quickly be seen.

For the v12-0003 patch:

I believe this feature is needed, but it also seems like a very powerful foot-gun. Can we do anything to make it less likely that users will hurt themselves with this tool?

I am thinking back to support calls I have attended. When a production system is down, there is often some hesitancy to perform ad-hoc operations on the database, but once the decision has been made to do so, people try to get the whole process done as quickly as possible. If multiple transactions on the publisher fail on the subscriber, they will do so in series, not in parallel. The process of clearing these errors will amount to copying the xid of each failed transaction to the ALTER SUBSCRIPTION ... SET (skip_xid = xxx) command and running it, then the next, then the next, .... Perhaps the first couple times through the process, the customer will look to see that the failure is of the same type and on the same table, but after a short time they will likely just script something to clear the rest as quickly as possible. In the heat of the moment, they may not include a check of the failure message, but merely a grep of the failing xid.

If the user could instead clear all failed transactions of the same type, that might make it less likely that they unthinkingly also skip subsequent errors of some different type. Perhaps something like ALTER SUBSCRIPTION ... SET (skip_failures = 'duplicate key value violates unique constraint "test_pkey"')? This is arguably a different feature request, and not something your patch is required to address, but I wonder how much we should limit people shooting themselves in the foot? If we built something like this using your skip_xid feature, rather than instead of your skip_xid feature, would your feature need to be modified?

The docs could use some rewording, too:

+ If incoming data violates any constraints the logical replication
+ will stop until it is resolved.

In my experience, logical replication doesn't stop, but instead goes into an infinite loop of retries.

+ The resolution can be done either
+ by changing data on the subscriber so that it doesn't conflict with
+ incoming change or by skipping the whole transaction.

I'm having trouble thinking of an example conflict where skipping a transaction would be better than writing a BEFORE INSERT trigger on the conflicting table which suppresses or redirects conflicting rows somewhere else. Particularly for larger transactions containing multiple statements, suppressing the conflicting rows using a trigger would be less messy than skipping the transaction. I think your patch adds a useful tool to the toolkit, but maybe we should mention more alternatives in the docs? Something like, "changing the data on the subscriber so that it doesn't conflict with incoming changes, or dropping the conflicting constraint or unique index, or writing a trigger on the subscriber to suppress or redirect conflicting incoming changes, or as a last resort, by skipping the whole transaction"?

Perhaps I'm reading your phrase "changing the data on the subscriber" too narrowly. To me, that means running DML (either a DELETE or an UPDATE) on the existing data in the table where the conflict arises. These other options are DDL and do not easily come to mind when I read that phrase.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhihong Yu 2021-09-02 20:54:52 Re: [PATCH] Partial foreign key updates in referential integrity triggers
Previous Message Mark Dilger 2021-09-02 19:33:52 Re: Skipping logical replication transactions on subscriber side