Re: Skipping logical replication transactions on subscriber side

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-05-25 08:13:37
Message-ID: CAD21AoDpp71FeEgtX9Dfvb8L-uoaPNtfdKd5PwDH3-5SB+1xbw@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 25, 2021 at 2:49 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Mon, May 24, 2021 at 1:32 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Hi all,
> >
> > If a logical replication worker cannot apply the change on the
> > subscriber for some reason (e.g., missing table or violating a
> > constraint, etc.), logical replication stops until the problem is
> > resolved. Ideally, we resolve the problem on the subscriber (e.g., by
> > creating the missing table or removing the conflicting data, etc.) but
> > occasionally a problem cannot be fixed and it may be necessary to skip
> > the entire transaction in question. Currently, we have two ways to
> > skip transactions: advancing the LSN of the replication origin on the
> > subscriber and advancing the LSN of the replication slot on the
> > publisher. But both ways might not be able to skip exactly one
> > transaction in question and end up skipping other transactions too.
>
> Does it mean pg_replication_origin_advance() can't skip exactly one
> txn? I'm not familiar with the function or never used it though, I was
> just searching for "how to skip a single txn in postgres" and ended up
> in [1]. Could you please give some more details on scenarios when we
> can't skip exactly one txn? Is there any other way to advance the LSN,
> something like directly updating the pg_replication_slots catalog?

Sorry, it's not impossible. A user could mistakenly skip more than one
transaction by specifying a wrong LSN, but it's always possible to
skip exactly one transaction.
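
To illustrate why a wrong LSN skips too much, here is a rough sketch in
Python. It is a simplified model, not how the server is implemented: it
assumes LSNs are plain integers and that advancing the origin drops
every pending transaction whose commit LSN is at or below the new
value. The function name and the sample LSNs are made up for
illustration; only XID 590 comes from the example in this thread.

```python
from typing import List, Tuple

# Hypothetical (remote_xid, commit_lsn) pairs; LSNs are modeled as integers.
Txn = Tuple[int, int]


def skipped_by_advance(pending: List[Txn], new_origin_lsn: int) -> List[int]:
    """Return the XIDs that would no longer be applied after the origin
    is advanced to new_origin_lsn (simplified model, see assumptions)."""
    return [xid for xid, commit_lsn in pending if commit_lsn <= new_origin_lsn]


# Transactions waiting to be applied, in commit order (made-up LSNs).
pending = [(590, 1000), (591, 1200), (592, 1500)]

# Advancing exactly to the failing transaction's commit LSN skips only it.
exact = skipped_by_advance(pending, 1000)

# Advancing to a wrong, larger LSN silently skips XID 591 as well.
too_far = skipped_by_advance(pending, 1300)
```

In this model the first call skips only XID 590, while the second also
drops 591, which is the kind of mistake a skip-by-XID interface would
avoid.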

>
> [1] - https://www.postgresql.org/docs/devel/logical-replication-conflicts.html
>
> > I’d like to propose a way to skip the particular transaction on the
> > subscriber side. As the first step, a transaction can be specified to
> > be skipped by specifying remote XID on the subscriber. This feature
> > would need two sub-features: (1) a sub-feature for users to identify
> > the problem subscription and the problem transaction’s XID, and (2) a
> > sub-feature to skip the particular transaction to apply.
> >
> > For (1), I think the simplest way would be to put the details of the
> > change being applied in errcontext. For example, the following
> > errcontext shows the remote XID as well as the action name, the
> > relation name, and commit timestamp:
> >
> > ERROR: duplicate key value violates unique constraint "test_pkey"
> > DETAIL: Key (c)=(1) already exists.
> > CONTEXT: during apply of "INSERT" for relation "public.test" in
> > transaction with xid 590 commit timestamp 2021-05-21
> > 14:32:02.134273+09
> >
> > The user can identify which remote XID has a problem during applying
> > the change (XID=590 in this case). As another idea, we can have a
> > statistics view for logical replication workers, showing information
> > of the last failure transaction.
>
> Agree with Amit on this. At times, it is difficult to look around in
> the server logs, so it will be better to have it in both places.
>
> > For (2), what I'm thinking is to add a new action to ALTER
> > SUBSCRIPTION command like ALTER SUBSCRIPTION test_sub SET SKIP
> > TRANSACTION 590. Also, we can have actions to reset it; ALTER
> > SUBSCRIPTION test_sub RESET SKIP TRANSACTION. Those commands add the
> > XID to a new column of pg_subscription or a new catalog, having the
> > worker reread its subscription information. Once the worker skipped
> > the specified transaction, it resets the transaction to skip on the
> > catalog. The syntax allows users to specify one remote XID to skip. In
> > the future, it might be good if users can also specify multiple XIDs
> > (a range of XIDs or a list of XIDs, etc).
>
> What's it like skipping a txn with txn id? Is it that the particular
> txn is forced to commit or abort or just skipping some of the code in
> the apply worker?

What I'm thinking is to ignore the entire transaction with the
specified XID. IOW, logical replication workers don't even start the
transaction and ignore all changes associated with the XID.
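
A minimal sketch of that behavior in Python, only to make the idea
concrete. The Change and ApplyWorker classes and the message kinds are
hypothetical and do not correspond to the actual apply worker code; the
sketch also assumes the skip setting is cleared when the skipped
transaction's commit message arrives.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Change:
    xid: int   # remote transaction ID the change belongs to
    kind: str  # e.g. "BEGIN", "INSERT", "COMMIT" (hypothetical names)


@dataclass
class ApplyWorker:
    skip_xid: Optional[int] = None  # XID set via the proposed SET SKIP TRANSACTION
    applied: List[Change] = field(default_factory=list)

    def handle(self, change: Change) -> None:
        if self.skip_xid is not None and change.xid == self.skip_xid:
            # Ignore every change of the skipped transaction, BEGIN included,
            # so no local transaction is ever started for it.
            if change.kind == "COMMIT":
                # The whole transaction has been skipped; clear the setting.
                self.skip_xid = None
            return
        self.applied.append(change)


worker = ApplyWorker(skip_xid=590)
stream = [
    Change(589, "BEGIN"), Change(589, "INSERT"), Change(589, "COMMIT"),
    Change(590, "BEGIN"), Change(590, "INSERT"), Change(590, "COMMIT"),
    Change(591, "BEGIN"), Change(591, "INSERT"), Change(591, "COMMIT"),
]
for change in stream:
    worker.handle(change)

applied_xids = sorted({c.xid for c in worker.applied})
```

After the loop, only changes from XIDs 589 and 591 have been applied
and skip_xid is back to None, so later transactions are unaffected.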

> IIUC, the behavior of RESET SKIP TRANSACTION is just
> to forget the txn id specified in SET SKIP TRANSACTION right?

Right. I proposed this RESET command for users to cancel the skipping behavior.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
