Re: Skipping logical replication transactions on subscriber side

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-05-25 10:21:09
Message-ID: CALj2ACU5oGYmt4KUzyW5VoFddu2NWj+xTyswtq1L4bSnYNw27w@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 25, 2021 at 1:44 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, May 25, 2021 at 2:49 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > On Mon, May 24, 2021 at 1:32 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > Hi all,
> > >
> > > If a logical replication worker cannot apply the change on the
> > > subscriber for some reason (e.g., missing table or violating a
> > > constraint, etc.), logical replication stops until the problem is
> > > resolved. Ideally, we resolve the problem on the subscriber (e.g., by
> > > creating the missing table or removing the conflicting data, etc.) but
> > > occasionally a problem cannot be fixed and it may be necessary to skip
> > > the entire transaction in question. Currently, we have two ways to
> > > skip transactions: advancing the LSN of the replication origin on the
> > > subscriber and advancing the LSN of the replication slot on the
> > > publisher. But both ways might not be able to skip exactly one
> > > transaction in question and end up skipping other transactions too.
> >
> > Does it mean pg_replication_origin_advance() can't skip exactly one
> > txn? I'm not familiar with the function or never used it though, I was
> > just searching for "how to skip a single txn in postgres" and ended up
> > in [1]. Could you please give some more details on scenarios when we
> > can't skip exactly one txn? Is there any other way to advance the LSN,
> > something like directly updating the pg_replication_slots catalog?
>
> Sorry, it's not impossible. Although the user may mistakenly skip more
> than one transaction by specifying a wrong LSN, it's always possible to
> skip exactly one transaction.

IIUC, if the user specifies the "correct LSN", then it's possible to
skip exactly the txn for which the sync workers are unable to apply
changes, right?

How can the user get the LSN (which we call "correct LSN")? Is it from
pg_replication_slots? Or some other way?

If the user can somehow get the "correct LSN", can't the exact txn be
skipped with one of the existing ways, either
pg_replication_origin_advance or something else?

If there's no way to get the "correct LSN", then why can't we just
print that LSN in the error context and/or in the new statistics view
for logical replication workers, so that any of the existing ways can
be used to skip exactly one txn?

IIUC, the feature proposed here guards against users specifying a
wrong LSN. If I'm right, what guarantees that users won't specify a
wrong txn id instead? Why can't we tell users, when a wrong LSN is
specified, that "currently, an apply worker is failing to apply the
LSN XXXX, and you specified LSN YYYY, are you sure this is
intentional?"
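
To make the check I have in mind concrete, here is a rough sketch in
Python rather than server code. The names parse_lsn and check_skip_lsn
are hypothetical and not part of any PostgreSQL API; the only thing
taken from PostgreSQL is the textual LSN format, two hexadecimal
numbers separated by a slash (high 32 bits / low 32 bits). It assumes
the failing LSN is already known to the server.

```python
from typing import Optional


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'X/Y' (hex high/low 32 bits) to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def check_skip_lsn(failing_lsn: str, requested_lsn: str) -> Optional[str]:
    """Return a warning if the requested LSN differs from the failing one.

    Hypothetical helper: returns None when both LSNs denote the same
    position, otherwise a message the user would have to confirm.
    """
    if parse_lsn(failing_lsn) == parse_lsn(requested_lsn):
        return None
    return (
        f"currently, an apply worker is failing to apply the LSN "
        f"{failing_lsn}, and you specified LSN {requested_lsn}, "
        f"are you sure this is intentional?"
    )
```

The LSNs are compared as integers rather than as strings so that
differences in letter case or leading zeros do not trigger a spurious
warning.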

Please correct me if I'm missing anything.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
