Re: Skipping logical replication transactions on subscriber side

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2022-03-01 15:00:36
Message-ID: CAD21AoCTHQDvq7gWdP4BFpjCYswcFRnHyzAUJn16zSPD4bO4bg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 15, 2022 at 7:35 PM Peter Eisentraut
<peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:
>
> On 14.02.22 10:16, Amit Kapila wrote:
> > I think exposing LSN is a better approach as it doesn't have the
> > dangers of wraparound. And, I think users can use it with the existing
> > function pg_replication_origin_advance() which will save us from
> > adding additional code for this feature. We can explain/expand in docs
> > how users can use the error information from view/error_logs and use
> > the existing function to skip conflicting transactions. We might want
> > to even expose error_origin to make it a bit easier for users but not
> > sure. I feel the need for the new syntax (and then added code
> > complexity due to that) isn't warranted if we expose error_LSN and let
> > users use it with the existing functions.
>
> Well, the whole point of this feature is to provide a higher-level
> interface instead of pg_replication_origin_advance(). Replication
> origins are currently not something the users have to deal with
> directly. We already document that you can use
> pg_replication_origin_advance() to skip erroring transactions. But that
> seems unsatisfactory. It'd be like using pg_surgery to fix unique
> constraint violations.

+1

I’ve considered a plan for the skipping logical replication
transaction feature toward PG15. Several ideas and patches have been
proposed here and another related thread[1][2] for the skipping
logical replication transaction feature as follows:

A. Change pg_stat_subscription_workers (committed 7a8507329085)
B. Add origin name and commit-LSN to logical replication worker
errcontext (proposed[2])
C. Store error information (e.g., the error message and commit-LSN) to
the system catalog
D. Introduce ALTER SUBSCRIPTION SKIP
E. Record the skipped data somewhere: server logs or a table

Given the remaining time for PG15, it’s unlikely to complete all of
them for PG15 by the feature freeze. The most realistic plan for PG15
in my mind is to complete B and D. With these two items, the LSN of
the error-ed transaction is shown in the server log, and we can ask
users to check server logs for the LSN and use it with ALTER
SUBSCRIPTION SKIP command. If the community agrees with B+D, we will
have a user-visible feature for PG15 which can be further
extended/improved in PG16 by adding C and E. I started a new thread[2]
for B yesterday. In this thread, I'd like to discuss D.

I've attached an updated patch for D and here is the summary:

* Introduce a new command ALTER SUBSCRIPTION ... SKIP (lsn =
'0/1234'). The user can get the commit-LSN of the transaction in
question from the server logs thanks to B[2].
* The user-specified LSN (say skip-LSN) is stored in the
pg_subscription catalog.
* The apply worker skips the whole transaction if the transaction's
commit-LSN exactly matches to skip-LSN.
* The skip-LSN has an effect on only the first non-empty transaction
since the worker started to apply changes. IOW it's cleared after
either skipping the whole transaction or successfully committing a
non-empty transaction, preventing the skip-LSN to remain in the
catalog. Also, since the latter case means that the user set the wrong
skip-LSN we clear it with a warning.
* ALTER SUBSCRIPTION SKIP doesn't support tablesync workers. But it
would not be a problem in practice since an error during table
synchronization is not common and could be resolved by truncating the
table and restarting the synchronization.

For the above reasons, ALTER SUBSCRIPTION SKIP command is safer than
the existing way of using pg_replication_origin_advance().

I've attached an updated patch along with two patches for cfbot tests
since the main patch (0003) depends on the other two patches. Both
0001 and 0002 patches are the same ones I attached on another
thread[2].

Regards,

[1] https://www.postgresql.org/message-id/20220125063131.4cmvsxbz2tdg6g65%40alap3.anarazel.de
[2] https://www.postgresql.org/message-id/CAD21AoBarBf2oTF71ig2g_o%3D3Z_Dt6_sOpMQma1kFgbnA5OZ_w%40mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
v12-0003-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch application/octet-stream 59.7 KB
v12-0001-Use-complete-sentences-in-logical-replication-wo.patch application/octet-stream 3.1 KB
v12-0002-Add-the-origin-name-and-remote-commit-LSN-to-log.patch application/octet-stream 11.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2022-03-01 15:07:04 Re: Pre-allocating WAL files
Previous Message David Steele 2022-03-01 14:44:51 Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file