Re: Skipping logical replication transactions on subscriber side

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-06-30 11:05:06
Message-ID: CAA4eK1LnfTnnqLH4Fd0mTAKN2VtBN+sprQRSEE1R+jSsvv++ow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 28, 2021 at 10:12 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Jun 17, 2021 at 6:20 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Jun 17, 2021 at 3:24 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > > Now, if this function is used by super
> > > > users then we can probably trust that they provide the XIDs that we
> > > > can trust to be skipped but OTOH making a restriction to allow these
> > > > functions to be used by superusers might restrict the usage of this
> > > > repair tool.
> > >
> > > If we specify the subscription id or name, maybe we can allow also the
> > > owner of subscription to do that operation?
> >
> > Ah, the owner of the subscription must be superuser.
>
> I've attached PoC patches.
>
> 0001 patch introduces the ability to skip transactions on the
> subscriber side. We can specify XID to the subscription by like ALTER
> SUBSCRIPTION test_sub SET SKIP TRANSACTION 100. The implementation
> seems straightforward except for setting origin state. After skipping
> the transaction we have to update the session origin state so that we
> can start streaming the transaction next to the one that we just
> skipped in case of the server crash or restarting the apply worker. We
> set origin state to the commit WAL record. However, since we skip all
> changes we don’t write any WAL even if we call CommitTransaction() at
> the end of the skipped transaction. So the patch sets the origin state
> to the transaction that updates the pg_subscription system catalog to
> reset the skip XID. I think we need a discussion of this part.
>

IIUC, for streaming transactions you are allowing stream file to be
created and then remove it at stream_commit/stream_abort time, is that
right? If so, in which cases are you imagining the files to be
created, is it in the case of relation message
(LOGICAL_REP_MSG_RELATION)? Assuming the previous two statements are
correct, this will skip the relation message as well as part of the
removal of stream files which might lead to a problem because the
publisher won't know that we have skipped the relation message and it
won't send it again. This can cause problems while processing the next
messages.

> With 0002 and 0003 patches, we report the error information in server
> logs and the stats view, respectively. 0002 patch adds errcontext for
> messages that happened during applying the changes:
>
> ERROR: duplicate key value violates unique constraint "hoge_pkey"
> DETAIL: Key (c)=(1) already exists.
> CONTEXT: during apply of "INSERT" for relation "public.hoge" in
> transaction with xid 736 committs 2021-06-27 12:12:30.053887+09
>
> 0003 patch adds pg_stat_logical_replication_error statistics view
> discussed on another thread[1]. The apply worker sends the error
> information to the stats collector if an error happens during applying
> changes. We can check those errors as follow:
>
> postgres(1:25250)=# select * from pg_stat_logical_replication_error;
> subname | relid | action | xid | last_failure
> ----------+-------+--------+-----+-------------------------------
> test_sub | 16384 | INSERT | 736 | 2021-06-27 12:12:45.142675+09
> (1 row)
>
> I added only columns required for the skipping transaction feature to
> the view for now.
>

Isn't it better to add an error message if possible?

> Please note that those patches are meant to evaluate the concept we've
> discussed so far. Those don't have the doc update yet.
>

I think your patch is on the lines of what we have discussed. It would
be good if you can update docs and add few tests.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2021-06-30 11:14:15 Re: Use simplehash.h instead of dynahash in SMgr
Previous Message Simon Riggs 2021-06-30 09:59:02 PITR: enhance getRecordTimestamp()