Re: Skipping logical replication transactions on subscriber side

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-12-07 11:36:10
Message-ID: CAD21AoBNPPEySdB8M7mG9KSg5+F9nfbo8LrWU7d=5uvQ18Uumw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 6, 2021 at 2:17 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Dec 3, 2021 at 12:12 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Dec 3, 2021 at 11:53 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > > But this syntax gives you flexibility, so we can also
> > > > start with a simple implementation.
> > > >
> > >
> > > Yeah, I also think so. BTW, what do you think of providing extra
> > > flexibility of giving other options like 'operation', 'rel' along with
> > > xid? I think such options could be useful for large transactions that
> > > operate on multiple tables as it is quite possible that only a
> > > particular operation from the entire transaction is the cause of
> > > failure. Now, on one side, we can argue that skipping the entire
> > > transaction is better from the consistency point of view but I think
> > > it is already possible that we just skip a particular update/delete
> > > (if the corresponding tuple doesn't exist on the subscriber). For the
> > > sake of simplicity, we can just allow providing xid at this stage and
> > > then extend it later as required but I am not very sure of that point.
> >
> > +1
> >
> > Skipping a whole transaction by specifying xid would be a good start.
> >
>
> Okay, that sounds reasonable, so let's do that for now.

I'll submit the patch tomorrow.

While updating the patch, I realized that skipping a transaction that
is prepared on the publisher will be tricky a bit;

First of all, since skip-xid is in pg_subscription catalog, we need to
do a catalog update in a transaction and commit it to disable it. I
think we need to set origin-lsn and timestamp of the transaction being
skipped to the transaction that does the catalog update. That is,
during skipping the (not prepared) transaction, we skip all
data-modification changes coming from the publisher, do a catalog
update, and commit the transaction. If we do the catalog update in the
next transaction after skipping the whole transaction, skip_xid could
be left in case of a server crash between them. Also, we cannot set
origin-lsn and timestamp to an empty transaction.

In prepared transaction cases, I think that when handling a prepare
message, we need to commit the transaction to update the catalog,
instead of preparing it. And at the commit prepared and rollback
prepared time, we skip it since there is not the prepared transaction
on the subscriber. Currently, handling rollback prepared already
behaves so; it first checks whether we have prepared the transaction
or not and skip it if haven’t. So I think we need to do that also for
commit prepared case. With that, this requires protocol changes so
that the subscriber can get prepare-lsn and prepare-time when handling
commit prepared.

So I’m writing a separate patch to add prepare-lsn and timestamp to
commit_prepared message, which will be a building block for skipping
prepared transactions. Actually, I think it’s beneficial even today;
we can skip preparing the transaction if it’s an empty transaction.
Although the comment it’s not a common case, I think that it could
happen quite often in some cases:

* XXX, We can optimize such that at commit prepared time, we first check
* whether we have prepared the transaction or not but that doesn't seem
* worthwhile because such cases shouldn't be common.
*/

For example, if the publisher has multiple subscriptions and there are
many prepared transactions that modify the particular table subscribed
by one publisher, many empty transactions are replicated to other
subscribers.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2021-12-07 12:23:00 Re: Data is copied twice when specifying both child and parent table in publication
Previous Message Greg Nancarrow 2021-12-07 10:46:36 Re: Alter all tables in schema owner fix