Re: Skipping logical replication transactions on subscriber side

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-12-08 05:15:48
Message-ID: CAA4eK1LasocmhFeBb0D4ixf_J=pDr1OYdyTnEvbhcYToqA=GMw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 7, 2021 at 5:06 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Dec 6, 2021 at 2:17 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> I'll submit the patch tomorrow.
>
> While updating the patch, I realized that skipping a transaction that
> is prepared on the publisher will be a bit tricky.
>
> First of all, since skip-xid is in pg_subscription catalog, we need to
> do a catalog update in a transaction and commit it to disable it. I
> think we need to set origin-lsn and timestamp of the transaction being
> skipped to the transaction that does the catalog update. That is,
> during skipping the (not prepared) transaction, we skip all
> data-modification changes coming from the publisher, do a catalog
> update, and commit the transaction. If we do the catalog update in the
> next transaction after skipping the whole transaction, skip_xid could
> be left in case of a server crash between them.
>

But if we haven't updated origin_lsn/timestamp before the crash, won't
the subscriber request the same transaction again from the publisher?
If so, it will again be able to skip it because skip_xid is still set.

> Also, we cannot set
> origin-lsn and timestamp to an empty transaction.
>

But won't we update the catalog for skip_xid in that case?

Do we see any advantage of updating the skip_xid in the same
transaction vs. doing it in a separate transaction? If not, then we
can probably choose either of those ways and add some comments to
indicate the possibility of doing it the other way.

> In prepared transaction cases, I think that when handling a prepare
> message, we need to commit the transaction to update the catalog,
> instead of preparing it. Then, at commit prepared or rollback
> prepared time, we skip it since there is no prepared transaction
> on the subscriber.
>

Can't we think of just allowing prepare in this case and updating the
skip_xid only at commit time? I see that in this case, we would be
doing prepare for a transaction that has no changes but as such cases
won't be common, isn't that acceptable?
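
As a rough sketch of that alternative (again a toy model, not the patch:
the class, method names, and the gid-keyed dictionary are all
hypothetical), the subscriber prepares an empty transaction for the
skipped xid and clears skip_xid only when commit prepared arrives:

```python
from typing import Dict, List, Optional


class TwoPhaseSubscriber:
    """Toy model: prepare the skipped transaction as empty, clear at commit."""

    def __init__(self, skip_xid: Optional[int]) -> None:
        self.skip_xid = skip_xid               # stand-in for the catalog field
        self.prepared: Dict[str, List[str]] = {}
        self.committed: List[str] = []

    def handle_prepare(self, xid: int, gid: str, changes: List[str]) -> None:
        # Drop the changes of the skipped transaction but still prepare it,
        # so that commit prepared finds a prepared transaction as usual.
        kept = [] if xid == self.skip_xid else list(changes)
        self.prepared[gid] = kept

    def handle_commit_prepared(self, xid: int, gid: str) -> None:
        self.committed.extend(self.prepared.pop(gid))
        if xid == self.skip_xid:
            # skip_xid is cleared only here, at commit prepared time.
            self.skip_xid = None


sub = TwoPhaseSubscriber(skip_xid=741)
sub.handle_prepare(741, "gid-1", ["INSERT conflicting row"])
assert sub.prepared == {"gid-1": []} and sub.skip_xid == 741
sub.handle_commit_prepared(741, "gid-1")
assert sub.committed == [] and sub.skip_xid is None
```

With this shape the commit prepared path needs no extra information
from the publisher, at the cost of preparing a transaction that has no
changes.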

> Currently, handling rollback prepared already
> behaves this way: it first checks whether we have prepared the
> transaction and skips it if we haven’t. So I think we need to do that
> also for the commit prepared case. This requires protocol changes so
> that the subscriber can get prepare-lsn and prepare-time when handling
> commit prepared.
>
> So I’m writing a separate patch to add prepare-lsn and timestamp to
> the commit_prepared message, which will be a building block for
> skipping prepared transactions. Actually, I think it’s beneficial even
> today; we can skip preparing the transaction if it’s an empty
> transaction. Although the comment says it’s not a common case, I think
> that it could happen quite often in some cases:
>
> * XXX, We can optimize such that at commit prepared time, we first check
> * whether we have prepared the transaction or not but that doesn't seem
> * worthwhile because such cases shouldn't be common.
> */
>
> For example, if the publisher has multiple subscriptions and there are
> many prepared transactions that modify the particular table subscribed
> by one publisher, many empty transactions are replicated to other
> subscribers.
>

This is not clear to me. Why would one have multiple
subscriptions for the same publication? I thought an empty prepared
transaction is possible when, say, a publisher doesn't publish any data
of a prepared transaction because the corresponding action is not
published or something like that. I don't deny that someday we may want
to optimize this case, but it might be better if we don't need to do it
along with this patch.

--
With Regards,
Amit Kapila.
