Re: Skipping logical replication transactions on subscriber side

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-12-13 13:24:38
Message-ID: CAD21AoBdCD=PDq+7buxE3NKC3yNUGuVqqXn9EeqeD6+ozUmQ9w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 13, 2021 at 1:04 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Dec 13, 2021 at 8:28 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sat, Dec 11, 2021 at 3:29 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > 3.
> > > + * Also, we don't skip receiving the changes in streaming cases, since we decide
> > > + * whether or not to skip applying the changes when starting to apply changes.
> > >
> > > But why so? Can't we even skip streaming (and writing to file all such
> > > messages)? If we can do this then we can avoid even collecting all
> > > messages in a file.
> >
> > IIUC in streaming cases, a transaction can be sent to the subscriber
> > split into multiple chunks of changes. In the meantime,
> > skip_xid can be changed. If the user changes or clears skip_xid after
> > the subscriber has skipped some streamed changes, the subscriber won't be able
> > to have the complete changes of the transaction.
> >
>
> Yeah, I think if we want, we can handle this by writing into the stream
> xid file whether the changes need to be skipped, and then the
> consecutive streams can check that in the file, or maybe in some way
> not allow skip_xid to be changed in the worker if it is already skipping
> some xact. If we don't want to do anything for this, then it is better
> to at least reflect this reasoning in the comments.

Yes. Given that we still need to apply messages other than
data-modification messages, we need to skip writing only the
data-modification changes to the stream file.
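A minimal sketch in C of the "record the skip decision with the stream" idea, assuming the decision is latched when the first chunk of a streamed transaction arrives and that only data-modification messages are dropped before being written to the stream file. StreamSkipState, stream_first_start(), should_write_change() and the StreamMsgKind values are hypothetical names for illustration, not the patch's or PostgreSQL's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;              /* 32-bit xid, as in PostgreSQL */
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical message kinds: only the first four modify data. */
typedef enum
{
	MSG_INSERT,
	MSG_UPDATE,
	MSG_DELETE,
	MSG_TRUNCATE,
	MSG_RELATION,   /* relation/schema info: must still be applied */
	MSG_TYPE
} StreamMsgKind;

/* Hypothetical per-streamed-transaction state, kept alongside the stream file. */
typedef struct
{
	TransactionId xid;
	bool		skip;       /* decided once, at the first STREAM START */
} StreamSkipState;

/* First STREAM START for xid: latch the skip decision from the current skip_xid. */
static void
stream_first_start(StreamSkipState *st, TransactionId xid, TransactionId skip_xid)
{
	assert(xid != InvalidTransactionId);
	st->xid = xid;
	st->skip = (skip_xid == xid);
}

static bool
is_data_modification(StreamMsgKind kind)
{
	return kind == MSG_INSERT || kind == MSG_UPDATE ||
		kind == MSG_DELETE || kind == MSG_TRUNCATE;
}

/*
 * Should a change from any chunk be written to the stream file?  The
 * current skip_xid is deliberately not consulted: the latched flag wins,
 * so changing skip_xid mid-stream cannot leave a partial transaction.
 */
static bool
should_write_change(const StreamSkipState *st, StreamMsgKind kind)
{
	if (st->skip && is_data_modification(kind))
		return false;
	return true;
}
```

With this, later chunks consult only the latched flag, so a user clearing or changing skip_xid between chunks does not affect a streamed transaction that is already being skipped, while non-data messages such as relation messages are still written and applied.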

>
> > >
> > > 4.
> > > + * Also, one might think that we can skip preparing the skipped transaction.
> > > + * But if we do that, the PREPARE WAL record won’t be sent to its physical
> > > + * standbys, so users won’t be able to find the prepared
> > > + * transaction entry after a failover.
> > > + *
> > > ..
> > > + */
> > > +	if (skipping_changes)
> > > +		stop_skipping_changes(false);
> > >
> > > Why do we need such a PREPARE entry either on the current subscriber or
> > > on its physical standby? I think it is to allow commit-prepared. If
> > > so, how about skipping commit-prepared as well? Even on the
> > > physical standby, we would have the value of skip_xid, which can
> > > help us skip there as well after failover.
> >
> > It's true that skip_xid would also be set on the physical standby. When it
> > comes to preparing the skipped transaction on the current subscriber,
> > if we want to skip commit-prepared, I think we need protocol changes in
> > order for subscribers to know prepare_lsn and prepare_timestamp so
> > that they can look up the prepared transaction when doing
> > commit-prepared. I proposed this idea before. This change would be
> > beneficial as of now since the publisher sends even empty transactions.
> > But considering the proposed patch[1] that makes the publisher not
> > send empty transactions, this protocol change would be an optimization
> > only for this feature.
> >
>
> I was thinking of comparing the xid received as part of the
> commit_prepared message with the value of skip_xid to skip the
> commit_prepared, but I guess the user could change it between prepare
> and commit prepared, and then we won't be able to detect it, right? I
> think we can handle this and the streaming case if we disallow users
> from changing the value of skip_xid when we are already skipping changes,
> or don't let the new skip_xid take effect in the apply worker if we are
> already skipping some other transaction. What do you think?

In streaming cases, we don’t know when stream-commit or stream-abort
will come, and another conflict could occur on the subscription in the
meantime. But given that (we expect) this feature is used after the
apply worker enters an error loop, this is unlikely to happen in
practice unless the user sets the wrong XID. Similarly, in 2PC cases,
we don’t know when commit-prepared or rollback-prepared will come, and
another conflict could occur in the meantime. But this could occur in
practice even if the user specified the correct XID. Therefore, if we
disallow changing skip_xid until the subscriber receives
commit-prepared or rollback-prepared, we cannot skip a second
transaction that conflicts with data on the subscriber.
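To make the 2PC trade-off concrete, here is a minimal sketch in C of the "don't let a new skip_xid take effect while already skipping" idea, assuming the skip is latched at BEGIN PREPARE and released only at COMMIT PREPARED or ROLLBACK PREPARED of the same xid. SkipLatch, user_set_skip_xid(), begin_xact() and finish_prepared() are hypothetical names for illustration, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical apply-worker state for the "latch skip_xid" idea. */
typedef struct
{
	TransactionId catalog_skip_xid; /* value most recently set by the user */
	TransactionId active_skip_xid;  /* xid currently being skipped, or invalid */
} SkipLatch;

/* The user changes skip_xid: it is only recorded here. */
static void
user_set_skip_xid(SkipLatch *s, TransactionId xid)
{
	s->catalog_skip_xid = xid;
}

/*
 * BEGIN (or BEGIN PREPARE) of xid arrives.  A new skip_xid is latched only
 * if no other skip is pending; otherwise it is ignored for now.
 */
static bool
begin_xact(SkipLatch *s, TransactionId xid)
{
	assert(xid != InvalidTransactionId);
	if (s->active_skip_xid == InvalidTransactionId &&
		s->catalog_skip_xid == xid)
		s->active_skip_xid = xid;
	return s->active_skip_xid == xid;
}

/*
 * COMMIT PREPARED / ROLLBACK PREPARED of xid: skip it only if it is the
 * latched xid (even if the user changed skip_xid meanwhile), then release
 * the latch so that a new skip_xid can take effect.
 */
static bool
finish_prepared(SkipLatch *s, TransactionId xid)
{
	bool		skip;

	assert(xid != InvalidTransactionId);
	skip = (s->active_skip_xid == xid);
	if (skip)
		s->active_skip_xid = InvalidTransactionId;
	return skip;
}
```

For example, if xid 100 is prepared while being skipped and xid 101 then conflicts, setting skip_xid to 101 has no effect (begin_xact(&s, 101) returns false) until commit-prepared or rollback-prepared for xid 100 arrives. That is exactly the case where the second conflicting transaction cannot be skipped.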

From the application perspective, which behavior is preferable in the
first place: skipping the PREPARE of the transaction, or preparing an
empty transaction? In terms of resource consumption and the like, skipping
the PREPARE seems better. On the other hand, if we skip preparing
the transaction, the application will not be able to find the
prepared transaction after a failover to the subscriber.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
