Re: Skipping logical replication transactions on subscriber side

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2022-01-12 00:19:07
Message-ID: CAD21AoDHMLiktd=x9eeEN3-kXpP8Dbz_CHOz1PYy8Mmxw8edZQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 11, 2022 at 7:08 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jan 11, 2022 at 1:51 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Jan 11, 2022 at 3:12 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Jan 11, 2022 at 8:52 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, Jan 10, 2022 at 8:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > I was thinking: what if we don't advance the origin explicitly in
> > > > > this case? Actually, that will be no different from the transactions
> > > > > where the apply worker doesn't apply any change because the initial
> > > > > sync is in progress (see should_apply_changes_for_rel()) or we have
> > > > > received an empty transaction. In those cases too, the origin LSN
> > > > > won't be advanced even though we acknowledge the advanced
> > > > > last_received location because of keep_alive messages. Now, it is
> > > > > possible that after the restart we send the old start_lsn location,
> > > > > because the replication origin was not updated before the restart,
> > > > > but we handle that case in the server by starting from the last
> > > > > confirmed location. See the code below:
> > > > >
> > > > > CreateDecodingContext()
> > > > > {
> > > > > ..
> > > > > else if (start_lsn < slot->data.confirmed_flush)
> > > > > ..
> > > >
> > > > Good point. One minor difference from the case where the apply
> > > > worker applies an empty transaction is when the server
> > > > restarts/crashes before sending an acknowledgment of the flush
> > > > location. In the case of the empty transaction, the publisher sends
> > > > the empty transaction again. On the other hand, in the case of
> > > > skipping the transaction, a non-empty transaction will be sent again
> > > > but skip_xid has already been changed or cleared, so the user will
> > > > have to specify skip_xid again. Writing a replication origin WAL
> > > > record to advance the origin LSN would reduce the chance of that,
> > > > but I think it’s a very minor case, so we won’t need to deal with it.
> > > >
> > >
> > > Yeah, in the worst case, it will lead to conflict again and the user
> > > needs to set the xid again.
> >
> > On second thought, the same is true for other cases, for example,
> > preparing the transaction and clearing skip_xid while handling a
> > prepare message. Currently, we don't clear skip_xid while handling a
> > prepare message but do so while handling a commit/rollback prepared
> > message, in order to avoid the worst case. If we did both while
> > handling a prepare message and the server crashed between them, we
> > would end up with skip_xid cleared and the transaction resent, which
> > is identical to the worst case above.
> >
>
> How are you planning to update skip_xid before the prepare? If we do
> it in the same transaction, then the catalog changes will be part of
> the prepared xact but won't be committed. Now, if we do it after the
> prepare, the situation won't be the same because after a restart the
> same xact won't appear again.

I was thinking of committing the catalog change first in a separate
transaction without updating the origin LSN, and then preparing an
empty transaction that does update the origin LSN. If the server
crashes between them, skip_xid is cleared but the transaction will be
resent.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
