Re: logical decoding and replication of sequences, take 2

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2023-11-27 11:11:07
Message-ID: CAA4eK1+fEHLn2sYS+6eZV-S8boejmo1sAUYWL4G5KeFbvmy5kA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 27, 2023 at 4:17 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 11/27/23 11:13, Amit Kapila wrote:
> > On Mon, Nov 27, 2023 at 11:34 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >>
> >> On Mon, Nov 27, 2023 at 6:41 AM Tomas Vondra
> >> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >>>
> >>> While going over 0001, I realized there might be an optimization for
> >>> ReorderBufferSequenceIsTransactional. As coded in 0001, it always
> >>> searches through all top-level transactions, and if there's many of them
> >>> that might be expensive, even if very few of them have any relfilenodes
> >>> in the hash table. It's still linear search, and it needs to happen for
> >>> each sequence change.
> >>>
> >>> But can the relfilenode even be in some other top-level transaction? How
> >>> could it be - our transaction would not see it, and wouldn't be able to
> >>> generate the sequence change. So we should be able to simply check *our*
> >>> transaction (or if it's a subxact, the top-level transaction). Either
> >>> it's there (and it's transactional change), or not (and then it's
> >>> non-transactional change).
> >>>
> >>
> >> I also think the relfilenode should be part of either the current
> >> top-level xact or one of its subxact, so looking at all the top-level
> >> transactions for each change doesn't seem advisable.
> >>
> >>> The 0004 does this.
> >>>
> >>> This of course hinges on when exactly the transactions get created, and
> >>> assignments processed. For example if this would fire before the txn
> >>> gets assigned to the top-level one, this would break. I don't think this
> >>> can happen thanks to the immediate logging of assignments, but I'm too
> >>> tired to think about it now.
> >>>
> >>
> >> This needs some thought because I think we can't guarantee the
> >> association till we reach the point where we can actually decode the
> >> xact. See comments in AssertTXNLsnOrder() [1].
> >>
>
> I suppose you mean the comment before the SnapBuildXactNeedsSkip call,
> which says:
>
> /*
> * Skip the verification if we don't reach the LSN at which we start
> * decoding the contents of transactions yet because until we reach
> * the LSN, we could have transactions that don't have the association
> * between the top-level transaction and subtransaction yet and
> * consequently have the same LSN. We don't guarantee this
> * association until we try to decode the actual contents of
> * transaction. The ordering of the records prior to the
> * start_decoding_at LSN should have been checked before the restart.
> */
>
> But doesn't this say that after we actually start decoding / stop
> skipping, we should have seen the assignment? We're already decoding
> transaction contents (because sequence change *is* part of xact, even if
> we decide to replay it in the non-transactional way).
>

It means to say that the assignment is decided after start_decoding_at
point. We haven't decided that we are past start_decoding_at by the
time the patch is computing the transactional flag.

> >
> > I am wondering that instead of building the infrastructure to know
> > whether a particular change is transactional on the decoding side,
> > can't we have some flag in the WAL record to note whether the change
> > is transactional or not? I have discussed this point with my colleague
> > Kuroda-San and we thought that it may be worth exploring whether we
> > can use rd_createSubid/rd_newRelfilelocatorSubid in RelationData to
> > determine if the sequence is created/changed in the current
> > subtransaction and then record that in WAL record. By this, we need to
> > have additional information in the WAL record like XLOG_SEQ_LOG but we
> > can probably do it only with wal_level as logical.
> >
>
> I may not understand the proposal exactly, but it's not enough to know
> if it was created in the same subxact. It might have been created in
> some earlier subxact in the same top-level xact.
>

We should be able to detect even some earlier subxact or top-level
xact based on rd_createSubid/rd_newRelfilelocatorSubid.

> FWIW I think one of the earlier patch versions did something like this,
> by adding a "created" flag in the xlog record. And we concluded doing
> this on the decoding side is a better solution.
>

oh, I thought it would be much simpler than what we are doing on the
decoding-side. Can you please point me to the email discussion where
this is concluded or share the reason?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2023-11-27 11:20:03 Re: Simplify xlogreader.c with XLogRec* macros
Previous Message Bharath Rupireddy 2023-11-27 10:59:18 Re: Do away with a few backwards compatibility macros