Re: logical decoding and replication of sequences, take 2

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2023-11-27 13:41:40
Message-ID: 8d0d4b34-3e12-2637-e8b8-f4c3b2b0a1f3@enterprisedb.com
Lists: pgsql-hackers

On 11/27/23 12:11, Amit Kapila wrote:
> On Mon, Nov 27, 2023 at 4:17 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> On 11/27/23 11:13, Amit Kapila wrote:
>>> On Mon, Nov 27, 2023 at 11:34 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>>>
>>>> On Mon, Nov 27, 2023 at 6:41 AM Tomas Vondra
>>>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>>>
>>>>> While going over 0001, I realized there might be an optimization for
>>>>> ReorderBufferSequenceIsTransactional. As coded in 0001, it always
>>>>> searches through all top-level transactions, and if there's many of them
>>>>> that might be expensive, even if very few of them have any relfilenodes
>>>>> in the hash table. It's still linear search, and it needs to happen for
>>>>> each sequence change.
>>>>>
>>>>> But can the relfilenode even be in some other top-level transaction? How
>>>>> could it be - our transaction would not see it, and wouldn't be able to
>>>>> generate the sequence change. So we should be able to simply check *our*
>>>>> transaction (or if it's a subxact, the top-level transaction). Either
>>>>> it's there (and it's transactional change), or not (and then it's
>>>>> non-transactional change).
>>>>>
>>>>
>>>> I also think the relfilenode should be part of either the current
>>>> top-level xact or one of its subxact, so looking at all the top-level
>>>> transactions for each change doesn't seem advisable.
>>>>
>>>>> The 0004 does this.
>>>>>
>>>>> This of course hinges on when exactly the transactions get created, and
>>>>> assignments processed. For example if this would fire before the txn
>>>>> gets assigned to the top-level one, this would break. I don't think this
>>>>> can happen thanks to the immediate logging of assignments, but I'm too
>>>>> tired to think about it now.
>>>>>
>>>>
>>>> This needs some thought because I think we can't guarantee the
>>>> association till we reach the point where we can actually decode the
>>>> xact. See comments in AssertTXNLsnOrder() [1].
>>>>
>>
>> I suppose you mean the comment before the SnapBuildXactNeedsSkip call,
>> which says:
>>
>> /*
>> * Skip the verification if we don't reach the LSN at which we start
>> * decoding the contents of transactions yet because until we reach
>> * the LSN, we could have transactions that don't have the association
>> * between the top-level transaction and subtransaction yet and
>> * consequently have the same LSN. We don't guarantee this
>> * association until we try to decode the actual contents of
>> * transaction. The ordering of the records prior to the
>> * start_decoding_at LSN should have been checked before the restart.
>> */
>>
>> But doesn't this say that after we actually start decoding / stop
>> skipping, we should have seen the assignment? We're already decoding
>> transaction contents (because sequence change *is* part of xact, even if
>> we decide to replay it in the non-transactional way).
>>
>
> It means to say that the assignment is decided after start_decoding_at
> point. We haven't decided that we are past start_decoding_at by the
> time the patch is computing the transactional flag.
>

Ah, I see. We're deciding if the change is transactional before calling
SnapBuildXactNeedsSkip. That's a bit unfortunate.
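
To make the ordering problem concrete, here is a toy model in plain C. It is
not reorderbuffer code: ToyTxn, toy_add_relfilenode and
toy_sequence_is_transactional are made-up names, and the fixed-size array
merely stands in for the relfilenode hash table. The lookup checks only our
own transaction (or its top-level one), and it gives the wrong answer for a
subxact whose assignment has not been processed yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RELFILENODES 8

/* Hypothetical stand-in for a transaction entry in the reorder buffer. */
typedef struct ToyTxn
{
    uint32_t        xid;
    /* NULL for a top-level xact, or when the assignment is not known yet */
    struct ToyTxn  *toplevel;
    /* relfilenodes created in this (top-level) transaction */
    uint32_t        created[TOY_MAX_RELFILENODES];
    int             ncreated;
} ToyTxn;

/* Remember that "top" created a new relfilenode for a sequence. */
void
toy_add_relfilenode(ToyTxn *top, uint32_t relfilenode)
{
    assert(top->ncreated < TOY_MAX_RELFILENODES);
    top->created[top->ncreated++] = relfilenode;
}

/*
 * Check only our own transaction, resolved to its top-level transaction
 * when the assignment is known, instead of scanning every top-level xact.
 */
bool
toy_sequence_is_transactional(const ToyTxn *txn, uint32_t relfilenode)
{
    const ToyTxn *top = (txn->toplevel != NULL) ? txn->toplevel : txn;

    for (int i = 0; i < top->ncreated; i++)
    {
        if (top->created[i] == relfilenode)
            return true;
    }
    return false;
}
```

With a subxact that has no toplevel pointer yet, the lookup lands on the
subxact's own (empty) list and reports the change as non-transactional, which
is the failure mode discussed above.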

>>>
>>> I am wondering that instead of building the infrastructure to know
>>> whether a particular change is transactional on the decoding side,
>>> can't we have some flag in the WAL record to note whether the change
>>> is transactional or not? I have discussed this point with my colleague
>>> Kuroda-San and we thought that it may be worth exploring whether we
>>> can use rd_createSubid/rd_newRelfilelocatorSubid in RelationData to
>>> determine if the sequence is created/changed in the current
>>> subtransaction and then record that in WAL record. By this, we need to
>>> have additional information in the WAL record like XLOG_SEQ_LOG but we
>>> can probably do it only with wal_level as logical.
>>>
>>
>> I may not understand the proposal exactly, but it's not enough to know
>> if it was created in the same subxact. It might have been created in
>> some earlier subxact in the same top-level xact.
>>
>
> We should be able to detect even some earlier subxact or top-level
> xact based on rd_createSubid/rd_newRelfilelocatorSubid.
>

Interesting. I admit I haven't considered using these fields before, so
I need to familiarize myself with them a bit and see whether that would work.
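
If I read the proposal right, the check on the WAL-writing side might look
roughly like the sketch below. It is a standalone model, not a patch:
ToyRelation only mimics the two RelationData fields, the function name is
made up, and I'm assuming that zero (InvalidSubTransactionId) means "not
created / relfilenode not changed in the current transaction". Whether the
fields are reliably set in every case we care about is what I still need to
verify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToySubTransactionId;

/* Assumption: mirrors InvalidSubTransactionId, i.e. "not in this xact". */
#define TOY_INVALID_SUBXID ((ToySubTransactionId) 0)

/* Hypothetical stand-in carrying just the two fields under discussion. */
typedef struct ToyRelation
{
    ToySubTransactionId rd_createSubid;
    ToySubTransactionId rd_newRelfilelocatorSubid;
} ToyRelation;

/*
 * Treat the sequence change as transactional if the relation was created,
 * or got a new relfilenode, in the current top-level transaction. The
 * subxact recorded may be an earlier one than the current subxact.
 */
bool
toy_seq_change_is_transactional(const ToyRelation *rel)
{
    assert(rel != NULL);

    return rel->rd_createSubid != TOY_INVALID_SUBXID ||
           rel->rd_newRelfilelocatorSubid != TOY_INVALID_SUBXID;
}
```

The result would then be stored as a flag in the sequence WAL record, so that
decoding does not have to track relfilenodes at all.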

>> FWIW I think one of the earlier patch versions did something like this,
>> by adding a "created" flag in the xlog record. And we concluded doing
>> this on the decoding side is a better solution.
>>
>
> oh, I thought it would be much simpler than what we are doing on the
> decoding-side. Can you please point me to the email discussion where
> this is concluded or share the reason?
>

I think the discussion started around [1] and continued in a bunch of the
following messages (search for "relfilenode").

regards

[1]
https://www.postgresql.org/message-id/CAExHW5v_vVqkhF4ehST9EzpX1L3bemD1S%2BkTk_-ZVu_ir-nKDw%40mail.gmail.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
