Re: logical decoding and replication of sequences, take 2

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2023-11-28 14:41:55
Message-ID: 195be236-7580-72c6-97c7-59f06c4e093c@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/28/23 12:32, Amit Kapila wrote:
> On Mon, Nov 27, 2023 at 11:45 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> I spent a bit of time looking at the proposed change, and unfortunately
>> logging just the boolean flag does not work. A good example is this bit
>> from a TAP test added by the patch for built-in replication (which was
>> not included with the WIP patch):
>>
>> BEGIN;
>> ALTER SEQUENCE s RESTART WITH 1000;
>> SAVEPOINT sp1;
>> INSERT INTO seq_test SELECT nextval('s') FROM generate_series(1,100);
>> ROLLBACK TO sp1;
>> COMMIT;
>>
>> This is expected to produce:
>>
>> 1131|0|t
>>
>> but produces
>>
>> 1000|0|f
>>
>> instead. The reason is very simple - as implemented, the patch simply
>> checks if the relfilenode is from the same top-level transaction, which
>> it is, and sets the flag to "true". So we know the sequence changes need
>> to be queued and replayed as part of this transaction.
>>
>> But then during decoding, we still queue the changes into the subxact,
>> which then aborts, and the changes are discarded. That is not how it's
>> supposed to work, because the new relfilenode is still valid, someone
>> might do nextval() and commit. And the nextval() may not get WAL-logged,
>> so we'd lose this.
>>
>> What I guess we might do is log not just a boolean flag, but the XID of
>> the subtransaction that created the relfilenode. And then during
>> decoding we'd queue the changes into this subtransaction ...
>>
>> 0006 in the attached patch series does this, and it seems to fix the TAP
>> test failure. I left it at the end, to make it easier to run tests
>> without the patch applied.
>>
>
> Offhand, I don't have any better idea than what you have suggested for
> the problem but this needs some thoughts including the questions asked
> by you. I'll spend some time on it and respond back.
>

I've been experimenting with the idea to log the XID, and for a moment I
was worried it actually can't work, because subtransactions may not
actually be just nested in simple way, but form a tree. And what if the
sequence was altered in a different branch (sibling subxact), not in the
immediate parent. In which case the new SubTransactionGetXid() would
fail, because it just walks the current chain of subtransactions.

I've been thinking about cases like this:

BEGIN;
CREATE SEQUENCE s; # XID 1000
SELECT alter_sequence(); # XID 1001
SAVEPOINT s1;
SELECT COUNT(nextval('s')) FROM generate_series(1,100); # XID 1000
ROLLBACK TO s1;
SELECT COUNT(nextval('s')) FROM generate_series(1,100); # XID 1000
COMMIT;

The XID values are what the sequence wal record will reference, assuming
that the main transaction XID is 1000.

Initially, I thought it's wrong that the nextval() calls reference XID
of the main transaction, because the last relfilenode comes from 1001,
which is the subxact created by alter_sequence() thanks to the exception
handling block. And that's where the approach in reorderbuffer would
queue the changes.

But I think this is actually correct too. When a subtransaction commits
(e.g. when alter_sequence() completes), it essentially becomes part of
the parent. And AtEOSubXact_cleanup() updates rd_newRelfilelocatorSubid
accordingly, setting it to parentSubid.

This also means that SubTransactionGetXid() can't actually fail, because
the ID has to reference an active subtransaction in the current stack.
I'm still concerned about the cost of the lookup, because the list may
be long and the subxact we're looking for may be quite high, but I guess
we might have another field, caching the XID. It'd need to be updated
only in AtEOSubXact_cleanup, and at that point we know it's the
immediate parent, so it'd be pretty cheap I think.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2023-11-28 14:55:31 Re: [HACKERS] Changing references of password encryption to hashing
Previous Message Stephen Frost 2023-11-28 14:24:31 Re: Partial aggregates pushdown