Re: logical decoding and replication of sequences, take 2

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2024-01-27 19:37:03
Message-ID: 996e4d3c-648a-46d0-86d2-6465f53ddc58@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/26/24 15:39, Robert Haas wrote:
> On Wed, Jan 24, 2024 at 12:46 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>> I did try to explain how this works (and why) in a couple places:
>>
>> 1) the commit message
>> 2) reorderbuffer header comment
>> 3) ReorderBufferSequenceIsTransactional comment (and nearby)
>>
>> It's possible this does not meet your expectations, ofc. Maybe there
>> should be a separate README for this - I haven't found anything like
>> that for logical decoding in general, which is why I did (1)-(3).
>
> I read over these and I do think they answer a bunch of questions, but
> I don't think they answer all of the questions.
>
> Suppose T1 creates a sequence and commits. Then T2 calls nextval().
> Then T3 drops the sequence. According to the commit message, T2's
> change will be "replayed immediately after decoding". But it's
> essential to replay T2's change after we replay T1 and before we
> replay T3, and the comments don't explain why that's guaranteed.
>
> The answer might be "locks". If we always replay a transaction
> immediately when we see it's commit record then in the example above
> we're fine, because the commit record for the transaction that creates
> the sequence must precede the nextval() call, since the sequence won't
> be visible until the transaction commits, and also because T1 holds a
> lock on it at that point sufficient to hedge out nextval. And the
> nextval record must precede the point where T3 takes an exclusive lock
> on the sequence.
>

Right, locks + apply in commit order gives us this guarantee (I can't
think of a case where it wouldn't be the case).

> Note, however, that this change of reasoning critically depends on us
> never delaying application of a transaction. If we might reach T1's
> commit record and say "hey, let's hold on to this for a bit and replay
> it after we've decoded some more," everything immediately breaks,
> unless we also delay application of T2's non-transactional update in
> such a way that it's still guaranteed to happen after T1. I wonder if
> this kind of situation would be a problem for a future parallel-apply
> feature. It wouldn't work, for example, to hand T1 and T3 off (in that
> order) to a separate apply process but handle T2's "non-transactional"
> message directly, because it might handle that message before the
> application of T1 got completed.
>

Doesn't the whole logical replication critically depend on the commit
order? If you decide to arbitrarily reorder/delay the transactions, all
kinds of really bad things can happen. That's a generic problem, it
applies to all kinds of objects, not just sequences - a parallel apply
would need to detect this sort of dependencies (e.g. INSERT + DELETE of
the same key), and do something about it.

Similar for sequences, where the important event is allocation of a new
relfilenode.

If anything, it's easier for sequences, because the relfilenode tracking
gives us an explicit (and easy) way to detect these dependencies between
transactions.

> This also seems to depend on every transactional operation that might
> affect a future non-transactional operation holding a lock that would
> conflict with that non-transactional operation. For example, if ALTER
> SEQUENCE .. RESTART WITH didn't take a strong lock on the sequence,
> then you could have: T1 does nextval, T2 does ALTER SEQUENCE RESTART
> WITH, T1 does nextval again, T1 commits, T2 commits. It's unclear what
> the semantics of that would be -- would T1's second nextval() see the
> sequence restart, or what? But if the effect of T1's second nextval
> does depend in some way on the ALTER SEQUENCE operation which precedes
> it in the WAL stream, then we might have some trouble here, because
> both nextvals precede the commit of T2. Fortunately, this sequence of
> events is foreclosed by locking.
>

I don't quite follow :-(

AFAIK this theory hinges on not having the right lock, but I believe
ALTER SEQUENCE does obtain the lock (at least in cases that assign a new
relfilenode). Which means such reordering should not be possible,
because nextval() in other transactions will then wait until commit. And
all nextval() calls in the same transaction will be treated as
transactional.

So I think this works OK. If something does not lock the sequence in a
way that would prevent other xacts to do nextval() on it, it's not a
change that would change the relfilenode - and so it does not switch the
sequence into a transactional mode.

> But I did find one somewhat-similar case in which that's not so.
>
> S1: create table withseq (a bigint generated always as identity);
> S1: begin;
> S2: select nextval('withseq_a_seq');
> S1: alter table withseq set unlogged;
> S2: select nextval('withseq_a_seq');
>
> I think this is a bug in the code that supports owned sequences rather
> than a problem that this patch should have to do something about. When
> a sequence is flipped between logged and unlogged directly, we take a
> stronger lock than we do here when it's done in this indirect way.

Yes, I think this is a bug in handling of owned sequences - from the
moment the "ALTER TABLE ... SET UNLOGGED" is executed, the two sessions
generate duplicate values (until the S1 is committed, at which point the
values generated in S2 get "forgotten").

It seems we end up updating both relfilenodes, which is clearly wrong.

Seems like a bug independent of the decoding, IMO.

> Also, I'm not quite sure if it would pose a problem for sequence
> decoding anyway: it changes the relfilenode, but not the value. But
> this is the *kind* of problem that could make the approach unsafe:
> supposedly transactional changes being interleaved with supposedly
> non-transctional changes, in such a way that the non-transactional
> changes might get applied at the wrong time relative to the
> transactional changes.
>

I'm not sure what you mean by "changes relfilenode, not value" but I
suspect it might break the sequence decoding - or at least confuse it. I
haven't thecked what exactly happens when we change logged/unlogged for
a sequence, but I assume it does change the relfilenode, which already
is a change of a value - we WAL-log the new sequence state, at least.
But it should be treated as "transactional" in the transaction that did
the ALTER TABLE, because it created the relfilenode.

However, I'm not sure this is a valid argument against the sequence
decoding patch. If something does not acquire the correct lock, it's not
surprising something else breaks, if it relies on the lock.

Of course, I understand you're trying to make a broader point - that if
something like this could happen in "correct" case, it'd be a problem.

But I don't think that's possible. The whole "transactional" thing is
determined by having a new relfilenode for the sequence, and I can't
imagine a case where we could assign a new relfilenode without a lock.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2024-01-27 19:44:13 Re: proposal: psql: show current user in prompt
Previous Message Tom Lane 2024-01-27 19:04:22 PG versus libxml2 2.12.x