Re: logical decoding and replication of sequences, take 2

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2024-01-26 14:39:16
Message-ID: CA+TgmoaPTZaFTrDmvmH69Wq40RXor4VxCL99s1irfdKkMZsYjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 24, 2024 at 12:46 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> I did try to explain how this works (and why) in a couple places:
>
> 1) the commit message
> 2) reorderbuffer header comment
> 3) ReorderBufferSequenceIsTransactional comment (and nearby)
>
> It's possible this does not meet your expectations, ofc. Maybe there
> should be a separate README for this - I haven't found anything like
> that for logical decoding in general, which is why I did (1)-(3).

I read over these and I do think they answer a bunch of questions, but
I don't think they answer all of the questions.

Suppose T1 creates a sequence and commits. Then T2 calls nextval().
Then T3 drops the sequence. According to the commit message, T2's
change will be "replayed immediately after decoding". But it's
essential to replay T2's change after we replay T1 and before we
replay T3, and the comments don't explain why that's guaranteed.

The answer might be "locks". If we always replay a transaction
immediately when we see it's commit record then in the example above
we're fine, because the commit record for the transaction that creates
the sequence must precede the nextval() call, since the sequence won't
be visible until the transaction commits, and also because T1 holds a
lock on it at that point sufficient to hedge out nextval. And the
nextval record must precede the point where T3 takes an exclusive lock
on the sequence.

Note, however, that this change of reasoning critically depends on us
never delaying application of a transaction. If we might reach T1's
commit record and say "hey, let's hold on to this for a bit and replay
it after we've decoded some more," everything immediately breaks,
unless we also delay application of T2's non-transactional update in
such a way that it's still guaranteed to happen after T1. I wonder if
this kind of situation would be a problem for a future parallel-apply
feature. It wouldn't work, for example, to hand T1 and T3 off (in that
order) to a separate apply process but handle T2's "non-transactional"
message directly, because it might handle that message before the
application of T1 got completed.

This also seems to depend on every transactional operation that might
affect a future non-transactional operation holding a lock that would
conflict with that non-transactional operation. For example, if ALTER
SEQUENCE .. RESTART WITH didn't take a strong lock on the sequence,
then you could have: T1 does nextval, T2 does ALTER SEQUENCE RESTART
WITH, T1 does nextval again, T1 commits, T2 commits. It's unclear what
the semantics of that would be -- would T1's second nextval() see the
sequence restart, or what? But if the effect of T1's second nextval
does depend in some way on the ALTER SEQUENCE operation which precedes
it in the WAL stream, then we might have some trouble here, because
both nextvals precede the commit of T2. Fortunately, this sequence of
events is foreclosed by locking.

But I did find one somewhat-similar case in which that's not so.

S1: create table withseq (a bigint generated always as identity);
S1: begin;
S2: select nextval('withseq_a_seq');
S1: alter table withseq set unlogged;
S2: select nextval('withseq_a_seq');

I think this is a bug in the code that supports owned sequences rather
than a problem that this patch should have to do something about. When
a sequence is flipped between logged and unlogged directly, we take a
stronger lock than we do here when it's done in this indirect way.
Also, I'm not quite sure if it would pose a problem for sequence
decoding anyway: it changes the relfilenode, but not the value. But
this is the *kind* of problem that could make the approach unsafe:
supposedly transactional changes being interleaved with supposedly
non-transctional changes, in such a way that the non-transactional
changes might get applied at the wrong time relative to the
transactional changes.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maiquel Grassi 2024-01-26 14:41:46 RE: Current Connection Information
Previous Message vignesh C 2024-01-26 14:37:42 Re: POC: GROUP BY optimization