Re: logical decoding and replication of sequences, take 2

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2023-01-11 18:29:49
Message-ID: 5d39d133-91d7-e39a-3746-e9ad6a55b580@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/10/23 20:52, Robert Haas wrote:
> On Tue, Jan 10, 2023 at 1:32 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>> 0001 is a fix for the pre-existing issue in logicalmsg_decode,
>> attempting to build a snapshot before getting into a consistent state.
>> AFAICS this only affects assert-enabled builds and is otherwise
>> harmless, because we are not actually using the snapshot (apply gets a
>> valid snapshot from the transaction).
>>
>> This is mostly the fix I shared in November, except that I kept the call
>> in decode.c (per comment from Andres). I haven't added any argument to
>> SnapBuildProcessChange because we may need to backpatch this (and it
>> didn't seem much simpler, IMHO).
>
> I tend to associate transactional behavior with snapshots, so it looks
> odd to see code that builds a snapshot only when the message is
> non-transactional. I think that a more detailed comment spelling out
> the reasoning would be useful here.
>

I'll try adding a comment explaining this, but the reasoning is fairly
simple AFAICS:

1) We don't actually need to build the snapshot for transactional
changes, because if we end up applying the change, we'll use snapshot
provided/maintained by reorderbuffer.

2) But we don't know if we end up applying the change - it may happen
this is one of the transactions we're waiting to finish / skipped, in
which case the snapshot is kinda bogus anyway. What "saved" us is that
we'll not actually use the snapshot in the end. It's just the assert
that causes issues.

3) For non-transactional changes, we need a snapshot because we're going
to execute the callback right away. But in this case the code actually
protects against building inconsistent snapshots.

>> This however brings me to the original question what's the purpose of
>> this patch - and that's essentially keeping sequences up to date to make
>> them usable after a failover. We can't generate values from the sequence
>> on the subscriber, because it'd just get overwritten. And from this
>> point of view, it's also fine that the sequence is slightly ahead,
>> because that's what happens after crash recovery anyway. And we're not
>> guaranteeing the sequences to be gap-less.
>
> I agree that it's fine for the sequence to be slightly ahead, but I
> think that it can't be too far ahead without causing problems. Suppose
> for example that transaction #1 creates a sequence. Transaction #2
> does nextval on the sequence a bunch of times and inserts rows into a
> table using the sequence values as the PK. It's fine if the nextval
> operations are replicated ahead of the commit of transaction #2 -- in
> fact I'd say it's necessary for correctness -- but they can't precede
> the commit of transaction #1, since then the sequence won't exist yet.

It's not clear to me how could that even happen. If transaction #1
creates a sequence, it's invisible for transaction #2. So how could it
do nextval() on it? #2 has to wait for #1 to commit before it can do
anything on the sequence, which enforces the correct ordering, no?

> Likewise, if there's an ALTER SEQUENCE that creates a new relfilenode,
> I think that needs to act as a barrier: non-transactional changes that
> happened before that transaction must also be replicated before that
> transaction is replicated, and those that happened after that
> transaction is replicated must be replayed after that transaction is
> replicated. Otherwise, at the very least, there will be states visible
> on the standby that were never visible on the origin server, and maybe
> we'll just straight up get the wrong answer. For instance:
>
> 1. nextval, setting last_value to 3
> 2. ALTER SEQUENCE, getting a new relfilenode, and also set last_value to 19
> 3. nextval, setting last_value to 20
>
> If 3 happens before 2, the sequence ends up in the wrong state.
>
> Maybe you've already got this and similar cases totally correctly
> handled, I'm not sure, just throwing it out there.
>

I believe this should behave correctly too, thanks to locking.

If a transaction does ALTER SEQUENCE, that locks the sequence, so only
that transaction can do stuff with that sequence (and changes from that
point are treated as transactional). And everyone else is waiting for #1
to commit.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2023-01-11 18:31:31 Re: Remove source code display from \df+?
Previous Message Jacob Champion 2023-01-11 18:27:29 Re: Can we let extensions change their dumped catalog schemas?