Re: logical decoding and replication of sequences, take 2

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2023-01-10 19:52:20
Message-ID: CA+TgmoaVLiKDD5vr1bzL-rxhMA37KCS_2xrqjbKVwGyqK+PCXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 10, 2023 at 1:32 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> 0001 is a fix for the pre-existing issue in logicalmsg_decode,
> attempting to build a snapshot before getting into a consistent state.
> AFAICS this only affects assert-enabled builds and is otherwise
> harmless, because we are not actually using the snapshot (apply gets a
> valid snapshot from the transaction).
>
> This is mostly the fix I shared in November, except that I kept the call
> in decode.c (per comment from Andres). I haven't added any argument to
> SnapBuildProcessChange because we may need to backpatch this (and it
> didn't seem much simpler, IMHO).

I tend to associate transactional behavior with snapshots, so it looks
odd to see code that builds a snapshot only when the message is
non-transactional. I think that a more detailed comment spelling out
the reasoning would be useful here.

> This however brings me to the original question what's the purpose of
> this patch - and that's essentially keeping sequences up to date to make
> them usable after a failover. We can't generate values from the sequence
> on the subscriber, because it'd just get overwritten. And from this
> point of view, it's also fine that the sequence is slightly ahead,
> because that's what happens after crash recovery anyway. And we're not
> guaranteeing the sequences to be gap-less.

I agree that it's fine for the sequence to be slightly ahead, but I
think that it can't be too far ahead without causing problems. Suppose
for example that transaction #1 creates a sequence. Transaction #2
does nextval on the sequence a bunch of times and inserts rows into a
table using the sequence values as the PK. It's fine if the nextval
operations are replicated ahead of the commit of transaction #2 -- in
fact I'd say it's necessary for correctness -- but they can't precede
the commit of transaction #1, since then the sequence won't exist yet.
Likewise, if there's an ALTER SEQUENCE that creates a new relfilenode,
I think that needs to act as a barrier: non-transactional changes that
happened before that transaction must also be replicated before that
transaction is replicated, and those that happened after that
transaction is replicated must be replayed after that transaction is
replicated. Otherwise, at the very least, there will be states visible
on the standby that were never visible on the origin server, and maybe
we'll just straight up get the wrong answer. For instance:

1. nextval, setting last_value to 3
2. ALTER SEQUENCE, getting a new relfilenode, and also set last_value to 19
3. nextval, setting last_value to 20

If 3 happens before 2, the sequence ends up in the wrong state.

Maybe you've already got this and similar cases totally correctly
handled, I'm not sure, just throwing it out there.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2023-01-10 20:08:27 Re: Fixing a couple of buglets in how VACUUM sets visibility map bits
Previous Message Peter Geoghegan 2023-01-10 19:47:41 Re: Fixing a couple of buglets in how VACUUM sets visibility map bits