Re: logical decoding and replication of sequences, take 2

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2023-02-22 10:02:12
Message-ID: cce24592-8f20-dc91-3aec-f6257d4091ae@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 2/22/23 03:28, Jonathan S. Katz wrote:
> Hi,
>
> On 2/16/23 10:50 AM, Tomas Vondra wrote:
>> Hi,
>>
>> Here's a rebased patch, without the last bit which is now unnecessary
>> thanks to c981d9145dea.
>
> Thanks for continuing to work on this patch! I tested the latest version
> and have some feedback/clarifications.
>

Thanks!

> I did some testing using a demo-app-based-on-a-real-world app I had
> conjured up[1]. This uses integer sequences as surrogate keys.
>
> In general things seemed to work, but I had a couple of
> observations/questions.
>
> 1. Sequence IDs after a "failover". I believe this is a design decision,
> but I noticed that after simulating a failover, the IDs were replicating
> from a higher value, e.g.
>
> INSERT INTO room (name) VALUES ('room 1');
> INSERT INTO room (name) VALUES ('room 2');
> INSERT INTO room (name) VALUES ('room 3');
> INSERT INTO room (name) VALUES ('room 4');
>
> The values of room_id_seq on each instance:
>
> instance 1:
>
>  last_value | log_cnt | is_called
> ------------+---------+-----------
>           4 |      29 | t
>
>  instance 2:
>
>   last_value | log_cnt | is_called
> ------------+---------+-----------
>          33 |       0 | t
>
> After the switchover on instance 2:
>
> INSERT INTO room (name) VALUES ('room 5') RETURNING id;
>
>  id
> ----
>  34
>
> I don't see this as an issue for most applications, but we should at
> least document the behavior somewhere.
>

Yes, this is due to how we WAL-log sequences. We don't log individual
increments, but every 32nd increment and we log the "future" sequence
state so that after a crash/recovery we don't generate duplicates.

So you do nextval() and it returns 1. But into WAL we record 32. And
there will be no WAL records until nextval reaches 32 and needs to
generate another batch.

And because logical replication relies on these WAL records, it inherits
this batching behavior with a "jump" on recovery/failover. IMHO it's OK,
it works for the "logical failover" use case and if you need gapless
sequences then regular sequences are not an issue anyway.

It's possible to reduce the jump a bit by reducing the batch size (from
32 to 0) so that every increment is logged. But it doesn't eliminate it
because of rollbacks.

> 2. Using with origin=none with nonconflicting sequences.
>
> I modified the example in [1] to set up two schemas with non-conflicting
> sequences[2], e.g. on instance 1:
>
> CREATE TABLE public.room (
>     id int GENERATED BY DEFAULT AS IDENTITY (INCREMENT 2 START WITH 1)
> PRIMARY KEY,
>     name text NOT NULL
> );
>
> and instance 2:
>
> CREATE TABLE public.room (
>     id int GENERATED BY DEFAULT AS IDENTITY (INCREMENT 2 START WITH 2)
> PRIMARY KEY,
>     name text NOT NULL
> );
>

Well, yeah. We don't support active-active logical replication (at least
not with the built-in). You can easily get into similar issues without
sequences.

Replicating a sequence overwrites the state of the sequence on the other
side, which may result in it generating duplicate values with the other
node, etc.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nazir Bilal Yavuz 2023-02-22 10:13:03 Re: Refactor calculations to use instr_time
Previous Message John Naylor 2023-02-22 09:55:30 Re: [PoC] Improve dead tuple storage for lazy vacuum