Re: logical decoding and replication of sequences, take 2

From: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2023-02-22 02:28:55
Message-ID: eb893ca5-97a9-806d-6298-e7d2ddceeede@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2/16/23 10:50 AM, Tomas Vondra wrote:
> Hi,
>
> Here's a rebased patch, without the last bit which is now unnecessary
> thanks to c981d9145dea.

Thanks for continuing to work on this patch! I tested the latest version
and have some feedback/clarifications.

I did some testing using a demo-app-based-on-a-real-world app I had
conjured up[1]. This uses integer sequences as surrogate keys.

In general things seemed to work, but I had a couple of
observations/questions.

1. Sequence IDs after a "failover". I believe this is a design decision,
but I noticed that after simulating a failover, the IDs were replicating
from a higher value, e.g.

INSERT INTO room (name) VALUES ('room 1');
INSERT INTO room (name) VALUES ('room 2');
INSERT INTO room (name) VALUES ('room 3');
INSERT INTO room (name) VALUES ('room 4');

The values of room_id_seq on each instance:

instance 1:

last_value | log_cnt | is_called
------------+---------+-----------
4 | 29 | t

instance 2:

last_value | log_cnt | is_called
------------+---------+-----------
33 | 0 | t

After the switchover on instance 2:

INSERT INTO room (name) VALUES ('room 5') RETURNING id;

id
----
34

I don't see this as an issue for most applications, but we should at
least document the behavior somewhere.

2. Using with origin=none with nonconflicting sequences.

I modified the example in [1] to set up two schemas with non-conflicting
sequences[2], e.g. on instance 1:

CREATE TABLE public.room (
id int GENERATED BY DEFAULT AS IDENTITY (INCREMENT 2 START WITH 1)
PRIMARY KEY,
name text NOT NULL
);

and instance 2:

CREATE TABLE public.room (
id int GENERATED BY DEFAULT AS IDENTITY (INCREMENT 2 START WITH 2)
PRIMARY KEY,
name text NOT NULL
);

I ran the following on instance 1:

INSERT INTO public.room ('name') VALUES ('room 1-e');

This committed and successfully replicated.

However, when I ran the following on instance 2, I received a conlifct
error:

INSERT INTO public.room ('name') VALUES ('room 1-w');

The conflict came further down the trigger change, i.e. to a change in
the `public.calendar` table:

2023-02-22 01:49:12.293 UTC [87235] ERROR: duplicate key value violates
unique constraint "calendar_pkey"
2023-02-22 01:49:12.293 UTC [87235] DETAIL: Key (id)=(661) already exists.

After futzing with the logging and restarting, I was also able to
reproduce a similar conflict with the same insert pattern into 'room'.

I did notice that the sequence values kept bouncing around between the
servers. Without any activity, this is what "SELECT * FROM room_id_seq"
would return with queries run ~4s apart:

last_value | log_cnt | is_called
------------+---------+-----------
131 | 0 | t

last_value | log_cnt | is_called
------------+---------+-----------
65 | 0 | t

The values were more varying on "calendar". Again, this is under no
additional write activity, these numbers kept fluctuating:

last_value | log_cnt | is_called
------------+---------+-----------
197 | 0 | t

last_value | log_cnt | is_called
------------+---------+-----------
461 | 0 | t

last_value | log_cnt | is_called
------------+---------+-----------
263 | 0 | t

last_value | log_cnt | is_called
------------+---------+-----------
527 | 0 | t

To handle this case for now, I adapted the schema to create sequences
that we clearly independently named[3]. I did learn that I had to create
sequences on both instances to support this behavior, e.g.:

-- instance 1
CREATE SEQUENCE public.room_id_1_seq AS int INCREMENT BY 2 START WITH 1;
CREATE SEQUENCE public.room_id_2_seq AS int INCREMENT BY 2 START WITH 2;
CREATE TABLE public.room (
id int DEFAULT nextval('room_id_1_seq') PRIMARY KEY,
name text NOT NULL
);

-- instance 2
CREATE SEQUENCE public.room_id_1_seq AS int INCREMENT BY 2 START WITH 1;
CREATE SEQUENCE public.room_id_2_seq AS int INCREMENT BY 2 START WITH 2;
CREATE TABLE public.room (
id int DEFAULT nextval('room_id_2_seq') PRIMARY KEY,
name text NOT NULL
);

After building out [3] this did work, but it was more tedious.

Is it possible to support IDENTITY columns (or serial columns) where the
values of the sequence are set to different intervals on the
publisher/subscriber?

Thanks,

Jonathan

[1]
https://github.com/CrunchyData/postgres-realtime-demo/blob/main/examples/demo/demo1.sql
[2] https://gist.github.com/jkatz/5c34bf1e401b3376dfe8e627fcd30af3
[3] https://gist.github.com/jkatz/1599e467d55abec88ab487d8ac9dc7c3

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2023-02-22 02:39:15 Re: Add support for unit "B" to pg_size_pretty()
Previous Message Justin Pryzby 2023-02-22 01:50:35 Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)