Initial Schema Sync for Logical Replication

From: "Kumar, Sachin" <ssetiya(at)amazon(dot)com>
To: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Initial Schema Sync for Logical Replication
Date: 2023-03-15 17:42:32
Message-ID: db02e6773adb4dbcb5b9bb3803ebe340@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Everyone,

I am working on the initial schema sync for Logical replication. Currently, user have to
manually create a schema on subscriber side. Aim of this feature is to add an option in
create subscription, so that schema sync can be automatic. I am sharing Design Doc below,
but there are some corner cases where the design does not work. Please share your opinion
if design can be improved and we can get rid of corner cases. This design is loosely based
on Pglogical.
DDL replication is required for this feature.
(https://www.postgresql.org/message-id/flat/CAAD30U%2BpVmfKwUKy8cbZOnUXyguJ-uBNejwD75Kyo%3DOjdQGJ9g%40mail.gmail.com)

SQL Changes:-
CREATE SUBSCRIPTION subscription_name
CONNECTION 'conninfo'
PUBLICATION publication_name [, ...]
[ WITH ( subscription_parameter [= value] [, ... ] ) ]
sync_initial_schema (enum) will be added to subscription_parameter.
It can have 3 values:-
TABLES, ALL , NONE (Default)
In ALL everything will be synced including global objects too.

Restrictions :- sync_initial_schema=ALL can only be used for publication with FOR ALL TABLES

Design:-

Publisher :-
Publisher have to implement `SHOW CREATE TABLE_NAME`, this table definition will be used by
subscriber to create exact schema of a table on the subscriber. One alternative to this can
be doing it on the subscriber side itself, we can create a function similar to
describeOneTableDetails and call it on the subscriber. We also need maintain same ownership
as of publisher.

It should also have turned on publication of DDL commands.

Subscriber :-

1. In CreateSubscription() when we create replication slot(walrcv_create_slot()), should
use CRS_EXPORT_SNAPSHOT, So that we can use this snapshot later in the pg_dump.

2. Now we can call pg_dump with above snapshot from CreateSubscription. This is inside
opts.connect && opts.create_slot if statement. If we fail in this step we have to drop
the replication slot and create a new one again. Because we need snapshot and creating a
replication slot is a way to get snapshot. The reason for running pg_dump with above
snapshot is that we don't want execute DDLs in wal_logs to 2 times. With above snapshot we
get a state of database which is before the replication slot origin and any changes after
the snapshot will be in wal_logs.

We will save the pg_dump into a file (custom archive format). So pg_dump will be similar to
pg_dump --connection_string --schema_only --snapshot=xyz -Fc --file initSchema

If sync_initial_schema=TABLES we dont have to call pg_dump/restore at all. TableSync process
will take care of it.

3. If we have to sync global objects we need to call pg_dumpall --globals-only also. But pg_dumpall
does not support --snapshot option, So if user creates a new global object between creation
of replication slot and running pg_dumpall, that above global object will be created 2
times on subscriber , which will error out the Applier process.

4. walrcv_disconnect should be called after pg_dump is finished, otherwise snapshot will
not be valid.

5. Users will replication role cant not call pg_dump , So the replication user have to
superuser. This is a a major problem.
postgres=# create role s4 WITH LOGIN Replication;
CREATE ROLE
╭─sachin(at)DUB-1800550165 ~
╰─$ pg_dump postgres -s -U s4 1 ↵
pg_dump: error: query failed: ERROR: permission denied for table t1
pg_dump: detail: Query was: LOCK TABLE public.t1, public.t2 IN ACCESS SHARE MODE

6. pg_subscription_rel table column srsubstate will have one more state
SUBREL_STATE_CREATE 'c'. if sync_initial_schema is enabled we will set table_state to 'c'.
Above 6 steps will be done even if subscription is not enabled, but connect is true.

7. Leader Applier process should check if initSync file exist , if true then it should
call pg_restore. We are not using —pre-data and —post-data segment as it is used in
Pglogical, Because post_data works on table having data , but we will fill the data into
table on later stages. pg_restore can be called like this

pg_restore --connection_string -1 file_name
-1 option will execute every command inside of one transaction. If there is any error
everything will be rollbacked.
pg_restore should be called quite early in the Applier process code, before any tablesync
process can be created.
Instead of checking if file exist maybe pg_subscription table can be extended with column
SyncInitialSchema and applier process will check SyncInitialSchema == SYNC_PENDING

8. TableSync process should check the state of table , if it is SUBREL_STATE_CREATE it should
get the latest definition from the publisher and recreate the table. (We have to recreate
the table even if there are no changes). Then it should go into copy table mode as usual.

It might seem that TableSync is doing duplicate work already done by pg_restore. We are doing
it in this way because of concurrent DDLs and refresh publication command.

Concurrent DDL :-
User can execute a DDL command to table t1 at the same time when subscriber is trying to sync
it. pictorial representation https://imgur.com/a/ivrIEv8 [1]

In tablesync process, it makes a connection to the publisher and it sees the
table state which can be in future wrt to the publisher, which can introduce conflicts.
For example:-

CASE 1:- { Publisher removed the column b from the table t1 when subscriber was doing pg_restore
(or any point in concurrent DDL window described in picture [1] ), when tableSync
process will start transaction on the publisher it will see request data of table t1
including column b, which does not exist on the publisher.} So that is why tableSync process
asks for the latest definition.
If we say that we will delay tableSync worker till all the DDL related to table t1 is
applied by the applier process , we can still have a window when publisher issues a DDL
command just before tableSync starts its transaction, and therefore making tableSync and
publisher table definition incompatible (Thanks to Masahiko for pointing out this race
condition).

Applier process will skip all DDL/DMLs related to the table t1 and tableSync will apply those
in Catchup phase.
Although there is one issue what will happen to views/ or functions which depend on the table
. I think they should wait till table_state is > SUBREL_STATE_CREATE (means we have the latest
schema definition from the publisher).
There might be corner cases to this approach or maybe a better way to handle concurrent DDL
One simple solution might be to disallow DDLs on the publisher till all the schema is
synced and all tables have state >= SUBREL_STATE_DATASYNC (We can have CASE 1: issue ,
even with DDL replication, so we have to wait till all the tables have table_state
> SUBREL_STATE_DATASYNC). Which might be a big window for big databases.

Refresh publication :-
In refresh publication, subscriber does create a new replication slot hence , we can’t run
pg_dump with a snapshot which starts from origin(maybe this is not an issue at all). In this case
it makes more sense for tableSync worker to do schema sync.

If community is happy with above design, I can start working on prototype.

Credits :- This design is inspired by Pglogical. Also thanks to Zane, Masahiko, Amit for reviewing earlier designs

Regards
Sachin Kumar
Amazon Web Services

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregory Stark (as CFM) 2023-03-15 17:44:41 Re: On login trigger: take three
Previous Message Alexander Lakhin 2023-03-15 17:00:00 Re: Add support for DEFAULT specification in COPY FROM