Support logical replication of DDLs

From: Zheng Li <zhengli10(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Support logical replication of DDLs
Date: 2022-02-21 15:53:43
Message-ID: CAAD30U+pVmfKwUKy8cbZOnUXyguJ-uBNejwD75Kyo=OjdQGJ9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hello,

One of the most frequently requested improvements from our customers
is to reduce downtime associated with software updates (both major and
minor versions). To do this, we have reviewed potential contributions to
improving logical replication.

I’m working on a patch to support logical replication of data
definition language statements (DDLs). This is a useful feature when a
database in logical replication has lots of tables, functions and
other objects that change over time, such as in online cross major
version upgrade.

I put together a prototype that replicates DDLs using the generic
messages for logical decoding. The idea is to log the candidate DDL
string in ProcessUtilitySlow() using LogLogicalMessge() with a new
flag in WAL record type xl_logical_message indicating it’s a DDL
message. The xl_logical_message record is decoded and sent to the
subscriber via pgoutput. The logical replication worker process is
dispatched for this new DDL message type and executes the command
accordingly.

However, there are still many edge cases to sort out because not every
DDL statement can/should be replicated. Some of these include:

1. DDL involving multiple tables where only some tables are replicated, e.g.

DROP TABLE replicated_foo, notreplicated_bar;

This statement will fail on the subscriber and block logical
replication. It can be detected and filtered on the publisher.

2. Any DDL that calls a volatile function, such as NOW() or RAND(), is
likely to generate a different value on each replica. It is possible
to work around these issues—for example, the publisher can replace any
volatile function calls with a fixed return value when the statement
is logged so that the subscribers all get the same value. We will have
to consider some other cases.

3. CREATE TABLE AS and SELECT INTO, For example:

CREATE TABLE foo AS
SELECT field_1, field_2 FROM bar;

There are a few issues that can occur here. For one, it’s possible
that table bar doesn't exist on the subscriber. Even if “bar” does
exist, it may not be fully up-to-date with the publisher, which would
cause a data mismatch on “foo” between the publisher and subscriber.

4. Statements that have nondeterministic side effects (e.g., as caused
by triggers, stored procedures, user-defined functions) may result in
different side effects occurring on each subscriber.

Whether a DDL should be replicated also depends on what granularity do
we define DDL replication. For example, we can define DDL replication
on these levels:

1. Database level
Allows all DDLs for a database to be replicated except for certain
edge cases (refer to the edge cases mentioned above).
This is likely a major use case, such as in online major version upgrade.

2. Table level
Allows DDLs on the published tables to be replicated except for
certain edge cases.
This is useful for consolidating multiple databases into a single one,
e.g. for analytics.

3. Other fine-grained levels base on the object type such as index,
function, procedure and view etc.
Allows DDLs on certain object types to be replicated. At the moment
I’m unsure of a use case for this.

To implement such DDL replication levels, we need to modify the CREATE
PUBLICATION syntax. For example, to help starting the discussion on
the granularity of DDL replication, we can add a new option list ‘ddl’
in the WITH definition:

CREATE PUBLICATION mypub FOR ALL TABLES with
(publish = ‘insert, update, delete, truncate’, ddl = ‘database’)

CREATE PUBLICATION mypub FOR TABLE T1, T2 with
(publish = ‘insert, update, delete, truncate’, ddl = ‘table’)

CREATE PUBLICATION mypub FOR TABLE T1, T2 with
(publish = ‘insert, update, delete, truncate’, ddl = ‘table, function,
procedure’)

We can probably make “ddl = ‘database’” valid only for the FOR ALL
TABLES publication, because it doesn’t really make sense to replicate
all DDLs for a database when only a subset of tables are being
replicated (which can cause edge case 1 to occur frequently). “ddl =
‘database’” + FOR ALL TABLES is likely where logical replication of
DDL is most useful, i.e. for online major versions upgrades.

Based on the DDL publication levels we can further implement the logic
to conditionally log the DDL commands or to conditionally decode/ship
the logical DDL message.

Thoughts? Your feedback is appreciated.

Thanks,
Zheng Li
Amazon RDS/Aurora for PostgreSQL

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Aleksander Alekseev 2022-02-22 10:05:05 Re: Support logical replication of DDLs
Previous Message Daniel Verite 2022-02-21 10:59:55 Re: varchar::bytea fails when varchar contains backslash

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2022-02-21 16:03:51 Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit
Previous Message Andrew Dunstan 2022-02-21 15:17:32 Re: Using Test::More test functions for pg_rewind