Re: Support logical replication of DDLs

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Dilip Kumar" <dilipbalaut(at)gmail(dot)com>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zheng Li" <zhengli10(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "rajesh singarapu" <rajesh(dot)rs0541(at)gmail(dot)com>
Subject: Re: Support logical replication of DDLs
Date: 2022-04-14 14:15:20
Message-ID: 3c646317-df34-4cb3-9365-14abeada6587@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Thu, Apr 14, 2022, at 6:26 AM, Dilip Kumar wrote:
> I agree. But here the bigger question is what is the correct behavior
> in case of the Alter Table? I mean for example in the publisher the
> table gets rewritten due to the Access Method change then what should
> be the behavior of the subscriber. One expected behavior is that on
> subscriber also the access method gets changed and the data remains
> the same as on the subscriber table(table can be locally rewritten
> based on new AM). Which seems quite sensible behavior to me. But if
> we want this behavior then we can not replay the logical messages
> generated by DML WAL because of table rewrite, otherwise we will get
> duplicate data, unless we plan to get rid of the current data and just
> get all new data from the publisher. And if we do that then the data
> will be as per the latest data in the table based on the publisher, so
> I think first we need to define the correct behavior and then we can
> design it accordingly.
You should forbid it. Unless you can decompose the command into multiple SQL
commands to make it a safe operation for logical replication.

Let's say you want to add a column with a volatile default.

ALTER TABLE foo ADD COLUMN bar double precision DEFAULT random();

If you replicate the DDL command as is, you will have different data
downstream. You should forbid it. However, this operation can be supported if
the DDL command is decomposed in multiple steps.

-- add a new column without DEFAULT to avoid rewrite
ALTER TABLE foo ADD COLUMN bar double precision;

-- future rows could use the DEFAULT expression
-- it also doesn't rewrite the table
ALTER TABLE foo ALTER COLUMN bar SET DEFAULT random();

-- it effectively rewrites the table
-- all rows are built from one source node
-- data will be the same on all nodes
UPDATE foo SET bar = random();

The ALTER TABLE ... ALTER COLUMN ... TYPE has a similar issue. This DDL command
can be decomposed to avoid the rewrite. If you are changing the data type, in
general, you add a new column and updates all rows doing the proper conversion.
(If you are updating in batches, you usually add a trigger to automatically
adjust the new column value for INSERTs and UPDATEs. Another case is when you
are reducing the the typmod (for example, varchar(100) to varchar(20)). In this
case, the DDL command can de decomposed removing the typmod information (ALTER
TABLE ... ALTER COLUMN ... TYPE varchar) and replacing it with a CHECK
constraint.

I didn't review this patch in depth but we certainly need to impose some DDL
restrictions if we are replicating DDLs. There are other cases that should be
treated accordingly such as a TABLESPACE specification or a custom data type.

--
Euler Taveira
EDB https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jan Beseda 2022-04-14 15:04:23 ERROR: XX000: cache lookup failed for type 75083631
Previous Message Nick Cleaton 2022-04-14 12:31:53 Re: Is this a reasonable use for advisory locks?

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-04-14 14:32:33 Re: Error logging messages
Previous Message Tom Lane 2022-04-14 14:03:58 Re: BufferAlloc: don't take two simultaneous locks