Re: Support logical replication of DDLs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Runqi Tian <runqidev(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>
Subject: Re: Support logical replication of DDLs
Date: 2023-06-22 10:52:30
Message-ID: CAA4eK1KepPduk2u8pOgYkvQe54_+5G6snSobSN8H5RcJd334TQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Tue, Jun 13, 2023 at 1:21 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> The patch is made of a lot of one-one mapping between enum structures
> and hardcoded text used in the JSON objects, making it something hard
> to maintain if a node field is added, removed or even updated into
> something else. I have mentioned that during PGCon, but here is
> something more durable: why don't we generate more of this code
> automatically based on the structure of the nodes we are looking at?
>

As far as I understand, the main idea behind the generation of code
based on the structure of node is that in most such cases, we generate
a common functionality based on each structure element (like
"comparison", "copy", "read", "write", or "append to jumble" that
element). There are exceptions to it in some cases in which we deal
with pg_node_attr annotations. However, the deparsing is different in
the sense that in many cases, there is no one-to-one mapping between
elements of structure and DDL's deparse generation.
For example,
1. Annotating fields to access the catalog won't be sufficient, we
need to tell the catalog's field, operator, etc., and writing such
functions for access will vary based on the type of DDL command.
Additionally, we may call some extra functions to get the required
output. See RelationGetPartitionBound. We can probably someway
annotate the field to call particular functions.
2. For part of the DDL creation, we primarily need to rely on catalogs
where no struct field is used. For example, creating identity
(schema+relation name) in CreateStmt, and autogenerating column
information won't seem feasible just by annotating structure, see
deparse_TableElems_ToJsonb and friends. The other example is that when
deparsing the CREATE TABLE command, the persistence/of_type/relkind
need to be fetched from the Relation structure(get from
relation_open()). There are other similar cases.
3. Another challenge is that to allow the elements to be processed in
the correct format, we need to form the statement in a particular
order. So, adding fields in between structures requires a manual
change in the deparse functions. Basically, the current output of
deparser includes a format string that can be used to format the plain
DDL strings by well-defined sprintf-like expansion. The format string
looks like this:

"CREATE %{persistence}s TABLE %{if_not_exists}s %{identity}D
(%{table_elements:, }s) %{inherits}s %{partition_by} ..."

The syntax format depends on whether each syntax part is necessary or
not. (For example, for the non-partition table, it doesn't have the
"%{partition_by}" part). So, when deparsing, we need to append each
syntax part to the format string separately and each syntax part(like
%{..}) needs to be generated in the correct order (otherwise, we
cannot expand it to a DDL command). It would be difficult to
automatically generate the format string in the correct order from the
structure members because the structure members' order varies.
4. RangeVar's variable could be appended in one way for "Alter Table"
but another way for "Create Table". When used via AlterTableStmt, we
need it to append ONLY clause whereas we don't need it in CreateStmt
5. IndexStmt is used differently for Alter Subcommands. In
AddIndexConstraint, some of its elements are used for keys whereas it
is not at all used in AddIndex for some assert checks.
6. Then the catalog table is opened once and the required information
is used during the entire deparse of the statement. We may need to
think about that as well.

Having said that, we are still trying to write a patch to see how it
looks, which may help us to jointly evaluate if we can do anything
better.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2023-06-22 13:29:59 Re: Catalog for LISTEN'ed to notification channels?
Previous Message shveta malik 2023-06-22 10:45:19 Re: Support logical replication of DDLs

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Pyhalov 2023-06-22 11:07:42 Re: memory leak in trigger handling (since PG12)
Previous Message shveta malik 2023-06-22 10:45:19 Re: Support logical replication of DDLs