Re: Support logical replication of DDLs

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Runqi Tian <runqidev(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>
Subject: Re: Support logical replication of DDLs
Date: 2023-06-27 10:15:48
Message-ID: CALDaNm0Rvv2EWOOQeGc_1j3b3ENceDRO8jd+gbf0Y_J7d1FGGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Thu, 22 Jun 2023 at 16:22, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jun 13, 2023 at 1:21 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >
> > The patch is made of a lot of one-one mapping between enum structures
> > and hardcoded text used in the JSON objects, making it something hard
> > to maintain if a node field is added, removed or even updated into
> > something else. I have mentioned that during PGCon, but here is
> > something more durable: why don't we generate more of this code
> > automatically based on the structure of the nodes we are looking at?
> >
>
> As far as I understand, the main idea behind the generation of code
> based on the structure of node is that in most such cases, we generate
> a common functionality based on each structure element (like
> "comparison", "copy", "read", "write", or "append to jumble" that
> element). There are exceptions to it in some cases in which we deal
> with pg_node_attr annotations. However, the deparsing is different in
> the sense that in many cases, there is no one-to-one mapping between
> elements of structure and DDL's deparse generation.
> For example,
> 1. Annotating fields to access the catalog won't be sufficient, we
> need to tell the catalog's field, operator, etc., and writing such
> functions for access will vary based on the type of DDL command.
> Additionally, we may call some extra functions to get the required
> output. See RelationGetPartitionBound. We can probably someway
> annotate the field to call particular functions.
> 2. For part of the DDL creation, we primarily need to rely on catalogs
> where no struct field is used. For example, creating identity
> (schema+relation name) in CreateStmt, and autogenerating column
> information won't seem feasible just by annotating structure, see
> deparse_TableElems_ToJsonb and friends. The other example is that when
> deparsing the CREATE TABLE command, the persistence/of_type/relkind
> need to be fetched from the Relation structure(get from
> relation_open()). There are other similar cases.
> 3. Another challenge is that to allow the elements to be processed in
> the correct format, we need to form the statement in a particular
> order. So, adding fields in between structures requires a manual
> change in the deparse functions. Basically, the current output of
> deparser includes a format string that can be used to format the plain
> DDL strings by well-defined sprintf-like expansion. The format string
> looks like this:
>
> "CREATE %{persistence}s TABLE %{if_not_exists}s %{identity}D
> (%{table_elements:, }s) %{inherits}s %{partition_by} ..."
>
> The syntax format depends on whether each syntax part is necessary or
> not. (For example, for the non-partition table, it doesn't have the
> "%{partition_by}" part). So, when deparsing, we need to append each
> syntax part to the format string separately and each syntax part(like
> %{..}) needs to be generated in the correct order (otherwise, we
> cannot expand it to a DDL command). It would be difficult to
> automatically generate the format string in the correct order from the
> structure members because the structure members' order varies.
> 4. RangeVar's variable could be appended in one way for "Alter Table"
> but another way for "Create Table". When used via AlterTableStmt, we
> need it to append ONLY clause whereas we don't need it in CreateStmt
> 5. IndexStmt is used differently for Alter Subcommands. In
> AddIndexConstraint, some of its elements are used for keys whereas it
> is not at all used in AddIndex for some assert checks.
> 6. Then the catalog table is opened once and the required information
> is used during the entire deparse of the statement. We may need to
> think about that as well.
>
> Having said that, we are still trying to write a patch to see how it
> looks, which may help us to jointly evaluate if we can do anything
> better.

While development, below are some of the challenges we faced:
1. Almost all the members of the AlterTableType enum will have to be annotated.
2. Complex functionalities which require access to catalog tables
cannot be auto generated, custom functions should be written in this
case.
3. Some commands might have completely custom code(no auto generation)
and in the alter/drop table case we will have hybrid implementation
both auto generated and custom implementation.
4. Developer needs to understand the annotation format that must be
specified for different types.
5. During syntax enhancement for the existing options, the developer
should understand if the user should modify to continue using auto
generation or use custom implementation.
6. For CreateStmt, most of the fields need to be handled differently
in gen_node_support.pl, which means we need to maintain some uncommon
C codes in the script. This may make these deparser codes a bit harder
to maintain if anything changes, because users need to search the
script to find which part of C codes need to be changed in the future.
The other alternative was to completely write the custom code for
creating a statement.

Having mentioned the above challenges, we feel that there is a better
chance for developers to notice structure field annotations and handle
the new field deparsing. Also, some of the common code is auto
generated. But overall, it doesn't seem to be advantageous to try auto
generating the deparsing code.

The attached patch has the changes for auto generation for create/drop
and alter commands.
0001 and 0002 patches are the same patches from the earlier patch
posted by Shveta at [1]. I did not post the remaining set of patches
as these patches are sufficient to compare the changes.
0003 patch is a prerequisite patch which does the code re-arrangement
between the files.
0004 patch has the changes for create/drop auto generation
0005 patch has the changes for alter auto generation

Thanks to Hou zj for providing auto generation of create command offline.

[1] - https://www.postgresql.org/message-id/CAJpy0uDLLBYAOzCePYObZ51k1epBU0hef4vbfcujKJprJVsEcQ%40mail.gmail.com

Regards,
Vignesh

Attachment Content-Type Size
0002-Deparser-for-Alter-Table-DDL-commands-2023_06_22.patch text/x-patch 59.1 KB
0001-Deparser-for-Create-And-Drop-Table-DDL-co-2023_06_22.patch text/x-patch 99.4 KB
0004-Auto-generate-CreateStmt-and-DropStmt-node-deparse-f.patch text/x-patch 23.8 KB
0005-POC-for-autogenerating-alter-table-subcommands-for-d.patch text/x-patch 33.6 KB
0003-Code-restructuring-for-supporting-autogeneration-cod.patch text/x-patch 102.4 KB

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tony Shelver 2023-06-27 10:49:00 Re: Large scale reliable software system
Previous Message Mohsin Kazmi 2023-06-27 08:37:41 Re: 2 master 3 standby replication

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-06-27 10:50:56 Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Previous Message torikoshia 2023-06-27 10:03:21 Re: RFC: Logging plan of the running query