Re: Support logical replication of DDLs

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Support logical replication of DDLs
Date: 2022-12-15 16:24:44
Message-ID: CALDaNm3q_2Qfo4XbFkBT8mPz_ALS2MyuR=isP7_8Pz27JnLHiw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Fri, 11 Nov 2022 at 10:17, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are more review comments for the v32-0001 file ddl_deparse.c
>
> This completes my first review pass over this overly large file.
>
> This review has taken a long time, so for any of my review comments
> (in this and previous posts) that get rejected, please reply citing
> the rejected reference numbers, because I hope to avoid spending
> multiple days (in a future review) trying to reconcile what was
> addressed vs what was not addressed. TIA.
>
> *** NOTE - my review post became too big, so I split it into smaller parts.
>
> THIS IS PART 1 OF 4.
>
> ======
>
> src/backend/commands/ddl_deparse.c
>
> G.1. GENERAL _VA args wrapping
>
> + tmp = new_objtree_VA("WITH GRANT OPTION",
> + 1, "present", ObjTypeBool,
> + stmt->action->grant_option);
>
> In general, I think all these _VA() style function calls are easier to
> read if you can arrange to put each of the argument names on a new
> line instead of just wrapping them randomly.
>
> So the above would be better as:
>
> tmp = new_objtree_VA("WITH GRANT OPTION", 1,
> "present", ObjTypeBool, stmt->action->grant_option);
>
> Please search/modify all cases of new_objtree_VA like this.

Modified

> G.2. GENERAL - "type" object
>
> There are many functions that insert a "type" object for some purpose:
>
> e.g.
> + tmpobj = new_objtree_VA("DETACH PARTITION %{partition_identity}D FINALIZE", 2,
> + "type", ObjTypeString, "detach partition finalize",
> + "partition_identity", ObjTypeObject,
> + new_objtree_for_qualname_id(RelationRelationId,
> + sub->address.objectId));
>
> e.g.
> + tmpobj = new_objtree_VA(fmtstr, 2,
> + "type", ObjTypeString, "add column",
> + "definition", ObjTypeObject, tree);
>
> I'm not sure yet what these "type" objects are used for, but I felt
> that these unsubstituted values should look slightly more like enum
> values, and slightly less like real SQL syntax.
>
> For example - maybe do like this (for the above):
>
> "detach partition finalize" -> "type_detach_partition_finalize"
> "add column" -> "type_add_column"
> etc.

I felt this is mainly useful for handling when the publisher is
running on a higher version and the subscriber is running on a lower
version, this feature may or may not be part of the first version. We
might be removing this code from the final patch. I have not made any
change for this. We will handle this at a later point based on
handling version is required or not as part of patch to be committed.

> G.3. GENERAL - JSON deparsed structures should be documented
>
> AFAICT there are mixtures of different JSON structure styles at play
> in this module. Sometimes there are trees with names and sometimes
> not, sometimes there are "present" objects and sometimes not.
> Sometimes entire trees seemed unnecessary to me. It feels quite
> arbitrary in places but it's quite hard to compare them because
> everything is spread across 9000+ lines.
>
> IMO all these deparse structures ought to be documented. Then I think
> it will become apparent that lots of them are inconsistent with the
> others. Even if such documentation is ultimately not needed by
> end-users, I think it would be a very valuable internal design
> accompaniment to this module, and it would help a lot for
> reviews/maintenance/bug prevention etc. Better late than never.

There is "type" and "present" which might confuse the user, this is
required for handling when the publisher is running on a higher
version and the subscriber is running on a lower version, this feature
may or may not be part of the first version. We might be removing this
code from the final patch. I have not documented this part, the others
I have documented. Let me know if you are looking for adding comments
for some others particularly.

> G.4 GENERAL - Underuse of _VA() function.
>
> (Probably I've mentioned this before in previous review comments, but
> I keep encountering this many times).
>
> The json is sort of built up part by part and objects are appended ...
> it was probably easier to think about each part during coding but OTOH
> I think this style is often unnecessary. IMO most times the function
> can be far simpler just by gathering together all the necessary values
> and then using a single big new_objtree_VA() call to deparse the
> complete format in one call. I think it could also shave 100s of lines
> of code from the module.

Modified

> G.5 GENERAL - Inconsistent function comment wording.
>
> The function comments are worded in different ways...
>
> "Given a XXX OID and the parse tree that created it, return an ObjTree
> representing the creation command."
>
> versus
>
> "Given a XXX OID and the parse tree that created it, return the JSON
> blob representing the creation command."
>
> Please use consistent wording throughout.

Modified

> G.6 GENERAL - present=false
>
> There are many calls that do like:
> append_bool_object(tmpobj, "present", false);
>
> I was thinking the code would be cleaner if there was a wrapper function like:
>
> static void
> append_not_present(ObjTree objTree)
> {
> append_bool_object(objTree, "present", false);
> }

Modified

> G.7 GENERAL - psprintf format strings
>
> There are quite a few places where the format string is
> pre-constructed using psprintf.
>
> e.g.
> + fmt = psprintf("ALTER %s %%{identity}s OWNER TO %%{newowner}I",
> + stringify_objtype(node->objectType));
> +
> + ownerStmt = new_objtree_VA(fmt, 2,
> + "identity", ObjTypeString,
> + getObjectIdentity(&address, false),
> + "newowner", ObjTypeString,
> + get_rolespec_name(node->newowner));
>
> It's not entirely clear to me why this kind of distinction is even
> made, or even what are the rules governing the choice. AFAICT this
> same result could be achieved by using another string substitution
> marker. So why not do it that way instead of mixing different styles?
>
> IMO many/most of the psprintf can be removed.
>
> e.g. I mean something like this (for the above example):
>
> fmt = "ALTER %{obj_type}s %{identity}s OWNER TO %{newowner}I";
>
> ownerStmt = new_objtree_VA(fmt, 3,
> "obj_type", ObjTypeString, stringify_objtype(node->objectType),
> "identity", ObjTypeString, getObjectIdentity(&address, false),
> "newowner", ObjTypeString, get_rolespec_name(node->newowner));

Modified wherever possible

> G.8 GENERAL - Inconsistent OID/oid in error messages.
>
> errmsg("role with OID %u does not exist", roleoid)));
> elog(ERROR, "cache lookup failed for collation with OID %u", objectId);
> elog(ERROR, "cache lookup failure for function with OID %u",
> elog(ERROR, "cache lookup failed for schema with OID %u",
> errmsg("role with OID %u does not exist", istmt->grantor_uid)));
> elog(ERROR, "cache lookup failed for operator with OID %u", objectId);
> elog(ERROR, "cache lookup failed for type with OID %u", objectId);
> elog(ERROR, "cache lookup failed for conversion with OID %u", objectId);
> elog(ERROR, "cache lookup failed for extension with OID %u",
> elog(ERROR, "cache lookup failed for extension with OID %u",
> elog(ERROR, "cache lookup failed for cast with OID %u", objectId);
> elog(ERROR, "cache lookup failed for domain with OID %u", objectId);
> elog(ERROR, "cache lookup failure for function with OID %u",
> elog(ERROR, "cache lookup failure for language with OID %u",
> elog(ERROR, "null prosrc in function with OID %u", objectId);
> elog(ERROR, "cache lookup failed for opclass with OID %u", opcoid);
> elog(ERROR, "cache lookup failed for operator family with OID %u",
> opcForm->opcfamily);
> elog(ERROR, "cache lookup failed for operator family with OID %u", objectId);
> elog(ERROR, "cache lookup failed for domain with OID %u",
> elog(ERROR, "cache lookup failed for collation with OID %u", objectId);
> elog(ERROR, "cache lookup failed for operator with OID %u", objectId);
> elog(ERROR, "cache lookup failed for type with OID %u", objectId);
> elog(ERROR, "cache lookup failed for text search parser with OID %u",
> elog(ERROR, "cache lookup failed for text search dictionary " "with
> OID %u", objectId);
> elog(ERROR, "cache lookup failed for text search template with OID %u",
> elog(ERROR, "cache lookup failed for text search dictionary " "with
> OID %u", objectId);
> elog(ERROR, "cache lookup failed for opclass with OID %u",
> elog(ERROR, "cache lookup failed for operator family with OID %u",
> elog(ERROR, "cache lookup failure for transform with OID %u",
> elog(ERROR, "cache lookup failure for language with OID %u",
> elog(ERROR, "cache lookup failure for function with OID %u",
> elog(ERROR, "cache lookup failure for function with OID %u",
> elog(ERROR, "cache lookup failed for rewrite rule for view with OID
> %u", viewoid)
>
> elog(ERROR, "cache lookup failed for range with type oid %u",
> elog(ERROR, "cache lookup failed for rewrite rule with oid %u",
>
> G.8a.
> Most are uppercase 'OID'. A few are lowercase 'oid'

Modified

> G.8b.
> There is a mixture of "cache lookup failed" and "cache lookup failure"
> -- should all be the same.

Modified

> G.8c.
> A few above (e.g. role) have a different message text. Shouldn't those
> also be "cache lookup failed"?

Modified

> G.9 GENERAL - ObjTree variables
>
> Often the ObjTree variable (for the deparse_XXX function return) is
> given the name of the statement it is creating.
>
> Although it is good to be descriptive, often there is no need for long
> variable names (e.g. 'createTransform' etc), because there is no
> ambiguity anyway and it just makes for extra code characters and
> unnecessary wrapping. IMO it would be better to just call everything
> some short but *consistent* name across every function -- like 'stmt'
> or 'json_ddl' or 'root' or 'ret' ... or whatever.

Modified

Thanks for the comments, the attached v52 patch has the changes for the same.

Regards,
Vignesh

Attachment Content-Type Size
v52-0005-Skip-ALTER-TABLE-subcommands-generated-for-Table.patch text/x-patch 2.2 KB
v52-0004-Test-cases-for-DDL-replication.patch text/x-patch 24.6 KB
v52-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch text/x-patch 15.7 KB
v52-0001-Functions-to-deparse-DDL-commands.patch text/x-patch 323.0 KB
v52-0002-Support-DDL-replication.patch text/x-patch 132.5 KB
v52-0006-Support-DDL-replication-of-alter-type-having-USI.patch text/x-patch 9.0 KB
v52-0007-Introduce-the-test_ddl_deparse_regress-test-modu.patch text/x-patch 64.9 KB

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michel Pelletier 2022-12-15 16:54:05 pg_dumpall renders ALTER TABLE for a view?
Previous Message Willy-Bas Loos 2022-12-15 16:20:58 Re: tcp keepalives not sent during long query

Browse pgsql-hackers by date

  From Date Subject
Next Message James Finnerty 2022-12-15 16:34:35 Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Previous Message David G. Johnston 2022-12-15 16:18:04 Re: plpgsq_plugin's stmt_end() is not called when an error is caught