Re: Support logical replication of DDLs

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Support logical replication of DDLs
Date: 2022-12-13 09:17:32
Message-ID: CALDaNm2CazsD+Vibr8Khd_JXAvb1-FDtqhAaMS34HmdH0D9ATw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Thu, 3 Nov 2022 at 13:11, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are some more review comments for the v32-0001 file ddl_deparse.c
>
> (This is a WIP since it is such a large file)
>
> ======
>
> 1. General - calling VA with 0 args
>
> There are some calls to new_objtree_VA() where 0 extra args are passed.
>
> e.g. See in deparse_AlterFunction
> * alterFunc = new_objtree_VA("ALTER FUNCTION", 0);
> * ObjTree *tmpobj = new_objtree_VA("%{type}T", 0);
> * tmpobj = new_objtree_VA(intVal(defel->arg) ? "RETURNS NULL ON NULL
> INPUT" : "CALLED ON NULL INPUT", 0);
> * tmpobj = new_objtree_VA(intVal(defel->arg) ? "SECURITY DEFINER" :
> "SECURITY INVOKER", 0);
> * tmpobj = new_objtree_VA(intVal(defel->arg) ? "LEAKPROOF" : "NOT
> LEAKPROOF", 0);
> * etc.
>
> Shouldn't all those just be calling the new_objtree() function instead
> of new_objtree_VA()?
>
> Probably there are more than just those cited - please search for others.

Modified wherever possible

> 2. General - when to call append_xxx functions?
>
> I did not always understand what seems like an arbitrary choice of
> function calls to append_xxx.
>
> e.g. Function deparse_AlterObjectSchemaStmt() does:
>
> + append_string_object(alterStmt, "%{identity}s", ident);
> +
> + append_string_object(alterStmt, "SET SCHEMA %{newschema}I", newschema);
>
> Why doesn't the above just use new_objtree_VA instead -- it seemed to
> me like the _VA function is underused in some places. Maybe search all
> the append_xxx usage - I suspect many of those can in fact be combined
> to use new_objtree_VA().

Modified wherever possible

> 3. General - extract object names from formats
>
> IIUC the calls to append_XXX__object will call deeper to
> append_object_to_format_string(), which has a main side-effect loop to
> extract the "name" part out of the sub_fmt string. But this logic all
> seems inefficient and unnecessary to me. I think in most (all?) cases
> the caller already knows what the object name should be, so instead of
> making code work to figure it out again, it can just be passed in the
> same way the _VA() function passes the known object name.
>
> There are many cases of this:
>
> e.g.
>
> BEFORE
> append_string_object(alterop, "(%{left_type}s", "NONE");
>
> AFTER - just change the signature to pass the known object name
> append_string_object(alterop, "(%{left_type}s", "left_type", "NONE");

Modified

> 4. General - fwd declares
>
> static void append_array_object(ObjTree *tree, char *name, List *array);
> static void append_bool_object(ObjTree *tree, char *name, bool value);
> static void append_float_object(ObjTree *tree, char *name, float8 value);
> static void append_null_object(ObjTree *tree, char *name);
> static void append_object_object(ObjTree *tree, char *name, ObjTree *value);
> static char *append_object_to_format_string(ObjTree *tree, char *sub_fmt);
> static void append_premade_object(ObjTree *tree, ObjElem *elem);
> static void append_string_object(ObjTree *tree, char *name, char *value);
>
>
> I think those signatures are misleading. AFAIK seems what is called
> the 'name' param above is often a 'sub_fmt' param in the
> implementation.

Modified

> 5. General - inconsistent append_array object calls.
>
> Sometimes enclosing brackets are included as part of the format string
> to be appended and other times they are appended separately. IIUC
> there is no difference but IMO the code should always be consistent to
> avoid it being confusing.
>
> e.g.1 (brackets in fmt)
> append_array_object(tmpobj, "(%{rettypes:, }s)", rettypes);
>
> e.g.2 (brackets appended separately)
> + append_format_string(tmpobj, "(");
> + append_array_object(tmpobj, "%{argtypes:, }T", arglist);
> + append_format_string(tmpobj, ")");

We cannot change it to a single call in all cases because in some
cases there is a possibility that the list can be NULL, if the list is
empty then append_array_object will return without appending "(". For
an empty list, we should append it with (). I'm not making any changes
for this.

> 6. General - missing space before '('
>
> I noticed a number of comment where there is a space missing before a '('.
> Here are some examples:
>
> - * An element of an object tree(ObjTree).
> - * typmod at the middle of name(e.g. TIME(6) with time zone ). We cannot
> - * Sequence for IDENTITY COLUMN output separately(via CREATE TABLE or
> - * Sequence for IDENTITY COLUMN output separately(via CREATE TABLE or
>
> Search all the patch-code to find others and add missing spaces.

Modified

> 7. General - Verbose syntax comments
>
> Some (but not all) of the deparse_XXX functions have a comment
> describing the verbose syntax.
>
> e.g.
> /*
> * Verbose syntax
> *
> * CREATE %{default}s CONVERSION %{identity}D FOR %{source}L TO %{dest}L
> * FROM %{function}D
> */
>
> These are helpful for understanding the logic of the function, so IMO
> similar comments should be written for *all* of the deparse_xxx
> function.
>
> And maybe a more appropriate place to put these comments is in the
> function header comment.

Modified

> 8. new_objtree_VA
>
> + /*
> + * For all other param types there must be a value in the varargs.
> + * Fetch it and add the fully formed subobject into the main object.
> + */
> + switch (type)
>
> What does the comment mean when it says - for all "other" param types?

I felt the comment meant "For all param types other than ObjTypeNull",
changed it accordingly.

> 9. objtree_to_jsonb_element
>
> + ListCell *cell;
> + JsonbValue val;
>
> The 'cell' is only for the ObjTypeArray so consider declaring it for
> that case only.

Modified

> 10. obtainConstraints
>
> + else
> + {
> + Assert(OidIsValid(domainId));
>
> Looks like the above Assert is unnecessary because the earlier Assert
> (below) already ensures this:
> + /* Only one may be valid */
> + Assert(OidIsValid(relationId) ^ OidIsValid(domainId));

Removed it

> 11. pg_get_indexdef_detailed
>
> + /* Output tablespace */
> + {
> + Oid tblspc;
> +
> + tblspc = get_rel_tablespace(indexrelid);
> + if (OidIsValid(tblspc))
> + *tablespace = pstrdup(quote_identifier(get_tablespace_name(tblspc)));
> + else
> + *tablespace = NULL;
> + }
> +
> + /* Report index predicate, if any */
> + if (!heap_attisnull(ht_idx, Anum_pg_index_indpred, NULL))
> + {
> + Node *node;
> + Datum predDatum;
> + char *predString;
> +
> + /* Convert text string to node tree */
> + predDatum = SysCacheGetAttr(INDEXRELID, ht_idx,
> + Anum_pg_index_indpred, &isnull);
> + Assert(!isnull);
> + predString = TextDatumGetCString(predDatum);
> + node = (Node *) stringToNode(predString);
> + pfree(predString);
> +
> + /* Deparse */
> + *whereClause =
> + deparse_expression(node, context, false, false);
> + }
> + else
> + *whereClause = NULL;
>
> Maybe just assign defaults:
> *tablespace = NULL;
> *whereClause = NULL;
>
> then overwrite those defaults, so can avoid the 'else' code.

Modified

> 12. stringify_objtype
>
> +/*
> + * Return the given object type as a string.
> + */
> +static const char *
> +stringify_objtype(ObjectType objtype)
>
> 12a.
> This statics function feels like it belongs more in another module as
> a utility function.

Moved it

> 12b.
> Actually, this function looks like it might be more appropriate just
> as a static lookup array/map of names keys by the ObjectType, and
> using a StaticAssertDecl for sanity checking.

We will not be able to use static array map in this case as this
function returns multiple types like in case of OBJECT_COLUMN returns
"TABLE" or "COLUMN" based on isgrant is true or false. similarly in
case of OBJECT_FOREIGN_SERVER too. I have not made any changes for
this.

> 13. deparse_GrantStmt
>
> + /*
> + * If there are no objects from "ALL ... IN SCHEMA" to be granted, then we
> + * need not do anything.
> + */
> + if (istmt->objects == NIL)
> + return NULL;
>
> "we need not do anything." -> "nothing to do."

Modified

> 14. deparse_GrantStmt
>
> + switch (istmt->objtype)
> + {
> + case OBJECT_COLUMN:
> + case OBJECT_TABLE:
> + objtype = "TABLE";
> + classId = RelationRelationId;
> + break;
> + case OBJECT_SEQUENCE:
> + objtype = "SEQUENCE";
> + classId = RelationRelationId;
> + break;
> + case OBJECT_DOMAIN:
> + objtype = "DOMAIN";
> + classId = TypeRelationId;
> + break;
> + case OBJECT_FDW:
> + objtype = "FOREIGN DATA WRAPPER";
> + classId = ForeignDataWrapperRelationId;
> + break;
> + case OBJECT_FOREIGN_SERVER:
> + objtype = "FOREIGN SERVER";
> + classId = ForeignServerRelationId;
> + break;
> + case OBJECT_FUNCTION:
> + objtype = "FUNCTION";
> + classId = ProcedureRelationId;
> + break;
> + case OBJECT_PROCEDURE:
> + objtype = "PROCEDURE";
> + classId = ProcedureRelationId;
> + break;
> + case OBJECT_ROUTINE:
> + objtype = "ROUTINE";
> + classId = ProcedureRelationId;
> + break;
> + case OBJECT_LANGUAGE:
> + objtype = "LANGUAGE";
> + classId = LanguageRelationId;
> + break;
> + case OBJECT_LARGEOBJECT:
> + objtype = "LARGE OBJECT";
> + classId = LargeObjectRelationId;
> + break;
> + case OBJECT_SCHEMA:
> + objtype = "SCHEMA";
> + classId = NamespaceRelationId;
> + break;
> + case OBJECT_TYPE:
> + objtype = "TYPE";
> + classId = TypeRelationId;
> + break;
> + case OBJECT_DATABASE:
> + case OBJECT_TABLESPACE:
> + objtype = "";
> + classId = InvalidOid;
> + elog(ERROR, "global objects not supported");
> + break;
> + default:
> + elog(ERROR, "invalid OBJECT value %d", istmt->objtype);
> + }
>
>
> Shouldn't code be calling to the other function stringify_objtype() to
> do some of this?

Modified

> 15.
>
> + grantStmt = new_objtree_VA(fmt, 0);
> +
> + /* build a list of privileges to grant/revoke */
> + if (istmt->all_privs)
> + {
> + tmp = new_objtree_VA("ALL PRIVILEGES", 0);
>
> Here are some more examples of the _VA function being called with 0
> args. Why use _VA function?

This was already modified

> 16.
>
> + list = NIL;
> +
> + if (istmt->privileges & ACL_INSERT)
> + list = lappend(list, new_string_object("INSERT"));
> + if (istmt->privileges & ACL_SELECT)
> + list = lappend(list, new_string_object("SELECT"));
> + if (istmt->privileges & ACL_UPDATE)
> + list = lappend(list, new_string_object("UPDATE"));
> + if (istmt->privileges & ACL_DELETE)
> + list = lappend(list, new_string_object("DELETE"));
> + if (istmt->privileges & ACL_TRUNCATE)
> + list = lappend(list, new_string_object("TRUNCATE"));
> + if (istmt->privileges & ACL_REFERENCES)
> + list = lappend(list, new_string_object("REFERENCES"));
> + if (istmt->privileges & ACL_TRIGGER)
> + list = lappend(list, new_string_object("TRIGGER"));
> + if (istmt->privileges & ACL_EXECUTE)
> + list = lappend(list, new_string_object("EXECUTE"));
> + if (istmt->privileges & ACL_USAGE)
> + list = lappend(list, new_string_object("USAGE"));
> + if (istmt->privileges & ACL_CREATE)
> + list = lappend(list, new_string_object("CREATE"));
> + if (istmt->privileges & ACL_CREATE_TEMP)
> + list = lappend(list, new_string_object("TEMPORARY"));
> + if (istmt->privileges & ACL_CONNECT)
> + list = lappend(list, new_string_object("CONNECT"));
>
> 16a.
> Shouldn't this be trying to re-use code like privilege_to_string()
> mapping function already in aclchk.c to get all those ACL strings?

Modified

> 16b.
> Is it correct that ACL_SET and ACL_ALTER_SYSTEM are missing?

Yes this is intentional as event trigger is not supported for global objects

> 17.
>
> The coding style is inconsistent in this function...
>
> For the same things - sometimes use the ternary operator; sometimes use if/else.
>
> e.g.1
> + append_string_object(grantStmt, "%{grant_option}s",
> + istmt->grant_option ? "WITH GRANT OPTION" : "");
>
> e.g.2
> + if (istmt->behavior == DROP_CASCADE)
> + append_string_object(grantStmt, "%{cascade}s", "CASCADE");
> + else
> + append_string_object(grantStmt, "%{cascade}s", "");

Modified

> 18. deparse_AlterOpFamily
>
> + tmpobj2 = new_objtree_VA("FOR ORDER BY", 0);
> + append_object_object(tmpobj2, "%{opfamily}D",
> + new_objtree_for_qualname_id(OperatorFamilyRelationId,
> + oper->sortfamily));
>
> Why isn't something like this combined to be written as a signle
> new_objtree_VA call?

Modified

> 19. deparse_Type_Storage
>
> + tmpstr = psprintf("%s", str);
> +
> + fmt = "STORAGE = %{value}s";
> +
> + storage = new_objtree_VA(fmt, 2,
> + "clause", ObjTypeString, "storage",
> + "value", ObjTypeString, tmpstr);
>
> 19a.
> What is the purpose of tmpstr? Seems unnecessary

It is not required, removed it

> 19b.
> What is the purpose of separate 'fmt' var? Why not just pass format
> string as a parameter literal to the new_objtree_VA()

Modified

> 20. deparse_CreateConversion
>
> + /* Add the DEFAULT clause */
> + append_string_object(ccStmt, "default",
> + conForm->condefault ? "DEFAULT" : "");
>
> 20a.
> Is that code correct? I thought the fmt should look like
> "%{default}s", otherwise won't the resulting string object have no
> name?

I have changed it to use it in new_objtree_VA and changed it to %{default}s

> 20b.
> Anyway, it does not seem to match what the preceding verbose syntax
> comment says.

Modified

> 21.
>
> +
> +
> + /* Add the DEFAULT clause */
> + append_string_object(ccStmt, "default",
> + conForm->condefault ? "DEFAULT" : "");
> +
> + tmpObj = new_objtree_for_qualname(conForm->connamespace,
> NameStr(conForm->conname));
> + append_object_object(ccStmt, "CONVERSION %{identity}D", tmpObj);
> + append_string_object(ccStmt, "FOR %{source}L", (char *)
> + pg_encoding_to_char(conForm->conforencoding));
> + append_string_object(ccStmt, "TO %{dest}L", (char *)
> + pg_encoding_to_char(conForm->contoencoding));
> + append_object_object(ccStmt, "FROM %{function}D",
> + new_objtree_for_qualname_id(ProcedureRelationId,
> + conForm->conproc));
>
> I don't really understand why all this is not written instead using a
> single new_objtree_VA() call.

Modified

> 22. deparse_CreateEnumStmt
>
> + enumtype = new_objtree("CREATE TYPE");
> + append_object_object(enumtype, "%{identity}D",
> + new_objtree_for_qualname_id(TypeRelationId,
> + objectId));
> +
> + values = NIL;
> + foreach(cell, node->vals)
> + {
> + String *val = lfirst_node(String, cell);
> +
> + values = lappend(values, new_string_object(strVal(val)));
> + }
> +
> + append_array_object(enumtype, "AS ENUM (%{values:, }L)", values);
> + return enumtype;
>
> Ditto. I don't really understand why all this is not written instead
> using a single new_objtree_VA() call.

Modified

> 23. deparse_CreateExtensionStmt
>
> + extStmt = new_objtree("CREATE EXTENSION");
> +
> + append_string_object(extStmt, "%{if_not_exists}s",
> + node->if_not_exists ? "IF NOT EXISTS" : "");
> +
> + append_string_object(extStmt, "%{name}I", node->extname);
>
> Ditto. I don't really understand why all this is not written instead
> using a single new_objtree_VA() call.

Modified

> 24. deparse_FdwOptions
>
> + tmp = new_objtree_VA("OPTIONS", 0);
>
> Isn't it better to call other function instead of passing zero params
> to this one?

This was already fixed

> 25. deparse_CreateFdwStmt
>
> 25a.
> + /* add HANDLER clause */
> + if (fdwForm->fdwhandler == InvalidOid)
> + tmp = new_objtree_VA("NO HANDLER", 0);
> + else
>
> Isn't it better to call other function instead of passing zero params
> to this one?

This is fixed already

> 25b.
> + /* add VALIDATOR clause */
> + if (fdwForm->fdwvalidator == InvalidOid)
> + tmp = new_objtree_VA("NO VALIDATOR", 0);
>
> Ditto #25a

This is fixed already

>
> 25c.
> Both above should use OidIsValid macro.

Modified

> 26. deparse_AlterFdwStmt
>
> 26a.
> + /* add HANDLER clause */
> + if (fdwForm->fdwhandler == InvalidOid)
> + tmp = new_objtree_VA("NO HANDLER", 0);
>
> Ditto #25a

This is fixed already

> 26b.
> + /* add VALIDATOR clause */
> + if (fdwForm->fdwvalidator == InvalidOid)
> + tmp = new_objtree_VA("NO VALIDATOR", 0);
>
> Ditto #25a

This is fixed already

> 26c.
> Both above should use OidIsValid macro.

Modified

> 27. deparse_CreateRangeStmt
>
> + /* SUBTYPE */
> + tmp = new_objtree_for_qualname_id(TypeRelationId,
> + rangeForm->rngsubtype);
> + tmp = new_objtree_VA("SUBTYPE = %{type}D",
> + 2,
> + "clause", ObjTypeString, "subtype",
> + "type", ObjTypeObject, tmp);
> + definition = lappend(definition, new_object_object(tmp));
>
>
> The reusing of 'tmp' variable seems a bit sneaky to me. Perhaps using
> 'tmp' and 'tmp_qualid' might be a more readable way to go here.

Removed usage of tmp

> 28. deparse_AlterEnumStmt
>
> + if (node->newValNeighbor)
> + {
> + append_string_object(alterEnum, "%{after_or_before}s",
> + node->newValIsAfter ? "AFTER" : "BEFORE");
> + append_string_object(alterEnum, "%{neighbour}L", node->newValNeighbor);
> + }
>
> Has a mix of US and UK spelling of neighbor/neighbour?

Modified to neighbor

Thanks for the comments, the attached v49 patch has the changes for the same.

Regards,
Vignesh

Attachment Content-Type Size
v49-0004-Test-cases-for-DDL-replication.patch text/x-patch 24.6 KB
v49-0001-Functions-to-deparse-DDL-commands.patch text/x-patch 324.9 KB
v49-0005-Skip-ALTER-TABLE-subcommands-generated-for-Table.patch text/x-patch 2.2 KB
v49-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch text/x-patch 15.7 KB
v49-0002-Support-DDL-replication.patch text/x-patch 132.5 KB
v49-0006-Support-DDL-replication-of-alter-type-having-USI.patch text/x-patch 9.0 KB
v49-0007-Introduce-the-test_ddl_deparse_regress-test-modu.patch text/x-patch 46.7 KB

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Michel Pelletier 2022-12-13 18:39:19 Is there a way to detect that code is inside CREATE EXTENSION?
Previous Message higherone 2022-12-13 08:28:40 Re: Is there a way to know write statistics on an individual index

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-12-13 10:40:50 Re: Perform streaming logical transactions by background workers and parallel apply
Previous Message John Naylor 2022-12-13 08:29:18 Re: New strategies for freezing, advancing relfrozenxid early