Re: Support logical replication of DDLs

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>, Runqi Tian <runqidev(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>
Subject: Re: Support logical replication of DDLs
Date: 2023-05-05 10:09:33
Message-ID: CAHut+Pv9vPbUQc0fzrKmDkKOsS_bj-hup_E+sLHNEX+6F+SY5Q@mail.gmail.com
Lists: pgsql-general pgsql-hackers

I revisited the 0005 patch to verify all changes made to address my
previous review comments [1][2][3][4] were OK.

Not all changes were made quite as expected, and there were a few
other things I noticed in passing.

======

1. General

I previously [1] wrote a comment:
Use consistent uppercase for JSON and DDL instead of sometimes json
and ddl. There are quite a few random examples in the commit message
but might be worth searching the entire patch to make all comments
also consistent case.

Now I still find a number of lowercase "json" and "ddl" strings.

~~~

2.

Some of my previous review comments were answered [5] with "Will add
this in a later version"... or "Keeping this same for now".

IMO it would be much better to add a "TODO" or a "FIXME" comment
directly into the source for anything described like that; otherwise
the list of things "to do later" is just going to get lost in all
the long thread posts, and/or other reviewers are going to report
exactly the same missing stuff again later. This review comment
also applies to the PG DOCS that are known to be incomplete or under
discussion - I think there should be a TODO/FIXME in those SGML files
or in the commit message.

e.g. [1]#9b
e.g. [2]#12abc
e.g. [2]#13c
e.g. [2]#14b
e.g. [2]#15ab
e.g. [2]#17
e.g. [3] (table excessively long)
e.g. [4]#2
e.g. [4]#11

~~~

3. Commit message

Executing a non-immutable expression during the table rewrite phase is not
allowed, as it may result in different data between publisher and subscriber.
While some may suggest converting the rewrite inserts to updates and replicate
them afte the ddl command to maintain data consistency. But it doesn't work if
the replica identity column is altered in the command. This is because the
rewrite inserts do not contain the old values and therefore cannot be converted
to update.

~

3a.
Grammar and typo need fixing for "While some may suggest converting
the rewrite inserts to updates and replicate them afte the ddl command
to maintain data consistency. But it doesn't work if the replica
identity column is altered in the command."

~

3b.
"rewrite inserts to updates"
Consider using uppercase for the INSERTs and UPDATEs

~~~

4.
LIMIT:

--> LIMITATIONS: (??)

======
contrib/test_decoding/sql/ddl.sql

5.
+SELECT 'ddl msg2' FROM pg_logical_emit_ddl_message('ddl msg2', 16394,
1, '{"fmt": "CREATE SCHEMA %{if_not_exists}s %{name}I
%{authorization}s", "name": "foo", "authorization": {"fmt":
"AUTHORIZATION %{authorization_role}I", "present": false,
"authorization_role": null}, "if_not_exists": ""}');

Previously ([4]#1) I had asked what is the point of setting a JSON
payload here when the JSON payload is never used. You might as well
pass the string "banana" to achieve the same thing AFAICT. I think the
reply [5] to the question was wrong. If this faked JSON is used for
some good reason then there ought to be a test comment to say the
reason. Otherwise, the fake JSON just confuses the purpose of this
test so it should be removed/simplified.

======
contrib/test_decoding/test_decoding.c

6. pg_decode_ddl_message

Previously ([4] #4b) I asked if it was necessary to use
appendBinaryStringInfo, instead of just appendStringInfo. I guess it
doesn't matter much, but I think the question was not answered.

======
doc/src/sgml/catalogs.sgml

7.
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>evtisinternal</structfield> <type>char</type>
+ </para>
+ <para>
+ True if the event trigger is internally generated.
+ </para></entry>
+ </row>

Why was this called a 'char' type instead of a 'bool' type?

======
doc/src/sgml/logical-replication.sgml

8.
+ <para>
+ For example, a CREATE TABLE command executed on the publisher gets
+ WAL-logged, and forwarded to the subscriber to replay; a subsequent "ALTER
+ SUBSCRIPTION ... REFRESH PUBLICATION" is performed on the
subscriber database so any
+ following DML changes on the new table can be replicated.
+ </para>

In my previous review comments ([2] 11b) I suggested for this to say
"then an implicit ALTER..." instead of "a subsequent ALTER...". I
think the "implicit" part got accidentally missed.

~~~

9.
+ <listitem>
+ <para>
+ In <literal>ADD COLUMN ... DEFAULT</literal> clause and
+ <literal>ALTER COLUMN TYPE</literal> clause of <command>ALTER
+ TABLE</command> command, the functions and operators used in
+ expression must be immutable.
+ </para>
+ </listitem>

IMO this is hard to read. It might be easier if expressed as 2
separate bullet points.

SUGGESTION
For ALTER TABLE ... ADD COLUMN ... DEFAULT, the functions and
operators used in expressions must be immutable.

For ALTER TABLE ... ALTER COLUMN TYPE, the functions and operators
used in expressions must be immutable.

~~~

10.
+ <para>
+ To change the column type, first add a new column of the desired
+ type, then update the new column value with the old column value,
+ and finnally drop the old column and rename the new column to the
+ old column.
+ </para>

/finnally/finally/

======
.../access/rmgrdesc/logicalddlmsgdesc.c

11. logicalddlmsg_desc

I previously wrote some suggestions about improving the Assert in this
code (see [3]#2). But the reply [5] "The array index is already
length + 1 as indices start from 0." did not make sense, because I am
not saying the code has wrong indices. I am only saying the way the
Asserts are done is inconsistent with the other similar MESSAGE msg,
and IMO there is a more intuitive way to assert that the DDL Message
has got some payload in it.

======
src/backend/catalog/pg_publication.c

12.
pub->pubactions.pubinsert = pubform->pubinsert;
pub->pubactions.pubupdate = pubform->pubupdate;
pub->pubactions.pubdelete = pubform->pubdelete;
+ pub->pubactions.pubddl_table = pubform->pubddl_table;
pub->pubactions.pubtruncate = pubform->pubtruncate;
pub->pubviaroot = pubform->pubviaroot;

IMO all the insert/update/delete/truncate belong together because they
all came from the 'publish' parameter. I don't think pubddl_table
should just be jammed into the middle of them.

======
src/backend/commands/event_trigger.c

13.
static Oid insert_event_trigger_tuple(const char *trigname, const
char *eventname,
- Oid evtOwner, Oid funcoid, List *taglist);
+ Oid evtOwner, Oid funcoid, List *taglist, bool isinternal);

/isinternal/is_internal/

~~~

14. CreateEventTrigger

* Create an event trigger.
*/
Oid
-CreateEventTrigger(CreateEventTrigStmt *stmt)
+CreateEventTrigger(CreateEventTrigStmt *stmt, bool isinternal)

/isinternal/is_internal/

~~~

15. insert_event_trigger_tuple

/* Insert catalog entries. */
return insert_event_trigger_tuple(stmt->trigname, stmt->eventname,
- evtowner, funcoid, tags);
+ evtowner, funcoid, tags, isinternal);

/isinternal/is_internal/

~~~

16.
if (filter_event_trigger(tag, item))
{
- /* We must plan to fire this trigger. */
- runlist = lappend_oid(runlist, item->fnoid);
+ static const char *trigger_func_prefix = "publication_deparse_%s";
+ char trigger_func_name[NAMEDATALEN];
+ Oid pub_funcoid;
+ List *pubfuncname;
+
+ /* Get function oid of the publication's ddl deparse event trigger */
+ snprintf(trigger_func_name, sizeof(trigger_func_name), trigger_func_prefix,
+ eventstr);
+ pubfuncname = SystemFuncName(trigger_func_name);
+ pub_funcoid = LookupFuncName(pubfuncname, 0, NULL, true);
+
+ if (item->fnoid != pub_funcoid)
+ runlist = lappend_oid(runlist, item->fnoid);
+ else
+ {
+ /* Only the first ddl deparse event trigger needs to be invoked */
+ if (pub_deparse_func_cnt++ == 0)
+ runlist = lappend_oid(runlist, item->fnoid);
+ }

16a.
I somehow felt this logic would be more natural/readable if the check
was for == pub_funcoid instead of != pub_funcoid.

~

16b.
Maybe use /pubfuncname/pub_funcname/ for consistent variable naming.

======
src/backend/commands/publicationcmds.c

17. DropDDLReplicaEventTriggers

+static void
+DropDDLReplicaEventTriggers(Oid puboid)
+{
+ DropDDLReplicaEventTrigger(PUB_TRIG_DDL_CMD_START, puboid);
+ DropDDLReplicaEventTrigger(PUB_TRIG_DDL_CMD_END, puboid);
+ DropDDLReplicaEventTrigger(PUB_TRIG_TBL_REWRITE, puboid);
+ DropDDLReplicaEventTrigger(PUB_TRIG_TBL_INIT_WRITE, puboid);
+}
+
+

Double blank lines.

======
src/backend/replication/logical/ddltrigger.c

18.
+/*
+ * Check if the command can be publishable.
+ *
+ * XXX Executing a non-immutable expression during the table rewrite phase is
+ * not allowed, as it may result in different data between publisher and
+ * subscriber. While some may suggest converting the rewrite inserts to updates
+ * and replicate them after the ddl command to maintain data
consistency. But it
+ * doesn't work if the replica identity column is altered in the command. This
+ * is because the rewrite inserts do not contain the old values and therefore
+ * cannot be converted to update.
+ *
+ * Apart from that, commands contain volatile functions are not
allowed. Because
+ * it's possible the functions contain DDL/DML in which case these operations
+ * will be executed twice and cause duplicate data. In addition, we don't know
+ * whether the tables being accessed by these DDL/DML are published or not. So
+ * blindly allowing such functions can allow unintended clauses like the tables
+ * accessed in those functions may not even exist on the subscriber.
+ */
+static void
+check_command_publishable(ddl_deparse_context context, bool is_rewrite)

18a.
"can be publishable" --> "can be published"

~

18b.
While some may suggest converting the rewrite inserts to updates and
replicate them after the ddl command to maintain data consistency. But
it doesn't work if the replica identity column is altered in the
command.

Grammar? Why is this split into 2 sentences?

~

18c.
Apart from that, commands contain volatile functions are not allowed.

/contain/containing/

~~~

19. check_command_publishable

+ if (context.func_volatile == PROVOLATILE_VOLATILE)
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot use volatile function in ALTER TABLE command because
it cannot be replicated in DDL replication"));
+}

Is it correct for this message to name ALTER TABLE when that is not
even part of the check? Is that the only scenario where this is
possible to occur?

~~~

20. publication_deparse_ddl_command_end

+ /* handle drop commands which appear in the SQLDropList */
+ slist_foreach(iter, &(currentEventTriggerState->SQLDropList))
+ {
+ SQLDropObject *obj;
+ EventTriggerData *trigdata;
+ char *command;
+ DeparsedCommandType cmdtype;
+
+ trigdata = (EventTriggerData *) fcinfo->context;
+
+ obj = slist_container(SQLDropObject, next, iter.cur);
+
+ if (!obj->original)
+ continue;
+
+ if (strcmp(obj->objecttype, "table") == 0)
+ cmdtype = DCT_TableDropEnd;
+ else
+ continue;
+
+ command = deparse_drop_command(obj->objidentity, obj->objecttype,
+ trigdata->parsetree);
+
+ if (command)
+ LogLogicalDDLMessage("deparse", obj->address.objectId, cmdtype,
+ command, strlen(command) + 1);
+ }

20a.
Uppercase the comment.

~

20b.
Also, it's not a very good comment, because it gives no more
information than the line of code itself; can you give a more detailed
explanation?

~

20c.
The way the continues are arranged seems a bit strange. Since this is
all DROP code wouldn't it make more sense to write it like this:

BEFORE
+ if (strcmp(obj->objecttype, "table") == 0)
+ cmdtype = DCT_TableDropEnd;
+ else
+ continue;
+
+ command = deparse_drop_command(obj->objidentity, obj->objecttype,
+ trigdata->parsetree);
+
+ if (command)
+ LogLogicalDDLMessage("deparse", obj->address.objectId, cmdtype,
+ command, strlen(command) + 1);

AFTER
if (strcmp(obj->objecttype, "table") == 0)
{
DeparsedCommandType cmdtype = DCT_TableDropEnd;
char *command;

command = deparse_drop_command(obj->objidentity, obj->objecttype,
trigdata->parsetree);
if (command)
LogLogicalDDLMessage("deparse", obj->address.objectId, cmdtype,
command, strlen(command) + 1);
}

~~~

21. publication_deparse_table_init_write

+ }
+ return PointerGetDatum(NULL);

Add a blank line before the return;

======
.../replication/logical/reorderbuffer.c

22. ReorderBufferRestoreChange

+ /* read prefix */
+ memcpy(&prefix_size, data, sizeof(Size));
+ data += sizeof(Size);
+ memcpy(&change->data.ddl.relid, data, sizeof(Oid));
+ data += sizeof(Oid);
+ memcpy(&change->data.ddl.cmdtype, data, sizeof(DeparsedCommandType));
+ data += sizeof(int);
+ change->data.ddl.prefix = MemoryContextAlloc(rb->context, prefix_size);
+ memcpy(change->data.ddl.prefix, data, prefix_size);
+ Assert(change->data.ddl.prefix[prefix_size - 1] == '\0');
+ data += prefix_size;

I had suggested before ([3] #23) that it would be better to use:
data += sizeof(DeparsedCommandType);

instead of:
data += sizeof(int);

You already changed this OK in another place but this instance got
accidentally missed.

======
src/backend/replication/logical/worker.c

23. preprocess_create_table

+ if (castmt->objtype == OBJECT_TABLE)
+ {
+ /*
+ * Force skipping data population to avoid data
+ * inconsistency. Data should be replicated from the
+ * publisher instead.
+ */
+ castmt->into->skipData = true;
+ }

I had suggested before ([4] #16b) that the "Force skipping" comments
are not necessary because the function header comment already says the
same thing. One of the "Force skipping" comments was removed OK, but
there is still one more remaining that should be removed.

~~~

24. postprocess_ddl_create_table

+ commandTag = CreateCommandTag((Node *) command);
+ cstmt = (CreateStmt *) command->stmt;
+ rv = cstmt->relation;
+
+ if (commandTag != CMDTAG_CREATE_TABLE)
+ return;
+
+ cstmt = (CreateStmt *) command->stmt;
+ rv = cstmt->relation;
+ if (!rv)
+ return;

This code is still flawed as previously described (see [4]#18). There
are duplicate assignments of 'cstmt' and 'rv'.

~~~

25. apply_handle_ddl

+/*
+ * Handle DDL replication messages. Convert the json string into a query
+ * string and run it through the query portal.
+ */
+static void
+apply_handle_ddl(StringInfo s)

IMO for consistency this should use the same style as the other
function comments. So after the first sentence, put a more detailed
description after a blank line.

~~~

26. apply_handle_ddl

I previously ([4]#21) asked a question:
There seems to be an assumption here that the only kind of command
processed here would be TABLE related. Maybe that is currently true,
but shouldn't there be some error checking just to make sure it cannot
execute unexpected commands?

~

IMO this question remains relevant -- I think this ddl code needs some
kind of additional guards/checks in it otherwise it will attempt to
deparse commands that it does not understand (e.g. imagine a later
version publisher which supports more DDL than the subscriber does).

======
src/backend/replication/pgoutput/pgoutput.c

27. PGOutputTxnData

typedef struct PGOutputTxnData
{
bool sent_begin_txn; /* flag indicating whether BEGIN has been sent */
+ List *deleted_relids; /* maintain list of deleted table oids */
} PGOutputTxnData;

Actually, from my previous review (see [4]#22) I meant for this to be
a more detailed *structure*-level comment explaining why it is even
necessary to have this member; not just a basic field comment like the
one that has been added.

~~~

28. is_object_published

+ /*
+ * Only send this ddl if we don't publish ddl message or the ddl
+ * need to be published via its root relation.
+ */
+ if (relentry->pubactions.pubddl_table &&
+ relentry->publish_as_relid == objid)
+ return true;

The comment seems wrong/confused – "Only send this ddl if we don't
publish ddl message" (??)

======
src/bin/pg_dump/pg_dump.c

29. getEventTriggers

+ /* skip internally created event triggers by checking evtisinternal */
appendPQExpBufferStr(query,
"SELECT e.tableoid, e.oid, evtname, evtenabled, "
"evtevent, evtowner, "

Uppercase the comment.

======
src/bin/pg_dump/t/002_pg_dump.pl

30.
A reply to my comment ([1]9b) about missing test cases says "test case
are in later patches". In general, I don't think it is a good idea to
separate the test cases from the functionality. Eventually, they must
be merged anyway because it would be wrong to push untested code to
HEAD. So why not include all the relevant test cases here and now? Or
at least put a "FIXME" in this patch so you know there is a test case
that *must* be handled in a subsequent patch, otherwise, this
information will be lost.

======
src/bin/psql/describe.c

31. listPublications

@@ -6210,6 +6210,10 @@ listPublications(const char *pattern)
gettext_noop("Inserts"),
gettext_noop("Updates"),
gettext_noop("Deletes"));
+ if (pset.sversion >= 160000)
+ appendPQExpBuffer(&buf,
+ ",\n pubddl_table AS \"%s\"",
+ gettext_noop("Table DDLs"));
if (pset.sversion >= 110000)

IMO the order of the columns is wrong. I think the
insert/update/delete/truncate should all be grouped together because
they are all determined by the same publication parameter ('publish').

~~~

32. describePublications

" puballtables, pubinsert, pubupdate, pubdelete");
+ if (has_pubddl)
+ appendPQExpBufferStr(&buf,
+ ", pubddl_table");
if (has_pubtruncate)
appendPQExpBufferStr(&buf,

Same as previous comment about the column order

======
src/include/catalog/pg_event_trigger.h

33.
@@ -36,7 +36,7 @@ CATALOG(pg_event_trigger,3466,EventTriggerRelationId)
* called */
char evtenabled; /* trigger's firing configuration WRT
* session_replication_role */
-
+ bool evtisinternal; /* trigger is system-generated */
#ifdef CATALOG_VARLEN
text evttags[1]; /* command TAGs this event trigger targets */
#endif

~

This change should not remove the blank line that previously existed
before the #ifdef CATALOG_VARLEN.

======
src/include/catalog/pg_publication.

34.
+/* Publication trigger events */
+#define PUB_TRIG_DDL_CMD_START "ddl_command_start"
+#define PUB_TRIG_DDL_CMD_END "ddl_command_end"
+#define PUB_TRIG_TBL_REWRITE "table_rewrite"
+#define PUB_TRIG_TBL_INIT_WRITE "table_init_write"

Elsewhere in PG15 code there are already hardcoded literal strings for
these triggers, so I am wondering if these constants should really be
defined in some common place where everybody can make use of them
instead of having a mixture of string literals and macros for the same
strings.

======
src/include/commands/event_trigger.h

35.
-extern Oid CreateEventTrigger(CreateEventTrigStmt *stmt);
+extern Oid CreateEventTrigger(CreateEventTrigStmt *stmt, bool isinternal);
extern Oid get_event_trigger_oid(const char *trigname, bool missing_ok);

IMO a better name is 'is_internal' (a snake-case name matches the
style of the other parameter, 'missing_ok').

======
src/include/replication/ddlmessage.h

36.
+ * Copyright (c) 2022, PostgreSQL Global Development Group

Copyright for the new file should be 2023?

======
src/include/tcop/ddldeparse.h

37.
* ddldeparse.h
*
* Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* src/include/tcop/ddldeparse.h

~

I think this is a new file for this feature, so why does the copyright
mention old dates like 1994, 1996, etc.?

======
src/test/regress/expected/psql.out

38.
IMO the column output is not in a good order. See review comments for describe.c

======
src/test/regress/expected/publication.out

39.
I previously posted a review comment about some missing test cases
([1] #21). The reply was "they will be added in later patches". I
think it's better to put the relevant tests in the same patch that
introduced the functionality.

------
My previous review posts for this patch
[1] https://www.postgresql.org/message-id/CAHut%2BPtzpuuRFrLnjkQePq296ip_0WfmQ4CtydM9JDR6gEf%3DQw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPtMkVoweJrd%3DSLw7BfpW883skasdnemoj4N19NnyjrT3Q%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAHut+PuG8J8uA5V-F-o4TczhvFSWGG1B8qL+EZO0HjWWEEYG+g@mail.gmail.com
[4] https://www.postgresql.org/message-id/CAHut%2BPtOODRybaptKRKUWZnGw-PZuLF2BxaitnMSNeAiU8-yPg%40mail.gmail.com

[5] Houz replies to my previous review comments -
https://www.postgresql.org/message-id/OS0PR01MB571690B2DB46B1C4AF61D184946F9%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia
