Re: Support logical replication of DDLs

From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>, Runqi Tian <runqidev(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>
Subject: Re: Support logical replication of DDLs
Date: 2023-06-09 09:51:14
Message-ID: CAFPTHDZ-fVFbD5eFHdx=13K8Tk5hoE57EaRYhGBVt117kQpX4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Thu, Jun 8, 2023 at 10:02 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> Please find new set of patches addressing below:
> a) Issue mentioned by Wang-san in [1],
> b) Comments from Peter given in [2]
> c) Comments from Amit given in the last 2 emails.
>
> [1]: https://www.postgresql.org/message-id/OS3PR01MB62750D43D4F7F075B33BD2609E52A%40OS3PR01MB6275.jpnprd01.prod.outlook.com
> [2]: https://www.postgresql.org/message-id/CAHut%2BPv9vPbUQc0fzrKmDkKOsS_bj-hup_E%2BsLHNEX%2B6F%2BSY5Q%40mail.gmail.com
>
> Thank You Vignesh for handling (a), Ajin for handling (b), Shi-san and
> Hou-san for contributing in (c).

On Fri, May 5, 2023 at 8:10 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> I revisited the 0005 patch to verify all changes made to address my
> previous review comments [1][2][3][4] were OK.
>
> Not all changes were made quite as expected, and there were a few
> other things I noticed in passing.
>
> ======
>
> 1. General
>
> I previously [1] wrote a comment:
> Use consistent uppercase for JSON and DDL instead of sometimes json
> and ddl. There are quite a few random examples in the commit message
> but might be worth searching the entire patch to make all comments
> also consistent case.
>
> Now I still find a number of lowercase "json" and "ddl" strings.
>

fixed

>
> 3. Commit message
>
> Executing a non-immutable expression during the table rewrite phase is not
> allowed, as it may result in different data between publisher and subscriber.
> While some may suggest converting the rewrite inserts to updates and replicate
> them afte the ddl command to maintain data consistency. But it doesn't work if
> the replica identity column is altered in the command. This is because the
> rewrite inserts do not contain the old values and therefore cannot be converted
> to update.
>
> ~
>
> 3a.
> Grammar and typo need fixing for "While some may suggest converting
> the rewrite inserts to updates and replicate them afte the ddl command
> to maintain data consistency. But it doesn't work if the replica
> identity column is altered in the command."
>
> ~
>
> 3b.
> "rewrite inserts to updates"
> Consider using uppercase for the INSERTs and UPDATEs
>
> ~~~
>
> 4.
> LIMIT:
>
> --> LIMITATIONS: (??)
>

Fixed.

>
> ======
> contrib/test_decoding/sql/ddl.sql
>
> 5.
> +SELECT 'ddl msg2' FROM pg_logical_emit_ddl_message('ddl msg2', 16394,
> 1, '{"fmt": "CREATE SCHEMA %{if_not_exists}s %{name}I
> %{authorization}s", "name": "foo", "authorization": {"fmt":
> "AUTHORIZATION %{authorization_role}I", "present": false,
> "authorization_role": null}, "if_not_exists": ""}');
>
> Previously ([4]#1) I had asked what is the point of setting a JSON
> payload here when the JSON payload is never used. You might as well
> pass the string "banana" to achieve the same thing AFAICT. I think the
> reply [5] to the question was wrong. If this faked JSON is used for
> some good reason then there ought to be a test comment to say the
> reason. Otherwise, the fake JSON just confuses the purpose of this
> test so it should be removed/simplified.
>

added a comment explainig this

> ======
> contrib/test_decoding/test_decoding.c
>
> 6. pg_decode_ddl_message
>
> Previously ([4] #4b) I asked if it was necessary to use
> appendBinaryStringInfo, instead of just appendStringInfo. I guess it
> doesn't matter much, but I think the question was not answered.
>

Although we're using plain strings, the API supports binary. Other plugins
could do use binary.

> ======
> doc/src/sgml/catalogs.sgml
>
> 7.
> + <row>
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>evtisinternal</structfield> <type>char</type>
> + </para>
> + <para>
> + True if the event trigger is internally generated.
> + </para></entry>
> + </row>
>
> Why was this called a 'char' type instead of a 'bool' type?
>

fixed.

> ======
> doc/src/sgml/logical-replication.sgml
>
> 8.
> + <para>
> + For example, a CREATE TABLE command executed on the publisher gets
> + WAL-logged, and forwarded to the subscriber to replay; a subsequent "ALTER
> + SUBSCRIPTION ... REFRESH PUBLICATION" is performed on the
> subscriber database so any
> + following DML changes on the new table can be replicated.
> + </para>
>
> In my previous review comments ([2] 11b) I suggested for this to say
> "then an implicit ALTER..." instead of "a subsequent ALTER...". I
> think the "implicit" part got accidentally missed.
>

fixed.

> ~~~
>
> 9.
> + <listitem>
> + <para>
> + In <literal>ADD COLUMN ... DEFAULT</literal> clause and
> + <literal>ALTER COLUMN TYPE</literal> clause of <command>ALTER
> + TABLE</command> command, the functions and operators used in
> + expression must be immutable.
> + </para>
> + </listitem>
>
> IMO this is hard to read. It might be easier if expressed as 2
> separate bullet points.
>
> SUGGESTION
> For ALTER TABLE ... ADD COLUMN ... DEFAULT, the functions and
> operators used in expressions must be immutable.
>
> For ALTER TABLE ... ADD COLUMN TYPE, the functions and operators used
> in expressions must be immutable.
>

fixed.

> ~~~
>
> 10.
> + <para>
> + To change the column type, first add a new column of the desired
> + type, then update the new column value with the old column value,
> + and finnally drop the old column and rename the new column to the
> + old column.
> + </para>
>
> /finnally/finally/
>

fixed.

> ======
> .../access/rmgrdesc/logicalddlmsgdesc.c
>
> 11. logicalddlmsg_desc
>
> I previously wrote some suggestions about improving the Assert in this
> code (see [3]#2). But, the reply [5] "The array index is already
> length + 1 as indices start from 0." did not make sense, because I am
> not saying the code has wrong indices. I am only saying the way the
> Asserts are done was inconsistent with other similar MESSAGE msg, and
> IMO there is a more intuitive way to assert that the DDL Message has
> got some payload in it.
>

fixed.

> ======
> src/backend/catalog/pg_publication.c
>
> 12.
> pub->pubactions.pubinsert = pubform->pubinsert;
> pub->pubactions.pubupdate = pubform->pubupdate;
> pub->pubactions.pubdelete = pubform->pubdelete;
> + pub->pubactions.pubddl_table = pubform->pubddl_table;
> pub->pubactions.pubtruncate = pubform->pubtruncate;
> pub->pubviaroot = pubform->pubviaroot;
>
> IMO all the insert/update/delete/truncate belong together because they
> all came from the 'publish' parameter. I don't think pubddl_table just
> be jammed into the middle of them.
>

fixed.

> ======
> src/backend/commands/event_trigger.c
>
> 13.
> static Oid insert_event_trigger_tuple(const char *trigname, const
> char *eventname,
> - Oid evtOwner, Oid funcoid, List *taglist);
> + Oid evtOwner, Oid funcoid, List *taglist, bool isinternal);
>
> /isinternal/is_internal/
>

fixed.

> ~~~
>
> 14. CreateEventTrigger
>
> * Create an event trigger.
> */
> Oid
> -CreateEventTrigger(CreateEventTrigStmt *stmt)
> +CreateEventTrigger(CreateEventTrigStmt *stmt, bool isinternal)
>
> /isinternal/is_internal/
>

fixed.

> ~~~
>
> 15. insert_event_trigger_tuple
>
> /* Insert catalog entries. */
> return insert_event_trigger_tuple(stmt->trigname, stmt->eventname,
> - evtowner, funcoid, tags);
> + evtowner, funcoid, tags, isinternal);
>
> /isinternal/is_internal/
>
> ~~~
>
> 16.
> if (filter_event_trigger(tag, item))
> {
> - /* We must plan to fire this trigger. */
> - runlist = lappend_oid(runlist, item->fnoid);
> + static const char *trigger_func_prefix = "publication_deparse_%s";
> + char trigger_func_name[NAMEDATALEN];
> + Oid pub_funcoid;
> + List *pubfuncname;
> +
> + /* Get function oid of the publication's ddl deparse event trigger */
> + snprintf(trigger_func_name, sizeof(trigger_func_name), trigger_func_prefix,
> + eventstr);
> + pubfuncname = SystemFuncName(trigger_func_name);
> + pub_funcoid = LookupFuncName(pubfuncname, 0, NULL, true);
> +
> + if (item->fnoid != pub_funcoid)
> + runlist = lappend_oid(runlist, item->fnoid);
> + else
> + {
> + /* Only the first ddl deparse event trigger needs to be invoked */
> + if (pub_deparse_func_cnt++ == 0)
> + runlist = lappend_oid(runlist, item->fnoid);
> + }
>
> 16a.
> I somehow felt this logic would be more natural/readable if the check
> was for == pub_funcoid instead of != pub_funcoid.
>

fixed.

> ~
>
> 16b.
> Maybe use /pubfuncname/pub_funcname/ for consistent variable naming.
>
> ======
> src/backend/commands/publicationcmds.c
>
> 17. DropDDLReplicaEventTriggers
>
> +static void
> +DropDDLReplicaEventTriggers(Oid puboid)
> +{
> + DropDDLReplicaEventTrigger(PUB_TRIG_DDL_CMD_START, puboid);
> + DropDDLReplicaEventTrigger(PUB_TRIG_DDL_CMD_END, puboid);
> + DropDDLReplicaEventTrigger(PUB_TRIG_TBL_REWRITE, puboid);
> + DropDDLReplicaEventTrigger(PUB_TRIG_TBL_INIT_WRITE, puboid);
> +}
> +
> +
>
> Double blank lines.
>
>
fixed.

> ======
> src/backend/replication/logical/ddltrigger.c
>
>
> 18.
> +/*
> + * Check if the command can be publishable.
> + *
> + * XXX Executing a non-immutable expression during the table rewrite phase is
> + * not allowed, as it may result in different data between publisher and
> + * subscriber. While some may suggest converting the rewrite inserts to updates
> + * and replicate them after the ddl command to maintain data
> consistency. But it
> + * doesn't work if the replica identity column is altered in the command. This
> + * is because the rewrite inserts do not contain the old values and therefore
> + * cannot be converted to update.
> + *
> + * Apart from that, commands contain volatile functions are not
> allowed. Because
> + * it's possible the functions contain DDL/DML in which case these operations
> + * will be executed twice and cause duplicate data. In addition, we don't know
> + * whether the tables being accessed by these DDL/DML are published or not. So
> + * blindly allowing such functions can allow unintended clauses like the tables
> + * accessed in those functions may not even exist on the subscriber.
> + */
> +static void
> +check_command_publishable(ddl_deparse_context context, bool is_rewrite)
>
> 18a.
> "can be publishable" --> "can be published"
>
> ~
>
> 18b.
> While some may suggest converting the rewrite inserts to updates and
> replicate them after the ddl command to maintain data consistency. But
> it doesn't work if the replica identity column is altered in the
> command.
>
> Grammar? Why is this split into 2 sentences?
>

fixed.

> ~
>
> 18c.
> Apart from that, commands contain volatile functions are not allowed.
>
> /contain/containing/
>

fixed.

> ~~~
>
> 19. check_command_publishable
>
> + if (context.func_volatile == PROVOLATILE_VOLATILE)
> + ereport(ERROR,
> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot use volatile function in ALTER TABLE command because
> it cannot be replicated in DDL replication"));
> +}
>
> Is it correct for this message to name ALTER TABLE when that is not
> even part of the check? Is that the only scenario where this is
> possible to occur?
>

fixed.

> ~~~
>
> 20. publication_deparse_ddl_command_end
>
> + /* handle drop commands which appear in the SQLDropList */
> + slist_foreach(iter, &(currentEventTriggerState->SQLDropList))
> + {
> + SQLDropObject *obj;
> + EventTriggerData *trigdata;
> + char *command;
> + DeparsedCommandType cmdtype;
> +
> + trigdata = (EventTriggerData *) fcinfo->context;
> +
> + obj = slist_container(SQLDropObject, next, iter.cur);
> +
> + if (!obj->original)
> + continue;
> +
> + if (strcmp(obj->objecttype, "table") == 0)
> + cmdtype = DCT_TableDropEnd;
> + else
> + continue;
> +
> + command = deparse_drop_command(obj->objidentity, obj->objecttype,
> + trigdata->parsetree);
> +
> + if (command)
> + LogLogicalDDLMessage("deparse", obj->address.objectId, cmdtype,
> + command, strlen(command) + 1);
> + }
>
>
> 20a.
> Uppercase the comment.
>

fixed.

> ~
>
> 20b.
> Also, it's not a very good comment -- because not giving any more
> information than the line of code; can you give a more detailed
> explanation?
>

fixed.

> ~
>
> 20c.
> The way the continues are arranged seems a bit strange. Since this is
> all DROP code wouldn't it make more sense to write it like this:
>
> BEFORE
> + if (strcmp(obj->objecttype, "table") == 0)
> + cmdtype = DCT_TableDropEnd;
> + else
> + continue;
> +
> + command = deparse_drop_command(obj->objidentity, obj->objecttype,
> + trigdata->parsetree);
> +
> + if (command)
> + LogLogicalDDLMessage("deparse", obj->address.objectId, cmdtype,
> + command, strlen(command) + 1);
>
> AFTER
> if (strcmp(obj->objecttype, "table") == 0)
> {
> DeparsedCommandType cmdtype = DCT_TableDropEnd;
> char *command;
>
> command = deparse_drop_command(obj->objidentity, obj->objecttype,
> trigdata->parsetree);
> if (command)
> LogLogicalDDLMessage("deparse", obj->address.objectId, cmdtype,
> command, strlen(command) + 1);
> }
>

fixed.

> ~~~~
>
> 21. publication_deparse_table_init_write
>
> + }
> + return PointerGetDatum(NULL);
>
> Add a blank line before the return;
>
>

fixed.

>
>
> ======
> .../replication/logical/reorderbuffer.c
>
> 22. ReorderBufferRestoreChange
>
> + /* read prefix */
> + memcpy(&prefix_size, data, sizeof(Size));
> + data += sizeof(Size);
> + memcpy(&change->data.ddl.relid, data, sizeof(Oid));
> + data += sizeof(Oid);
> + memcpy(&change->data.ddl.cmdtype, data, sizeof(DeparsedCommandType));
> + data += sizeof(int);
> + change->data.ddl.prefix = MemoryContextAlloc(rb->context, prefix_size);
> + memcpy(change->data.ddl.prefix, data, prefix_size);
> + Assert(change->data.ddl.prefix[prefix_size - 1] == '\0');
> + data += prefix_size;
>
> I had suggested before ([3] #23) that it would be better to use:
> data += sizeof(DeparsedCommandType);
>
> instead of:
> data += sizeof(int);
>
> You already changed this OK in another place but this instance got
> accidentally missed.
>

fixed.

> ======
> src/backend/replication/logical/worker.c
>
> 23. preprocess_create_table
>
> + if (castmt->objtype == OBJECT_TABLE)
> + {
> + /*
> + * Force skipping data population to avoid data
> + * inconsistency. Data should be replicated from the
> + * publisher instead.
> + */
> + castmt->into->skipData = true;
> + }
>
> I had suggested before ([4] #16b) that the "Force skipping" comments
> are not necessary because the function header comment already says the
> same thing. One of the "Force skipping" comments was removed OK, but
> there is still one more remaining that should be removed.
>

fixed.

> ~~~
>
> 24. postprocess_ddl_create_table
>
> + commandTag = CreateCommandTag((Node *) command);
> + cstmt = (CreateStmt *) command->stmt;
> + rv = cstmt->relation;
> +
> + if (commandTag != CMDTAG_CREATE_TABLE)
> + return;
> +
> + cstmt = (CreateStmt *) command->stmt;
> + rv = cstmt->relation;
> + if (!rv)
> + return;
>
> This code is still flawed as previously described (see [4]#18). There
> are duplicate assignments of 'cstmt' and 'rv'.
>

fixed.

> ~~~
>
> 25. apply_handle_ddl
>
> +/*
> + * Handle DDL replication messages. Convert the json string into a query
> + * string and run it through the query portal.
> + */
> +static void
> +apply_handle_ddl(StringInfo s)
>
> IMO for consistency this should use the same style as the other
> function comments. So after the first sentence, put a more detailed
> description after a blank line.
>

fixed.

> ~~~
>
> 26. apply_handle_ddl
>
> I previously ([4]#21) asked a questio:
> There seems to be an assumption here that the only kind of command
> processed here would be TABLE related. Maybe that is currently true,
> but shouldn't there be some error checking just to make sure it cannot
> execute unexpected commands?
>
> ~
>
> IMO this question remains relevant -- I think this ddl code needs some
> kind of additional guards/checks in it otherwise it will attempt to
> deparse commands that it does not understand (e.g. imagine a later
> version publisher which supports more DDL than the subscriber does).
>

Currently, according to the design, there is no distinction between
what publisher supports
and subscriber supports. Only the publication decides what is
replicated and not, subscription has no control.

> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 27. PGOutputTxnData
>
> typedef struct PGOutputTxnData
> {
> bool sent_begin_txn; /* flag indicating whether BEGIN has been sent */
> + List *deleted_relids; /* maintain list of deleted table oids */
> } PGOutputTxnData;
>
> Actually, from my previous review (see [4]#22) I meant for this to be
> a more detailed *structure* level comment to say why this is necessary
> even to have this member; not just a basic field comment like what has
> been added.
>

fixed.

> ~~~
>
> 28. is_object_published
>
> + /*
> + * Only send this ddl if we don't publish ddl message or the ddl
> + * need to be published via its root relation.
> + */
> + if (relentry->pubactions.pubddl_table &&
> + relentry->publish_as_relid == objid)
> + return true;
>
> The comment seems wrong/confused – "Only send this ddl if we don't
> publish ddl message" (??)
>

fixed.

> ======
> src/bin/pg_dump/pg_dump.c
>
> 29. getEventTriggers
>
> + /* skip internally created event triggers by checking evtisinternal */
> appendPQExpBufferStr(query,
> "SELECT e.tableoid, e.oid, evtname, evtenabled, "
> "evtevent, evtowner, "
>
> Uppercase the comment.
>

fixed.

> ======
> src/include/catalog/pg_event_trigger.h
>
> 33.
> @@ -36,7 +36,7 @@ CATALOG(pg_event_trigger,3466,EventTriggerRelationId)
> * called */
> char evtenabled; /* trigger's firing configuration WRT
> * session_replication_role */
> -
> + bool evtisinternal; /* trigger is system-generated */
> #ifdef CATALOG_VARLEN
> text evttags[1]; /* command TAGs this event trigger targets */
> #endif
>
> ~
>
> This change should not remove the blank line that previously existed
> before the #ifdef CATALOG_VARLEN.
>

fixed.

> ======
> src/include/catalog/pg_publication.
>
> 34.
> +/* Publication trigger events */
> +#define PUB_TRIG_DDL_CMD_START "ddl_command_start"
> +#define PUB_TRIG_DDL_CMD_END "ddl_command_end"
> +#define PUB_TRIG_TBL_REWRITE "table_rewrite"
> +#define PUB_TRIG_TBL_INIT_WRITE "table_init_write"
>
> Elsewhere in PG15 code there are already hardcoded literal strings for
> these triggers, so I am wondering if these constants should really be
> defined in some common place where everybody can make use of them
> instead of having a mixture of string literals and macros for the same
> strings.
>

fixed.

> ======
> src/include/commands/event_trigger.h
>
> 35.
> -extern Oid CreateEventTrigger(CreateEventTrigStmt *stmt);
> +extern Oid CreateEventTrigger(CreateEventTrigStmt *stmt, bool isinternal);
> extern Oid get_event_trigger_oid(const char *trigname, bool missing_ok);
>
> IMO a better name is 'is_internal' (Using a snake-case name matches
> like the other 'missing_ok')
>

fixed.

> ======
> src/include/replication/ddlmessage.h
>
> 36.
> + * Copyright (c) 2022, PostgreSQL Global Development Group
>
> Copyright for the new file should be 2023?
>

fixed.

> ======
> src/include/tcop/ddldeparse.h
>
> 37.
> * ddldeparse.h
> *
> * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
> * Portions Copyright (c) 1994, Regents of the University of California
> *
> * src/include/tcop/ddldeparse.h
>
> ~
>
> I think this is a new file for the feature so why is the copyright
> talking about old dates like 1994,1996 etc?
>

fixed.

regards,
Ajin Cherian

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Wen Yi 2023-06-09 09:55:23 Why lex & yacc think this is a syntax error?
Previous Message Mathieu Poussin 2023-06-09 09:05:23 Logical replication slots stuck in catchup on a very large table

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2023-06-09 10:11:30 Re: Introduce WAIT_EVENT_EXTENSION and WAIT_EVENT_BUFFER_PIN
Previous Message Masahiro Ikeda 2023-06-09 09:20:09 Re: Introduce WAIT_EVENT_EXTENSION and WAIT_EVENT_BUFFER_PIN