Re: Support logical replication of DDLs

From: vignesh C <vignesh21(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>
Subject: Re: Support logical replication of DDLs
Date: 2022-06-28 16:47:31
Message-ID: CALDaNm3rEA_zmnDMOCT7NqK4aAffhAgooLf8rXjUN=YwA8ASFw@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Tue, Jun 21, 2022 at 5:49 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Monday, June 20, 2022 11:32 AM houzj(dot)fnst(at)fujitsu(dot)com <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Saturday, June 18, 2022 3:38 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > > On Wed, Jun 15, 2022 at 12:00 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > > wrote:
> > > >
> > > > On Wed, Jun 15, 2022 at 5:44 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > > > >
> > > > >
> > > > > While I agree that the deparser is needed to handle the potential
> > > > > syntax differences between the pub/sub, I think it's only relevant
> > > > > for the use cases where only a subset of tables in the database are
> > > > > replicated. For other use cases where all tables, functions and
> > > > > other objects need to be replicated, (for example, creating a
> > > > > logical replica for major version upgrade) there won't be any syntax
> > > > > difference to handle and the schemas are supposed to match exactly
> > > > > between the pub/sub. In other words the user seeks to create an
> > > > > identical replica of the source database and the DDLs should be
> > > > > replicated as is in this case.
> > > > >
> > > >
> > > > I think even for database-level replication we can't assume that
> > > > source and target will always have the same data in which case "Create
> > > > Table As ..", "Alter Table .. " kind of statements can't be replicated
> > > > as it is because that can lead to different results.
> > > "Create Table As .." is already handled by setting the skipData flag of the
> > > statement's parse tree before replay:
> > >
> > > /*
> > > * Force skipping data population to avoid data inconsistency.
> > > * Data should be replicated from the publisher instead.
> > > */
> > > castmt->into->skipData = true;
> > >
> > > "Alter Table .. " that rewrites with volatile expressions can also be handled
> > > without any syntax change, by enabling the table rewrite replication and
> > > converting the rewrite inserts to updates. ZJ's patch introduced this solution.
> > > I've also adopted this approach in my latest patch
> > > 0012-Support-replication-of-ALTER-TABLE-commands-that-rew.patch
> > >
> > > > The other point
> > > > is that tomorrow we can extend the database-level option/syntax to exclude
> > > > a few objects (something like [1]) as well, in which case we again need
> > > > to filter at the publisher level.
> > >
> > > I think for such cases it's not full database replication and we could treat it as
> > > table-level DDL replication, i.e. use the deparser format.
> >
> > Hi,
> >
> > Here are some points in my mind about the two approaches discussed here.
> >
> > 1) search_path vs schema qualification
> >
> > Again, I still think it will bring more flexibility and security to
> > schema-qualify the objects in the DDL command, as mentioned before[1].
> >
> > Besides, a schema-qualified DDL is also more appropriate for other use
> > cases (e.g. table-level replication), since the schemas can differ between
> > pub/sub, and just logging the search_path can easily cause unexpected and
> > undetectable failures.
> >
> > It makes more sense to me to use the same style of WAL log (schema-qualified)
> > for both database-level and table-level replication, as it will bring more
> > flexibility.
> >
> >
> > > "Create Table As .." is already handled by setting the skipData flag of the
> > > statement's parse tree before replay:
> >
> > 2) About the handling of CREATE TABLE AS:
> >
> > I think it's not an appropriate approach to set the skipData flag on the
> > subscriber, as it cannot handle the EXECUTE command in CTAS.
> >
> > CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DTAAAA');
> >
> > The prepared statement is a temporary object which we don't replicate. So if
> > you directly execute the original SQL on the subscriber, it will fail even
> > if you set skipData.
> >
> > I think it's difficult to make this work, as you would need to handle the
> > create/drop of this prepared statement. And even if we extended the
> > subscriber's code to make it work, it doesn't seem like a standard and
> > elegant approach.
> >
> >
> > > "Alter Table .. " that rewrites with volatile expressions can also be handled
> > > without any syntax change, by enabling the table rewrite replication and
> > > converting the rewrite inserts to updates. ZJ's patch introduced this solution.
> >
> > 3) About the handling of ALTER TABLE rewrite.
> >
> > The approach I proposed before is based on the event trigger + deparser
> > approach. We were able to improve that approach since we don't need to
> > replicate the rewrite in many cases. For example: we don't need to replicate
> > the rewrite DML if there is no volatile/mutable function involved. We should
> > check and filter these cases at the publisher (e.g. via the deparser) instead
> > of checking at the subscriber.
> >
> > Besides, as discussed, we need to give a warning or error for the cases where
> > the DDL contains a volatile function that would be executed[2]. We should
> > check this at the publisher as well (via the deparser).
> >
> >
> > > I think for such cases it's not full database replication and we could treat it as
> > > table-level DDL replication, i.e. use the deparser format.
> >
> > 4) I think the point could be that we should make the WAL log format
> > extensible, so that we can later extend it to support more useful features
> > (table filters/schema maps/DDL filters). If we just WAL-log the original
> > SQL, it seems difficult to extend it in the future.
>
> Attached is the new version patch set, which adds support for CREATE/DROP/ALTER
> Sequence and CREATE/DROP Schema DDL commands, provided by Ajin
> Cherian off-list.
>
> The new version patch will also check each function's volatility[1] in ALTER
> TABLE commands. If any function to be executed is volatile, we report an ERROR.
> Whether a WARNING would be better here is still under consideration.
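To restate the CREATE TABLE AS ... EXECUTE case from point 2) above concretely, here is a sketch of the failing scenario (the prepared-statement definition and table names are made up for illustration; the error text is PostgreSQL's standard message for a missing prepared statement):

```sql
-- On the publisher, q5 is a session-local prepared statement:
PREPARE q5 (int, text) AS
    SELECT * FROM some_table WHERE id > $1 AND name LIKE $2;
CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DTAAAA');

-- Replaying the logged CTAS text on the subscriber fails even with
-- into->skipData set, because q5 was never created in that session:
--   ERROR:  prepared statement "q5" does not exist
```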

Few comments:
1) I found a null pointer dereference while trying to use the DDL feature:
#0 0x000055e9deb2904b in EventTriggerTableInitWrite
(real_create=0x55e9e0eb0288, address=...) at event_trigger.c:989
#1 0x000055e9deb15745 in create_ctas_internal
(attrList=0x55e9e0eb0140, into=0x55e9e0e86710) at createas.c:154
#2 0x000055e9deb16181 in intorel_startup (self=0x55e9e0e86d00,
operation=1, typeinfo=0x55e9e0ea99d0) at createas.c:526
#3 0x000055e9debdcfdc in standard_ExecutorRun
(queryDesc=0x55e9e0e8f240, direction=ForwardScanDirection, count=0,
execute_once=true) at execMain.c:352
#4 0x000055e9debdcf0b in ExecutorRun (queryDesc=0x55e9e0e8f240,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:307
#5 0x000055e9deb15cd2 in ExecCreateTableAs (pstate=0x55e9e0e86830,
stmt=0x55e9e0e717e8, params=0x0, queryEnv=0x0, qc=0x7fff45517190) at
createas.c:346
#6 0x000055e9dee3202a in ProcessUtilitySlow (pstate=0x55e9e0e86830,
pstmt=0x55e9e0e70b18,
queryString=0x55e9e0e483e0 "CREATE TEMP TABLE
pg_temp_5.pg_temp_24961_2 AS SELECT mv.ctid AS tid,
newdata.*::pg_temp_5.pg_temp_24961 AS newdata FROM public.sro_index_mv
mv FULL JOIN pg_temp_5.pg_temp_24961 newdata ON (newdata.c "...,
context=PROCESS_UTILITY_QUERY, params=0x0, queryEnv=0x0,
dest=0x55e9df2e7640 <spi_printtupDR>, qc=0x7fff45517190) at
utility.c:1669
#7 0x000055e9dee30b5d in standard_ProcessUtility (pstmt=0x55e9e0e70b18,
queryString=0x55e9e0e483e0 "CREATE TEMP TABLE
pg_temp_5.pg_temp_24961_2 AS SELECT mv.ctid AS tid,
newdata.*::pg_temp_5.pg_temp_24961 AS newdata FROM public.sro_index_mv
mv FULL JOIN pg_temp_5.pg_temp_24961 newdata ON (newdata.c "...,
readOnlyTree=true, context=PROCESS_UTILITY_QUERY, params=0x0,
queryEnv=0x0, dest=0x55e9df2e7640 <spi_printtupDR>, qc=0x7fff45517190)
at utility.c:1074
#8 0x000055e9dee2fac7 in ProcessUtility (pstmt=0x55e9e0e19538,
queryString=0x55e9e0e483e0 "CREATE TEMP TABLE
pg_temp_5.pg_temp_24961_2 AS SELECT mv.ctid AS tid,
newdata.*::pg_temp_5.pg_temp_24961 AS newdata FROM public.sro_index_mv
mv FULL JOIN pg_temp_5.pg_temp_24961 newdata ON (newdata.c "...,
readOnlyTree=true, context=PROCESS_UTILITY_QUERY, params=0x0,
queryEnv=0x0, dest=0x55e9df2e7640 <spi_printtupDR>, qc=0x7fff45517190)
at utility.c:530
#9 0x000055e9dec3c5f1 in _SPI_execute_plan (plan=0x7fff45517230,
options=0x7fff45517200, snapshot=0x0, crosscheck_snapshot=0x0,
fire_triggers=true) at spi.c:2693
#10 0x000055e9dec38ea8 in SPI_execute (
src=0x55e9e0e483e0 "CREATE TEMP TABLE pg_temp_5.pg_temp_24961_2 AS
SELECT mv.ctid AS tid, newdata.*::pg_temp_5.pg_temp_24961 AS newdata
FROM public.sro_index_mv mv FULL JOIN pg_temp_5.pg_temp_24961 newdata
ON (newdata.c "..., read_only=false, tcount=0) at spi.c:618
#11 0x000055e9dec38efd in SPI_exec (
src=0x55e9e0e483e0 "CREATE TEMP TABLE pg_temp_5.pg_temp_24961_2 AS
SELECT mv.ctid AS tid, newdata.*::pg_temp_5.pg_temp_24961 AS newdata
FROM public.sro_index_mv mv FULL JOIN pg_temp_5.pg_temp_24961 newdata
ON (newdata.c "..., tcount=0) at spi.c:630
#12 0x000055e9deb5360a in refresh_by_match_merge (matviewOid=24961,
tempOid=24966, relowner=24909, save_sec_context=0) at matview.c:795
#13 0x000055e9deb528ca in ExecRefreshMatView (stmt=0x55e9e0d3e670,
queryString=0x55e9e0d3dc18 "REFRESH MATERIALIZED VIEW CONCURRENTLY
sro_index_mv;", params=0x0, qc=0x7fff45517d40) at matview.c:317

currentCommand seems to be null here:
+ /*
+ * Also do nothing if our state isn't set up, which it won't be if there
+ * weren't any relevant event triggers at the start of the current DDL
+ * command. This test might therefore seem optional, but it's
+ * *necessary*, because EventTriggerCommonSetup might find triggers that
+ * didn't exist at the time the command started.
+ */
+ if (!currentEventTriggerState)
+ return;
+
+ command = currentEventTriggerState->currentCommand;
+
+ runlist = EventTriggerCommonSetup(command->parsetree,
+
EVT_TableInitWrite,
+
"table_init_write",
+
&trigdata);

2) Missing copyright information:
diff --git a/src/include/tcop/ddl_deparse.h b/src/include/tcop/ddl_deparse.h
new file mode 100644
index 0000000..4f3c55f
--- /dev/null
+++ b/src/include/tcop/ddl_deparse.h
@@ -0,0 +1,12 @@
+#ifndef DDL_DEPARSE_H
+#define DDL_DEPARSE_H
+
+#include "tcop/deparse_utility.h"

3) This line removal is not required:
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 87aa571..8aa636c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11884,5 +11884,10 @@
proname => 'brin_minmax_multi_summary_send', provolatile => 's',
prorettype => 'bytea', proargtypes => 'pg_brin_minmax_multi_summary',
prosrc => 'brin_minmax_multi_summary_send' },
-
+{ oid => '4642', descr => 'ddl deparse',
+ proname => 'ddl_deparse_to_json', prorettype => 'text',

4) We could add a few comments:
+typedef enum
+{
+ SpecTypename,
+ SpecOperatorname,
+ SpecDottedName,
+ SpecString,
+ SpecNumber,
+ SpecStringLiteral,
+ SpecIdentifier,
+ SpecRole
+} convSpecifier;
+
+typedef enum
+{
+ tv_absent,
+ tv_true,
+ tv_false
+} trivalue;

5) Missing function header:
+static void fmtstr_error_callback(void *arg);
+char *ddl_deparse_json_to_string(char *jsonb);
+
+static trivalue
+find_bool_in_jsonbcontainer(JsonbContainer *container, char *keyname)
+{
+ JsonbValue key;
+ JsonbValue *value;
+ trivalue result;

6) This can be removed:
+/*
+removed feature
+ case AT_AddOidsRecurse:
+ case AT_AddOids:
+ tmp = new_objtree_VA("SET WITH OIDS", 1,
+
"type", ObjTypeString, "set with oids");
+ subcmds = lappend(subcmds,
new_object_object(tmp));
+ break;
+*/

Regards,
Vignesh
