Re: Support logical replication of DDLs

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Runqi Tian <runqidev(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Support logical replication of DDLs
Date: 2023-06-19 03:39:27
Message-ID: CAJpy0uA+BThcx_k+rVNE-ASQtMt65YVoNTxBgHO4aun2qUnFeA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

As per suggestion by Amit, reviewed two more formats to be used for
DDL's WAL-logging purpose, analysis below:

NodeToString:
I do not think it is a good idea to use NodeToString in DDL Rep for
reasons below:
1) It consists of too much internal and not-needed information.
2) Too large to be logged in WAL. Optimization of this will be a
mammoth task because a) we need to filter out information, not all the
info will be useful to the subscriber; b) it is not a simple key based
search and replace/remove. Modifying strings is always difficult.
3) It's not compatible across major versions. If we want to use Node
information in any format, we need to ensure that the output can be
read in a different major version of PostgreSQL.

Sql-ddl-to-json-schema given in [1]:
This was suggested by Swada-san in one of the discussions. Since it is
json format and thus essentially has to be key:value pairs like the
current implementation. The difference here is that there is no
"format string" maintained with each json object. And thus the
awareness on how to construct the DDL (i.e. exact DDL-synatx) needs to
be present at the receiver side. In our case, we maintain the "fmt"
string using which the receiver can re-construct DDL statements
without knowing PostgreSQL's DDL syntax. The "fmt" string tells us the
order of elements/keys and also representation of values (string,
identifier etc) while the JSON data created by sql-ddl-to-json-schema
looks more generic; the receiver can construct a DDL statement in any
form. It would be more useful for example when the receiver is not a
PostgreSQL server. And thus does not seem a suitable choice for the
logical replication use-case in hand.

[1]: https://www.npmjs.com/package/sql-ddl-to-json-schema

thanks
Shveta

In response to

Browse pgsql-general by date

  From Date Subject
Next Message jian he 2023-06-19 03:52:52 timetz need more bytes than timestamptz
Previous Message James Cloos 2023-06-18 06:06:19 deb’s pg_upgradecluster(1) vs streaming replication

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-06-19 05:24:44 Re: [BUG] recovery of prepared transactions during promotion can fail
Previous Message Amit Kapila 2023-06-19 03:37:46 Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases