Re: Support logical replication of DDLs

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Zheng Li <zhengli10(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Support logical replication of DDLs
Date: 2022-06-22 05:38:36
Message-ID: CAD21AoBVCoPPRKvU_5-=wEXsa92GsNJFJOcYyXzvoSEJCx5dKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Wed, Jun 15, 2022 at 1:00 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jun 15, 2022 at 5:44 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> >
> >
> > While I agree that the deparser is needed to handle the potential
> > syntax differences between
> > the pub/sub, I think it's only relevant for the use cases where only a
> > subset of tables in the database
> > are replicated. For other use cases where all tables, functions and
> > other objects need to be replicated,
> > (for example, creating a logical replica for major version upgrade)
> > there won't be any syntax difference to
> > handle and the schemas are supposed to match exactly between the
> > pub/sub. In other words the user seeks to create an identical replica
> > of the source database and the DDLs should be replicated
> > as is in this case.
> >
>
> I think even for database-level replication we can't assume that
> source and target will always have the same data in which case "Create
> Table As ..", "Alter Table .. " kind of statements can't be replicated
> as it is because that can lead to different results. The other point
> is tomorrow we can extend the database level option/syntax to exclude
> a few objects (something like [1]) as well in which case we again need
> to filter at the publisher level.

Good point.

Regarding the idea of using the parse-tree representation produced by
nodeToString(), I’ve not read the patch yet but I'm not sure it's a
good idea. A field name of a node could be changed in a major version.
If a publisher sends a parse-tree string representation to a newer
major version subscriber, the subscriber needs to be able to parse the
old format parse-tree string representation in order to reconstruct
the DDL, which reduces the maintainability much. On the other hand,
the format of deparsed json string would not be changed often.

>
> >
> So I think it's an overkill to use deparser for
> > such use cases. It also costs more space and
> > time using deparsing. For example, the following simple ALTER TABLE
> > command incurs 11 times more space
> > in the WAL record if we were to use the format from the deparser,
> > there will also be time and CPU overhead from the deparser.
> >
> ...
> >
> > So I think it's better to define DDL replication levels [1] to tailor
> > for the two different use cases. We can use different logging format
> > based on the DDL replication level. For example,
> > we can simply log the DDL query string and the search_path for
> > database level DDL replication. But for table level DDL replication we
> > need to use the deparser format in order to
> > handle the potential syntax differences and schema mapping requests.
> >
>
> I think having different logging formats is worth considering but I am
> not sure we can distinguish it for database and table level
> replication because of the reasons mentioned above. One thing which
> may need a different format is the replication of global objects like
> roles, tablespace, etc. but we haven't analyzed them in detail about
> those. I feel we may also need a different syntax altogether to
> replicate such objects. Also, I think we may want to optimize the
> current format in some cases so that the WAL amount could be reduced.
>
> I feel if we think that deparsing is required for this project then
> probably at this stage it would be a good idea to explore ways to have
> independent ways to test it. One way is to do testing via the logical
> replication of DDL (aka via this patch) and the other is to write an
> independent test suite as Sawada-San seems to be speculating above
> [2]. I am not sure if there is any progress yet on the independent
> test suite front yet.

I've attached a WIP patch for adding regression tests for DDL deparse.
The patch can be applied on
v9-0001-Functions-to-deparse-DDL-commands.patch Hou recently
submitted[1]. The basic idea is to define the event trigger to deparse
DDLs, run the regression tests, load the deparsed DDLs to another
database cluster, dump both databases, and compare the dumps. Since
the patch doesn't support deparsing all DDLs and there is a bug[2],
the attached regression test does CREATE TABLE and some ALTER TABLE
instead of running regression tests.

Regards,

[1] https://www.postgresql.org/message-id/OS0PR01MB5716B1526C2EDA66907E733B94B39%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] deparsing "ALTER INDEX tbl_idx ALTER COLUMN 2 SET STATISTICS
1000;" causes an assertion failure.

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
0001-WIP-Add-regression-tests-for-DDL-deparse.patch application/octet-stream 4.9 KB

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Matthias Apitz 2022-06-22 06:39:31 Re: function currtid2() in SQL and ESQL/C to get the new CTID of a row
Previous Message Noah Misch 2022-06-22 03:37:04 Re: Extension pg_trgm, permissions and pg_dump order

Browse pgsql-hackers by date

  From Date Subject
Next Message RKN Sai Krishna 2022-06-22 05:44:34 pg_page_repair: a tool/extension to repair corrupted pages in postgres with streaming/physical replication
Previous Message Amit Langote 2022-06-22 04:38:59 Re: Replica Identity check of partition table on subscriber