RE: Support logical replication of DDLs

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Zheng Li <zhengli10(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Support logical replication of DDLs
Date: 2022-06-15 06:45:37
Message-ID: OS0PR01MB571688CA7CAAE8B01089311294AD9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Wednesday, June 15, 2022 8:14 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> > Thanks for providing this idea.
> >
> > I looked at the string that is used for replication:
> >
> > """
> > {ALTERTABLESTMT :relation {RANGEVAR :schemaname public :relname foo
> > :inh true :relpersistence p :alias <> :location 12} :cmds ({ALTERTABLECMD
> > :subtype 0 :name <> :num 0 :newowner <> :def {COLUMNDEF :colname b
> > :typeName {TYPENAME :names ("public" "timestamptz") :typeOid 0 :setof
> > false :pct_type false :typmods <> :typemod -1 :arrayBounds <> :location
> > 29} :compression <> :inhcount 0 :is_local true :is_not_null false
> > :is_from_type false :storage <> :raw_default <> :cooked_default <>
> > :identity <> :identitySequence <> :generated <> :collClause <> :collOid 0
> > :constraints <> :fdwoptions <> :location 27} :behavior 0 :missing_ok
> > false}) :objtype 41 :missing_ok false}
> > """
> >
> > I think the converted parsetree string includes lots of internal
> > objects(e.g typeOid/pct_type/objtype/collOid/location/...). These are
> > unnecessary stuff for replication and we cannot make sure all the internal
> > stuff are consistent among pub/sub. So I am not sure whether replicating
> > this string is better.
> >
> > Besides, replicating the string from nodetostring() means we would need to
> > deal with the structure difference between the publisher and the
> > subscriber if any related structure has been changed which seems not good.
>
> Yeah, this existing format is not designed to be portable between different
> major versions. So it can't directly be used for replication without
> serious modification.
>
> > IMO, The advantages of the deparsing approach(as implemented in the POC
> > patch set[1]) are:
> >
> > 1) We can generate a command representation that can be
> > parsed/processed/transformed arbitrarily by the subscriber using generic
> > rules it(for example: user can easily replace the schema name in it) while
> > the results of nodetostring() seems not a standard json string, so I am
> > not sure can user reuse it without traversing the parsetree again.
> >
> > 2) With event_trigger + deparser, we can filter the unpublished objects
> > easier. For example: "DROP TABLE table_pub, table_unpub;". We can deparse
> > it into two commands "DROP TABLE table_pub" and "DROP TABLE
> table_pub" and
> > only publish the first one.
> >
> > 3) With deparser, we are able to query the catalog in the deparser to
> > build a complete command(filled with schemaname...) which user don't need
> > to do any other work for it. We don't need to force the subscriber to set
> > the same search_path as the publisher which give user more flexibility.
> >
> > 4) For CREATE TABLE AS, we can separate out the CREATE TABLE part with the
> > help of deparser and event trigger. This can avoid executing the subquery
> > on subcriber.
> >
> > 5) For ALTER TABLE command. We might want to filter out the DDL which use
> > volatile function as discussed in [2]. We can achieve this easier by
> > extending the deparser to check the functions used. We can even rebuild a
> > command without unsupported functions to replicate by using deparser.
> >
> > There may be more cases I am missing as we are still analyzing other DDLs.
>
> How does the deparser deparses CREATE FUNCTION STATEMENT? Will it
> schema qualify
> objects inside the function definition?

The current deparser doesn't schema qualify objects inside the function
source as we won't know the schema of inner objects until the function is
executed. The deparser will only schema qualify the objects around
function declaration Like:

CREATE FUNCTION [public].test_func(i [pg_catalog].int4 ) RETURNS [pg_catalog].int4 LANGUAGE plpgsql

Best regards,
Hou zj

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ian Dauncey 2022-06-15 07:31:23 PostgreSQL 12.2 on Linux ubuntu 20.4 / s390
Previous Message Amit Kapila 2022-06-15 04:00:31 Re: Support logical replication of DDLs

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-06-15 06:51:26 Re: "buffer too small" or "path too long"?
Previous Message Peter Smith 2022-06-15 06:43:35 Re: Handle infinite recursion in logical replication setup