RE: Support logical replication of DDLs

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Zheng Li <zhengli10(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Support logical replication of DDLs
Date: 2022-06-13 10:08:16
Message-ID: OS0PR01MB571684CBF660D05B63B4412C94AB9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Sunday, June 12, 2022 2:46 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> > > > > > I've not looked at these patches in-depth yet but with this
> > > > > > approach, what do you think we can handle the DDL syntax
> > > > > > differences between major versions? DDL syntax or behavior
> > > > > > could be changed by future changes and I think we need to
> > > > > > somehow deal with the differences. For
> > > > >
> > > > > > example, if the user uses logical replication for major
> > > > > > version upgrade, the publisher is older than the subscriber.
> > > > > > We might have to rewrite the DDL before applying to the
> > > > > > subscriber because the DDL executed on the publisher no longer
> > > > > > work on a new PostgreSQL version
> > > > >
> > > > > I don't think we will allow this kind of situation to happen in
> > > > > the first place for backward compatibility. If a DDL no longer
> > > > > works on a new version of PostgreSQL, the user will have to
> > > > > change the application code as well.
> > > > > So even if it happens for
> > > > > whatever reason, we could either 1. fail the apply worker and
> > > > > let the user fix such DDL because they'll have to fix the
> > > > > application code anyway when this happens.
> > > > > 2. add guard rail logic in the apply worker to automatically fix
> > > > > such DDL if possible, knowing the version of the source and
> > > > > target. Similar logic must have been implemented for
> pg_dump/restore/upgrade.
> > > > >
> > > > > > or we might have to add some options to the DDL before the
> > > > > > application in order to keep the same behavior. This seems to
> > > > > > require a different solution from what the patch does for the
> > > > > > problem you mentioned such
> > > > >
> > > > > > as "DDL involving multiple tables where only some tables are
> > > > > > replicated”.
> > > > >
> > > > > First of all, this case can only happen when the customer
> > > > > chooses to only replicate a subset of the tables in a database
> > > > > in which case table level DDL replication is chosen instead of
> > > > > database level DDL replication (where all tables and DDLs are
> > > > > replicated). I think the solution would be:
> > > > > 1. make best effort to detect such DDLs on the publisher and
> > > > > avoid logging of such DDLs in table level DDL replication.
> > > > > 2. apply worker will fail to replay such command due to missing
> > > > > objects if such DDLs didn't get filtered on the publisher for
> > > > > some reason. This should be rare and I think it's OK even if it
> > > > > happens, we'll find out why and fix it.
> > > > >
> > > >
> > > > FWIW, both these cases could be handled with the deparsing
> > > > approach, and the handling related to the drop of multiple tables
> > > > where only a few are published is already done in the last POC
> > > > patch shared by Ajin [1].
> > > >
> > >
> > > Right. So I'm inclined to think that deparsing approach is better
> > > from this point as well as the point mentioned by Álvaro before[1].
> >
> > I agree. One more point about deparsing approach is that it can also
> > help to replicate CREATE TABLE AS/SELECT INTO in a better way.
> >
> > The main idea of replicating the CREATE TABLE AS is that we deprase
> > the CREATE TABLE AS into a simple CREATE TABLE(without subquery)
> > command and WAL log it after creating the table and before writing
> > data into the table and replicate the incoming writes later as normal
> > INSERTs. In this apporach, we don't execute the subquery on subscriber
> > so that don't need to make sure all the objects referenced in the
> > subquery also exists in subscriber. And This approach works for all
> > kind of commands(e.g. CRAETE TABLE AS [SELECT][EXECUTE][VALUES])
> >
> > One problem of this approach is that we cannot use the current trigger
> > to deparse or WAL log the CREATE TABLE. Because none of the even
> > trigger is fired after creating the table and before inserting the
> > data. To solve this, one idea is that we could directly add some code
> > at the end of create_ctas_internal() to deparse and WAL log it.
> > Moreover, we could even introduce a new type of event
> > trigger(table_create) which would be fired at the expected timing so
> > that we can use the trigger function to deparse and WAL log. I am not
> > sure which way is better. I temporarily use the second idea which
> > introduce a new type event trigger in the 0003 POC patch.
>
> Hi, I agree that an intermediate structured format (with all objects schema
> qualified) makes it easier to handle syntax differences between the publisher
> and the subscriber. Such structured format is also likely easier to use by other
> logical replication consumers.
> However, to make things more maintainable, would it be better to use the
> existing serialization/deserialization functions in out/readfuncs.c to generate
> the parsetree representation of the DDL command?
>
> It turns out support for DDL commands are mostly missing in out/readfuncs at
> the moment. see the following comment from
> outfuncs.c:
>
> * Output support for raw parsetrees
> * is somewhat incomplete, too; in particular, utility statements are
> * almost entirely unsupported. We try to support everything that can
> * appear in a raw SELECT, though.
>
> So what about adding support for utility statements in out/readfuncs.c so that
> nodeToString()/stringToNode() works on raw parsetrees of utility statements? I
> think there'll be some benefits:
> 1. It's less code compared to introducing brand new
> serialization/deserialization code for DDL commands.
> 2. It's more maintainable since hackers are already familiar with
> out/readfuncs.c.
> 3. Support for utility statements in out/readfuncs.c may turn out to be useful
> for future features that need serialization/deserialization of DDL raw
> parsetrees.
>
> In patch 0012 I explored the above idea by changing the logging format of
> logical ddlmessage to that of the serialization of the raw parsetree of the DDL
> command using nodeToString(), the apply worker deserializes the message
> back to the raw parsetree by calling stringToNode().
> Here are the detailed changes:
> - Adding serialization/deserialization functions in outfuncs.c/readfuncs.c
> for CreateTableStmt, AlterTableStmt, DropStmt, CreateFunctionStmt and
> AlterFunctionStmt.
> - Modified the serialization process to always schema qualify object names,
> this is done by outQualifiedName() and a change in _outRangeVar().
> - Change the input of LogLogicalDDLMessage() to use
> nodeToString(parsetree).
> - Change the apply worker to call stringToNode(ddlmessage) to get the
> raw parsetree back and then directly execute the parsetree.
>
> This patch doesn't introduce any deparsing functionality yet, but it also
> provides the flexibility to edit the command being replayed by directly
> modifying the raw parsetree on the apply worker (e.g. setting the missing_ok
> flag for DropStmt is equivalent to adding "IF EXIST" to the statement).
>
> Thoughts?

Hi,

Thanks for providing this idea.

I looked at the string that is used for replication:

"""
{ALTERTABLESTMT :relation {RANGEVAR :schemaname public :relname foo
:inh true :relpersistence p :alias <> :location 12} :cmds ({ALTERTABLECMD
:subtype 0 :name <> :num 0 :newowner <> :def {COLUMNDEF :colname b
:typeName {TYPENAME :names ("public" "timestamptz") :typeOid 0 :setof
false :pct_type false :typmods <> :typemod -1 :arrayBounds <> :location
29} :compression <> :inhcount 0 :is_local true :is_not_null false
:is_from_type false :storage <> :raw_default <> :cooked_default <>
:identity <> :identitySequence <> :generated <> :collClause <> :collOid 0
:constraints <> :fdwoptions <> :location 27} :behavior 0 :missing_ok
false}) :objtype 41 :missing_ok false}
"""

I think the converted parsetree string includes lots of internal
objects(e.g typeOid/pct_type/objtype/collOid/location/...). These are
unnecessary stuff for replication and we cannot make sure all the internal
stuff are consistent among pub/sub. So I am not sure whether replicating
this string is better.

Besides, replicating the string from nodetostring() means we would need to
deal with the structure difference between the publisher and the
subscriber if any related structure has been changed which seems not good.

IMO, The advantages of the deparsing approach(as implemented in the POC
patch set[1]) are:

1) We can generate a command representation that can be
parsed/processed/transformed arbitrarily by the subscriber using generic
rules it(for example: user can easily replace the schema name in it) while
the results of nodetostring() seems not a standard json string, so I am
not sure can user reuse it without traversing the parsetree again.

2) With event_trigger + deparser, we can filter the unpublished objects
easier. For example: "DROP TABLE table_pub, table_unpub;". We can deparse
it into two commands "DROP TABLE table_pub" and "DROP TABLE table_pub" and
only publish the first one.

3) With deparser, we are able to query the catalog in the deparser to
build a complete command(filled with schemaname...) which user don't need
to do any other work for it. We don't need to force the subscriber to set
the same search_path as the publisher which give user more flexibility.

4) For CREATE TABLE AS, we can separate out the CREATE TABLE part with the
help of deparser and event trigger. This can avoid executing the subquery
on subcriber.

5) For ALTER TABLE command. We might want to filter out the DDL which use
volatile function as discussed in [2]. We can achieve this easier by
extending the deparser to check the functions used. We can even rebuild a
command without unsupported functions to replicate by using deparser.

There may be more cases I am missing as we are still analyzing other DDLs.

And most of these ideas has been implemented in the POC patch[1].

The overhead to maintain the deparser is not a problem as long as we have
a good test mechanism to verify that all DDLs are supported.

Overall, the event_trigger + deparsing approach looks better to me.

[1] https://www.postgresql.org/message-id/OS0PR01MB5716733D6C79D2198E5773CC94A79%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] https://www.postgresql.org/message-id/CAA4eK1JVynFsj%2BmcRWj9sewR2yNUs6LuNxJ0eN-gNJ83oKcUOQ%40mail.gmail.com

Best regards,
Hou zj

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bryn Llewellyn 2022-06-13 19:01:55 Re: "A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one"
Previous Message Shubham Mittal 2022-06-13 08:21:16 Re: Need optimization in query

Browse pgsql-hackers by date

  From Date Subject
Next Message r.takahashi_2@fujitsu.com 2022-06-13 11:02:40 RE: Multi-Master Logical Replication
Previous Message Amit Kapila 2022-06-13 09:14:18 Re: Replica Identity check of partition table on subscriber