Re: Support logical replication of DDLs

From: Zheng Li <zhengli10(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com
Subject: Re: Support logical replication of DDLs
Date: 2022-03-24 17:54:39
Message-ID: CAAD30UJ8v7RAxfLLx-nVO9COKNeBiKVzcwu1LdCe5dZJU2bUDA@mail.gmail.com
Lists: pgsql-general pgsql-hackers

Hi Dilip,

Thanks for the feedback.

> > > > The table creation WAL and table insert WAL are available. The tricky
> > > > part is how do we break down this command into two parts (a normal
> > > > CREATE TABLE followed by insertions) either from the parsetree or the
> > > > WALs. I’ll have to dig more on this.

> > I had put some more thought about this, basically, during CTAS we are
> > generating the CreateStmt inside "create_ctas_internal" and executing
> > it first before inserting the tuple, so can't we generate the
> > independent sql just for creating the tuple maybe using deparsing or
> > something?

Yes, deparsing might help for edge cases like this. However, I found
a simple solution for this specific case:

The idea is to skip any direct data population in CREATE TABLE AS and
SELECT INTO commands on the subscriber, since populating the table there
can potentially cause data inconsistency. After the logical replication
worker parses the command, it forces the skipData flag in the intoClause
of the parsetree. The data sync is then taken care of by the DML
replication after the DDL replication finishes.

This is implemented in the latest commit:
https://github.com/zli236/postgres/commit/116c33451da8d44577b8d6fdb05c4b6998cd0167
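
To make the skipData idea concrete, here is a rough standalone sketch.
It is not the code from the commit above: the Sketch* types and
sketch_force_skip_data() are simplified, hypothetical stand-ins for the
real parse nodes, kept only to show where the flag gets forced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an INTO clause; only the flag we care about. */
typedef struct SketchIntoClause
{
    bool        skipData;   /* true means "create the table, load no rows" */
} SketchIntoClause;

/* Hypothetical statement kinds, just enough for this illustration. */
typedef enum SketchStmtKind
{
    SKETCH_CREATE_TABLE_AS,
    SKETCH_SELECT_INTO,
    SKETCH_OTHER
} SketchStmtKind;

typedef struct SketchStmt
{
    SketchStmtKind    kind;
    SketchIntoClause *into;  /* NULL when the statement has no INTO clause */
} SketchStmt;

/*
 * Called on the subscriber after the command has been parsed.
 * Forces skipData for CREATE TABLE AS / SELECT INTO so that the rows
 * arrive only through DML replication. Returns true if the flag was set.
 */
bool
sketch_force_skip_data(SketchStmt *stmt)
{
    assert(stmt != NULL);

    if (stmt->kind == SKETCH_OTHER || stmt->into == NULL)
        return false;

    stmt->into->skipData = true;
    return true;
}
```

With this, the subscriber ends up with an empty table of the right shape,
and the rows follow through the normal DML replication stream.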

> > Apart from that I have one more question, basically if you are
> > directly logging the sql query then how you are identifying under
> > which schema you need to create that table, are you changing the sql
> > and generating schema-qualified name?
>
> I was going through the patch and it seems you are logging the search
> path as well along with the query so I think this will probably work.

Yes, currently we log the search path as well as the user name. And we
enforce the same search path and user name when applying the DDL command
on the subscriber.
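
Roughly, the apply side looks like the sketch below. This is only an
illustration under my own assumptions: SketchDDLRecord, SketchSession,
sketch_apply_ddl() and the demo executor are hypothetical names, not
functions from the patch, and restoring the previous session state
afterwards is a choice made for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record: what the publisher logs alongside the DDL text. */
typedef struct SketchDDLRecord
{
    const char *role;         /* user name on the publisher */
    const char *search_path;  /* search path in effect on the publisher */
    const char *command;      /* the DDL query string */
} SketchDDLRecord;

/* Hypothetical apply-side session state. */
typedef struct SketchSession
{
    char        role[64];
    char        search_path[256];
} SketchSession;

/* Callback that runs the command; returns 0 on success. */
typedef int (*SketchExecFn) (const SketchSession *session, const char *command);

/*
 * Apply one logged DDL command: switch to the logged role and search
 * path, run the command, then restore the previous session state.
 * Over-long values are truncated by snprintf in this sketch.
 */
int
sketch_apply_ddl(SketchSession *session, const SketchDDLRecord *rec,
                 SketchExecFn exec)
{
    SketchSession saved;
    int         rc;

    assert(session != NULL && rec != NULL && exec != NULL);
    saved = *session;

    snprintf(session->role, sizeof(session->role), "%s", rec->role);
    snprintf(session->search_path, sizeof(session->search_path), "%s",
             rec->search_path);

    rc = exec(session, rec->command);

    *session = saved;
    return rc;
}

/* Demo executor: only remembers the search path it was run under. */
static char sketch_seen_path[256];

int
sketch_demo_exec(const SketchSession *session, const char *command)
{
    (void) command;
    snprintf(sketch_seen_path, sizeof(sketch_seen_path), "%s",
             session->search_path);
    return 0;
}
```

Because unqualified names in the command resolve against the same search
path as on the publisher, the query string does not need to be rewritten
with schema-qualified names.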

> I have got one more query while looking into the code. In the below
> code snippet you are logging DDL command only if it is a top level
> query but there are no comments explaining what sort of queries we
> don't want to log. Suppose I am executing a DDL statement inside a PL
> then that will not be a top level statement so is your intention to
> block that as well or that is an unintentional side effect?
>
> + /*
> + * Consider logging the DDL command if logical logging is enabled and this is
> + * a top level query.
> + */
> + if (XLogLogicalInfoActive() && isTopLevel)
> + LogLogicalDDLCommand(parsetree, queryString);

Good catch. The reason for having isTopLevel in the condition is that
I haven't decided whether a DDL statement inside a PL should be
replicated, from the user's point of view. For example, if I execute a
plpgsql function or a stored procedure that creates a table under the hood,
does it always make sense to replicate the DDL without running the same
function or stored procedure on the subscriber? It probably depends on
the specific use case. Maybe we can consider making this behavior
configurable by the user.
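
If we did make it configurable, the check could look something like the
sketch below. Nothing here exists in the patch: SketchNestedDDLMode and
sketch_should_log_ddl() are hypothetical names, and the two-valued
setting is just one possible shape for such an option.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user setting for DDL issued from inside a function/procedure. */
typedef enum SketchNestedDDLMode
{
    SKETCH_NESTED_DDL_SKIP,       /* behavior of the current condition */
    SKETCH_NESTED_DDL_REPLICATE   /* also log DDL that is not top level */
} SketchNestedDDLMode;

/*
 * Decide whether a DDL command should be logged for logical replication.
 * Top-level commands are always logged when logical logging is active;
 * nested commands are logged only if the setting asks for it.
 */
bool
sketch_should_log_ddl(bool logical_active, bool is_top_level,
                      SketchNestedDDLMode mode)
{
    assert(mode == SKETCH_NESTED_DDL_SKIP ||
           mode == SKETCH_NESTED_DDL_REPLICATE);

    if (!logical_active)
        return false;
    if (is_top_level)
        return true;
    return mode == SKETCH_NESTED_DDL_REPLICATE;
}
```

With SKETCH_NESTED_DDL_SKIP this reduces to the existing
"XLogLogicalInfoActive() && isTopLevel" test.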

Thanks,
Zheng
