Re: Support logical replication of DDLs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
Subject: Re: Support logical replication of DDLs
Date: 2022-06-10 06:26:57
Message-ID: CAA4eK1JVynFsj+mcRWj9sewR2yNUs6LuNxJ0eN-gNJ83oKcUOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Thu, Jun 9, 2022 at 5:14 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Hi,
>
> I did some research for one potential problem[1] mentioned earlier which is related
> to the function execution when replicating DDL.
>
> [1]> 4. Statements that have nondeterministic side effects (e.g., as caused
> > by triggers, stored procedures, user-defined functions) may result in
> > different side effects occurring on each subscriber.
> >
> > Think how to handle triggers and functions with same name but different
> > purpose.
> >
>
> Examples:
> ALTER TABLE ADD CONSTRAINT func()
> ALTER TABLE ADD COLUMN DEFAULT func()
> CREATE TRIGGER ... execute procedure func()
> ...
>
> When replication the above DDLs, there are some cases we need to think about.
>
> ----------------
> 1) The functions used in DDL have same definition among pub/sub.
>
> In this case, if the functions include DML/DDL operations, it will result in
> unexpected behavior.
>
> For example:
>
> --- both pub and sub have the same function test_func().
> create function test_func()
> returns integer as '
> begin
> CREATE TABLE test(a int);
> INSERT INTO test values(1);
> return 0;
> end
> ' language plpgsql;
>
> --- We replicate the following DDL
> ALTER TABLE t ADD COLUMN a int DEFAULT test_func();
>
> There are three SQLs that would be replicated to the subscriber:
> CREATE TABLE test(a int);
> INSERT(1);
> ALTER TABLE t ADD COLUMN a int DEFAULT test_func();
>
> Then, we would create the table "test" twice and insert same value twice(first by
> executing DEFAULT function, second by replaying the CREATE TABLE and INSERT
> command) on subcriber.
>

The other kind of problem is that we don't know whether the tables
being accessed by these DDL/DML are published or not. So blindly
allowing such functions can allow unintended clauses like the tables
accessed in those functions may not even exist on the subscriber-side.

> One possible idea is that we only allow replicating 'immutable/stable' function
> which won't include write action so that we don't have this problem. But it
> seems a bit restrictive.
>
> ----------------
>
> 2) The functions used in DDL have different definitions among pub/sub.
>
> I am not sure how to handle this case as this could result in unpredictable
> behavior based on the different definitions. And it's difficult to compare the
> function definition among pub/sub because the function could call other
> functions nested.
>
> OTOH, the current behavior of this case on HEAD is that we don't check the
> consistency of the functions executed on pub/sub. For example: the row triggers
> will always be fired on subscriber(when enabled) without checking if the same
> trigger exists on publisher. So, it might be acceptable if we document this
> restriction, although I am not sure.
>
> *******************
>
> - About the solution for the above two points. I think one solution could be:
>
> We can document that user should make sure the DDL to be replicated should not
> execute any function which could execute DML/DDL. This seems acceptable as it's
> a pretty rare case to execute DML/DDL in a CONSTRAINT function or DEFAULT
> function. And the document[1] already suggest similar thing for CONSTRAINT
> function. Besides, we can also document that and the functions should be
> defined in a consistent way among pub/sub.
>
> [1] https://www.postgresql.org/docs/devel/ddl-constraints.html
> > PostgreSQL assumes that CHECK constraints' conditions are immutable, that is,
> > they will always give the same result for the same input row. This assumption
> > is what justifies examining CHECK constraints only when rows are inserted or
> > updated, and not at other times. (The warning above about not referencing
> > other table data is really a special case of this restriction.)
>
> - Another solution could be:
>
> We coud introduce a new function flag called (replica_safety) and the values
> could be 'safe'/'unsafe'. 'safe' indicates that it's safe to be replicated to
> subscriber and it's safe to be executed when replay the DDL from other
> publisher.
>

Yeah, something like this could be a good idea but I think for the
first version we should simply raise a WARNING for such DDLs
indicating that they won't be replicated.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Kellerer 2022-06-10 12:57:53 How to drop a subscription inside a stored procedure?
Previous Message Steve Midgley 2022-06-09 23:25:27 Re: A function to find errors in groups in a table

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-06-10 06:33:36 Re: Using PQexecQuery in pipeline mode produces unexpected Close messages
Previous Message Kyotaro Horiguchi 2022-06-10 06:25:44 Re: Using PQexecQuery in pipeline mode produces unexpected Close messages