Re: Support logical replication of DDLs

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Runqi Tian <runqidev(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>
Subject: Re: Support logical replication of DDLs
Date: 2023-07-18 05:28:08
Message-ID: CAD21AoCXCAQ5QyXu9-xs30ViUHtUxQMmf-818d8GX--5pTmZ7g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Tue, Jul 11, 2023 at 8:01 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Hi,
>
> We have been researching how to create a test that detects failures resulting
> from future syntax changes, where the deparser fails to update properly.
>
> The basic idea comes from what Robert Haas suggested in [1]: when running the
> regression test(tests in parallel_schedule), we replace the executing ddl
> statement with the its deparsed version and execute the deparsed statement, so
> that we can run all the regression with the deparsed statement and can expect
> the output to be the same as the existing expected/*.out. As developers
> typically add new regression tests to test new syntax, so we expect this test
> can automatically identify most of the new syntax changes.
>
> One thing to note is that when entering the event trigger(where the deparsing
> happen), the ddl have already been executed. So we need to get the deparsed
> statement before the execution and replace the current statement with it. To
> achieve this, the current approach is to create another database with deparser
> trigger and in the ProcessUtility hook(e.g. tdeparser_ProcessUtility in the
> patch) we redirect the local statement to that remote database, then the
> statment will be deparsed and stored somewhere, we can query the remote
> database to get the deparsed statement and use it to replace the original
> statment.
>
> The process looks like:
> 1) create another database and deparser event trigger in it before running the regression test.
> 2) run the regression test (the statement will be redirect to the remote database and get the deparsed statement)
> 3) compare the output file with the expected ones.
>
> Attach the POC patch(0004) for this approach. Note that, the new test module only
> test test_setup.sql, create_index.sql, create_table.sql and alter_table.sql as
> we only support deparsing TABLE related command for now. And the current test
> cannot pass because of some bugs in deparser, we will fix these bugs soon.
>
>
> TO IMPROVE:
>
> 1. The current approach needs to handle the ERRORs, WARNINGs, and NOTICEs from
> the remote database. Currently it will directly rethrow any ERRORs encountered
> in the remote database. However, for WARNINGs and NOTICEs, we will only rethrow
> them along with the ERRORs. This is done to prevent duplicate messages in the
> output file during local statement execution, which would be inconsistent with
> the existing expected output. Note that this approach may potentially miss some
> bugs, as there could be additional WARNINGs or NOTICEs caused by the deparser
> in the remote database.
>
> 2. The variable reference and assignment (xxx /gset and select :var_name) will
> not be sent to the server(only the qualified value will be sent), but it's
> possible the variable in another session should be set to a different value, so
> this can cause inconsistent output.
>
> 3 .CREATE INDEX CONCURRENTLY will create an invalid index internally even if it
> reports an ERROR later. But since we will directly rethrow the remote ERROR in
> the main database, we won't execute the "CREATE INDEX CONCURRENTLY" in the main
> database. This means that we cannot see the invalid index in the main database.
>
> To improve the above points, another variety is: run the regression test twice.
> The first run is solely intended to collect all the deparsed statements. We can
> dump these statements from the database and then reload them in the second
> regression run. This allows us to utilize the deparsed statements to replace
> the local statements in the second regression run. This approach does not need
> to handle any remote messages and client variable stuff during execution,
> although it could take more time to finsh the test.
>

I've considered some alternative approaches but I prefer the second
approach. A long test time could not be a big problem unless we run it
by default. We can prepare a buildfarm animal that is configured to
run the DDL deparse tests.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Paquier 2023-07-18 05:37:04 Re: Support logical replication of DDLs
Previous Message Gert Cuykens 2023-07-17 23:53:23 backup_manifest rename to backup_manifest.old after successful postgres start up

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-07-18 05:32:40 Re: Add TOAST support for more system tables
Previous Message Michael Paquier 2023-07-18 05:13:52 Re: The same 2PC data maybe recovered twice