Re: Support logical replication of DDLs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Runqi Tian <runqidev(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>
Subject: Re: Support logical replication of DDLs
Date: 2023-07-14 10:33:15
Message-ID: CAA4eK1KN1rFHU1wrnu90ZRVEUm9ffVnDfC8Qdh24_PSVgSsiKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Tue, Jul 11, 2023 at 4:31 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> We have been researching how to create a test that detects failures resulting
> from future syntax changes, where the deparser fails to update properly.
>
> The basic idea comes from what Robert Haas suggested in [1]: when running the
> regression test(tests in parallel_schedule), we replace the executing ddl
> statement with the its deparsed version and execute the deparsed statement, so
> that we can run all the regression with the deparsed statement and can expect
> the output to be the same as the existing expected/*.out. As developers
> typically add new regression tests to test new syntax, so we expect this test
> can automatically identify most of the new syntax changes.
>
> One thing to note is that when entering the event trigger(where the deparsing
> happen), the ddl have already been executed. So we need to get the deparsed
> statement before the execution and replace the current statement with it. To
> achieve this, the current approach is to create another database with deparser
> trigger and in the ProcessUtility hook(e.g. tdeparser_ProcessUtility in the
> patch) we redirect the local statement to that remote database, then the
> statment will be deparsed and stored somewhere, we can query the remote
> database to get the deparsed statement and use it to replace the original
> statment.
>
> The process looks like:
> 1) create another database and deparser event trigger in it before running the regression test.
> 2) run the regression test (the statement will be redirect to the remote database and get the deparsed statement)
> 3) compare the output file with the expected ones.
>
> Attach the POC patch(0004) for this approach. Note that, the new test module only
> test test_setup.sql, create_index.sql, create_table.sql and alter_table.sql as
> we only support deparsing TABLE related command for now. And the current test
> cannot pass because of some bugs in deparser, we will fix these bugs soon.
>
>
> TO IMPROVE:
>
> 1. The current approach needs to handle the ERRORs, WARNINGs, and NOTICEs from
> the remote database. Currently it will directly rethrow any ERRORs encountered
> in the remote database. However, for WARNINGs and NOTICEs, we will only rethrow
> them along with the ERRORs. This is done to prevent duplicate messages in the
> output file during local statement execution, which would be inconsistent with
> the existing expected output. Note that this approach may potentially miss some
> bugs, as there could be additional WARNINGs or NOTICEs caused by the deparser
> in the remote database.
>
> 2. The variable reference and assignment (xxx /gset and select :var_name) will
> not be sent to the server(only the qualified value will be sent), but it's
> possible the variable in another session should be set to a different value, so
> this can cause inconsistent output.
>
> 3 .CREATE INDEX CONCURRENTLY will create an invalid index internally even if it
> reports an ERROR later. But since we will directly rethrow the remote ERROR in
> the main database, we won't execute the "CREATE INDEX CONCURRENTLY" in the main
> database. This means that we cannot see the invalid index in the main database.
>
> To improve the above points, another variety is: run the regression test twice.
> The first run is solely intended to collect all the deparsed statements. We can
> dump these statements from the database and then reload them in the second
> regression run. This allows us to utilize the deparsed statements to replace
> the local statements in the second regression run. This approach does not need
> to handle any remote messages and client variable stuff during execution,
> although it could take more time to finsh the test.
>

I agree that this second approach can take more time but it would be
good to avoid special-purpose code the first approach needs. BTW, can
we try to evaluate the time difference between both approaches?
Anyway, in the first approach also, we need to run the test statement
twice.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Priyank Rajvansh 2023-07-15 06:38:26 Fwd: error in the example given for numeric data types
Previous Message Andres Freund 2023-07-13 20:50:15 Re: "PANIC: could not open critical system index 2662" - twice

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2023-07-14 10:40:00 Re: logical decoding and replication of sequences, take 2
Previous Message Tomas Vondra 2023-07-14 10:28:59 Re: logical decoding and replication of sequences, take 2