Re: Support logical replication of DDLs

From: Zheng Li <zhengli10(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Runqi Tian <runqidev(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Support logical replication of DDLs
Date: 2023-03-28 21:19:35
Message-ID: CAAD30U+wX_acXs_xKJFDExUVE25xQrQPaXCxjYgb2OVG92iCtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Sun, Mar 26, 2023 at 5:22 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I spent some time looking through this thread to try to get a sense
> of the state of things, and I came away quite depressed. The patchset
> has ballooned to over 2MB, which is a couple orders of magnitude
> larger than anyone could hope to meaningfully review from scratch.
> Despite that, it seems that there are fundamental semantics issues
> remaining, not to mention clear-and-present security dangers, not
> to mention TODO comments all over the code.

Thanks for looking into this!

> I'm also less than sold on the technical details, specifically
> the notion of "let's translate utility parse trees into JSON and
> send that down the wire". You can probably make that work for now,
> but I wonder if it will be any more robust against cross-version
> changes than just shipping the outfuncs.c representation. (Perhaps
> it can be made more robust than the raw parse trees, but I see no
> evidence that anyone's thought much about how.)

I explored the idea of using the outfuncs.c representation in [1] and
found this existing format is not designed to be portable between
different major versions. So it can't be directly used for replication
without
serious modification. I think the DDL deparser is a necessary tool if
we want to be able to handle cross-version DDL syntax differences by
providing the capability to machine-edit the JSON representation.

> And TBH, I don't think that I quite believe the premise in the
> first place. The whole point of using logical rather than physical
> replication is that the subscriber installation(s) aren't exactly like
> the publisher. Given that, how can we expect that automated DDL
> replication is going to do the right thing often enough to be a useful
> tool rather than a disastrous foot-gun? The more you expand the scope
> of what gets replicated, the worse that problem becomes --- for
> example, I don't buy for one second that "let's replicate roles"
> is a credible solution for the problems that come from the roles
> not being the same on publisher and subscriber.

I agree that a full fledged DDL deparser and DDL replication is too
big of a task for one patch. I think we may consider approaching this
feature in the following ways:
1. Phased development and testing as discussed in other emails.
Probably support table commands first (as they are the most common
DDLs), then the other commands in multiple phases.
2. Provide a subscription option to receive the DDL change, raise a
notice and to skip applying the change. The users can listen to the
DDL notice and implement application logic to apply the change if
needed. The idea is we can start gathering user feedback by providing
a somewhat useful feature (compared to doing nothing about DDLs), but
also avoid heading straight into the potential footgun situation
caused by automatically applying any mal-formatted DDLs.
3. As cross-version DDL syntax differences are expected to be uncommon
(in real workload), maybe we can think about other options to handle
such edge cases instead of fully automating it? For example, what
about letting the user specify how a DDL should be replicated on the
subscriber by explicitly providing two versions of DDL commands in
some way?

> I'm not sure how we get from here to a committable and useful feature,
> but I don't think we're close to that yet, and I'm not sure that minor
> iterations on a 2MB patchset will accomplish much.

About 1 MB of the patch are testing output files for the DDL deparser
(postgres/src/test/modules/test_ddl_deparse_regress/expected/).

Regards,
Zane

[1] https://www.postgresql.org/message-id/CAAD30U%2Boi6e6Vh_zAzhuXzkqUhagmLGD%2B_iyn2N9w_sNRKsoag%40mail.gmail.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Amit Kapila 2023-03-29 09:13:04 Re: Support logical replication of DDLs
Previous Message Adrian Klaver 2023-03-28 20:26:17 Re: How are the SELECT queries reconstructed in pg_views

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2023-03-28 22:02:00 Re: Add LZ4 compression in pg_dump
Previous Message Dmitry Koval 2023-03-28 20:43:45 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands