Re: DDL deparse

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: DDL deparse
Date: 2026-06-24 17:09:10
Message-ID: CAD21AoC0CVz3wM+h8S-xtqRoDNVqJa-sVVt71yNU+sdT4rN=wQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 23, 2026 at 4:54 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Thu, Jun 18, 2026 at 3:24 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Hi all,
> > (CC'ing people who worked on DDL deparse)
>
> Thanks for working on this feature!
>
> Please note that I haven't spent enough time reading all the older
> threads in this area, so I'm still building my understanding. Kindly
> bear with me
>
> > DDL Deparse has been developed[1][2] for quite a long time and was
> > originally proposed as a building block of DDL replication. Reading
> > the related discussion threads, there is an agreement on implementing
> > DDL replication on top of DDL deparse, and some hackers prefer the
> > format of its output.
>
> Could you please summarize the other approaches for DDL replication,
> their pros and cons, and why the deparsing approach is preferred? I
> have a vague understanding, but summarizing it here would help
> reviewers a lot. Thank you!

To summarize the alternatives that have been discussed for carrying
DDL to the subscriber, and why deparse is preferred.

1. Replicate the raw SQL text with the search_path value. While it is
simple, the statement is ambiguous even with the search_path, as the
publisher and the subscriber might have a different set of schemas.

2. Reconstruct the DDL from catalog state using pg_get_XXX_ddl().
While it is always normalized and schema-qualified and can reuse
ruleutils.c and ddlutils.c, it loses the original intent. It also
still requires ALTER TABLE support in some form.

3. Infer the DDL from the stream of catalog changes decoded from WAL.
It yields a delta rather than a post-image, and rides on logical
decoding naturally, since the catalog changes are already in the WAL.
However, some information needed to reproduce a command exists only in
the parse tree and is never persisted in the catalog or the WAL; the
USING expression of ALTER TABLE ... ALTER COLUMN ... TYPE ... USING
(expr) is the canonical example.

Deparsing the parse tree avoids these problems: it captures the
operation itself (so ALTER stays ALTER and replicates incrementally),
schema-qualifies names, and has access to parse-tree-only information
such as the USING expression that (3) cannot recover. DDL deparse
could also be implemented as a feature in its own right, beyond DDL
replication. For a Postgres-to-Postgres solution it is natural to emit
a deparsed SQL statement as the output; as for the output format, some
senior hackers prefer the flexibility of a self-documenting JSON blob.

>
> > I've reviewed the last proposed approach and
> > researched DDL deparse/DDL replication, and there are some points that
> > are unclear to me, and things have changed since it was actively
> > developed. So I've started this thread separately from the DDL
> > replication thread in order to discuss DDL deparse itself while
> > working toward DDL replication development.
>
> +1 to splitting things for better discussion.
>
> > Quick summary of the last developed DDL deparse feature[3]: the basic
> > functionality is that it takes a parse tree as an input and constructs
> > DDLs by retrieving the information from system catalogs based on the
> > parse tree.
>
> The parse tree is what gets generated when the DDL command is being
> executed, right? I mean, we can't generate the DDL statement as an
> after-the-fact operation? I haven't studied the pg_get_table_ddl patch
> yet.

I think that capture should happen at the execution time, since the
parse tree is transient. But deparsing can be deferred; if we write
the parse tree to WAL at executiontime, the deparse can run later at
decode time. So generation can be done after the fact; only the
capture cannot.

pg_get_table_ddl() differs: its input is the catalog state, not a
parse tree, so it can run any time after the fact -- but it yields the
resulting state, not the original operation.

>
> > For instance, deparsing "create table
> > test (a int)" produces:
> >
> > {
> > "fmt": "CREATE TABLE %{identity}D (%{table_elements:, }s)",
> > "identity": {
> > "objname": "test",
> > "schemaname": "public"
> > },
> > "table_elements": [
> > {
> > "fmt": "%{name}I %{coltype}T STORAGE %{colstorage}s",
> > "name": "a",
> > "type": "column",
> > "coltype": {
> > "typmod": "",
> > "typarray": false,
> > "typename": "int4",
> > "schemaname": "pg_catalog"
> > },
> > "colstorage": "PLAIN"
> > }
> > ]
> > }
>
> Do we need an LSN+timeline, creation time, or some sort of ordering ID
> field for establishing the order of operations (e.g., DDL command1
> executed before command2)? I understand that when used for DDL
> replication it does have an LSN field, but for other DDL deparsing
> use-cases it would also be helpful.

I think it depends on in which case DDL deparse is used. For DDL
replication the ordering is already established by the LSN, so a
separate field isn't needed there. I can't immediately think of
another use case that would require an explicit ordering ID -- do you
have a specific one in mind?

>
> > The main point that I want to discuss is what output we expect from
> > DDL deparse, especially CREATE DDLs. I originally thought that DDL
> > deparse converts the parse tree back into the DDL command originally
> > executed. However, what DDL deparse for CREATE TABLE actually does is
> > to generate possibly multiple DDLs to achieve the exact same catalog
> > state. That is, the deparsed command doesn't necessarily preserve the
> > user intent in the original DDL command, and possibly generates
> > multiple commands for the one command. For instance:
> >
> > I haven't seen any discussion on whether it's architecturally correct
> > for a DDL deparser not to preserve the user intent.
>
> I briefly played with MySQL binlog replication and noticed that it
> doesn't decompose DDLs. I haven't read their documentation, but I
> believe there's a concern with decomposing a single DDL into multiple
> DDLs - how would the consumer make it atomic and crash-safe alongside
> other concurrent DDLs? Say the consumer is another PostgreSQL database
> and it wants to create a table along with indexes, FKs, constraints,
> etc. If we decompose the single DDL into multiple DDLs, how can the
> consumer replay them together atomically and be crash-safe?

I think the decomposed DDLs should be run inside a single transaction.
This is less of a problem in PostgreSQL than in MySQL because
PostgreSQL DDL is transactional, so wrapping the commands gives
all-or-nothing application and crash-safety follows from the normal
apply mechanism.

> > Feedback is very welcome.
>
> I did a quick pass over the patch. The code looks a bit lengthy - is
> it possible to split it up? For example, JSON-related helpers first,
> then basic CREATE TABLE, then sequences, then constraints, then ALTER
> TABLE, etc.

The patch was just based on the current HEAD and submitted just for
anyone interested in this feature to check the behavior. I don't think
it's ready for the review. We can refer the pre-rebased patches[1]
that are split to multiple patches.

Regards,

[1] https://www.postgresql.org/message-id/OS0PR01MB57163E6487EFF7378CB8E17C9438A%40OS0PR01MB5716.jpnprd01.prod.outlook.com

--
Masahiko Sawada

Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Florents Tselai 2026-06-24 17:17:00 Re: More jsonpath methods: translate, split, join
Previous Message Bharath Rupireddy 2026-06-24 17:05:08 Re: Report replica identity in pg_publication_tables