Re: Add pg_get_publication_ddl function

From: Álvaro Herrera <alvherre(at)kurilemu(dot)de>
To: "Jonathan Gonzalez V(dot)" <jonathan(dot)abdiel(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add pg_get_publication_ddl function
Date: 2026-06-09 21:06:28
Message-ID: aihkCoNg3evwHYUj@alvherre.pgsql
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2026-Jun-09, Jonathan Gonzalez V. wrote:

> I'm attaching the v3 of this patch. I've added a new set of regress test
> that execute the query generated by pg_get_publication_ddl(), this
> helped to find a couple of issues with EXCEPT and WHERE.

That's good; however ...

> doc/src/sgml/func/func-info.sgml | 32 +
> src/backend/utils/adt/ddlutils.c | 381 ++++++++-
> src/include/catalog/pg_proc.dat | 16 +
> src/test/regress/expected/publication_ddl.out | 747 ++++++++++++++++++
> src/test/regress/parallel_schedule | 5 +-
> src/test/regress/sql/publication_ddl.sql | 239 ++++++

I think this patch file is a bit too large, and the problem might be
that you added too many tests. I would suggest to trim them down,
keeping an eye on the coverage report so that you don't lose anything
important. You don't need to have the function dump every single
publication both by OID and by name, for instance. Just test that it
works by OID once, and then the rest of the tests (just 3 or 4 to test
the features that you care about, not dozens of them) can do it by name
only.

> +static List *
> +pg_get_publication_ddl_internal(Oid puboid, bool pretty, bool no_owner)
> +{
> + Publication *pub;
> + StringInfoData buf;
> + List *statements = NIL;
> + List *pub_incl_relids = NIL;
> + List *pub_excl_relids = NIL;
> + List *pub_schemas = NIL;
> + bool first_perm = true;

I think initializing this first_perm variable so far from the place it's
used is error-prone. I would remove the initialization here and move it
down, just before pubactions.pubinsert is tested. Also, the name makes
no sense considering what it's used for.

> + if (pub->alltables)
> + {
> + appendStringInfoString(&buf, "ALL TABLES");
> + pub_excl_relids = GetExcludedPublicationTables(
> + pub->oid, PUBLICATION_PART_ROOT);

Newline after parenthesis looks ugly.

> + if (pub_excl_relids != NIL)
> + {
> + ListCell *excl_cell;
> + char *schemaname = NULL;
> +
> + appendStringInfoString(&buf, " EXCEPT (TABLE ");
> +
> + foreach(excl_cell, pub_excl_relids)
> + {
> + HeapTuple tp = SearchSysCache1(RELOID, ObjectIdGetDatum(excl_cell->oid_value));
> + Form_pg_class reltup;
> +
> + if (!HeapTupleIsValid(tp))
> + elog(ERROR, "cache lookup failed for relation %u", excl_cell->oid_value);
> +
> + reltup = (Form_pg_class) GETSTRUCT(tp);
> + schemaname = get_namespace_name(reltup->relnamespace);
> +
> + appendStringInfo(&buf, "%s%s",
> + foreach_current_index(excl_cell) > 0 ? ", " : "",
> + quote_qualified_identifier(schemaname, NameStr(reltup->relname)));

I wonder if we'll regret this syntax for the EXCEPT clause. (Unrelated
to your patch.)

> + pfree(schemaname);
> + ReleaseSysCache(tp);
> + }

There's a lot of pfreeing going on all over this code, but I'm not sure
it's good style. There are also lots of leaked allocations, and
randomly freeing some of them but not others is distracting. I think
you could probably remove all the pfreeing, and just rely on the memory
context from the caller releasing the memory at once when the function
is finished.

> + /*
> + * Publication can have table relations
> + */
> + if (pub_incl_relids != NIL)
> + {

I don't think this style of comment is very useful. I would write
something like "if there are table relations in this publication,
produce a FOR TABLE clause to list them" or something like that. I
mean, your comment is stating some fact (you could as well say "the sun
is hot as is my babe" or whatever), rather than what are you going to do
about it. In the code comments we care more about what you do and why,
rather than truism or simple fact.

> + columns = SysCacheGetAttr(PUBLICATIONRELMAP, pubtuple,
> + Anum_pg_publication_rel_prattrs,
> + &cols_nulls);
> +
> + rowfilter = SysCacheGetAttr(PUBLICATIONRELMAP, pubtuple,
> + Anum_pg_publication_rel_prqual,
> + &condition_nulls);
> + /* If non-null, we have a list of columns to publish */
> + if (!cols_nulls)
> + {

Here it looks like you have two variables just for the heck of it. I
mean, you could do simply "datum = SysCacheGetAttr(..., prattrs, &isnull)" and
immediately after do "if (!isnull) then add the colum list clause", and
afterwards do the syscache lookup for prqual and "if (!isnull) then add
the WHERE clause"; the code is more straightforward that way and you
avoid additional variables with names that are pointlessly specific.

> + /* If we have schemas, they will go right before the EXCEPT and/or WITH */
> + if (pub_schemas != NIL)
> + {
> + ListCell *schema_cell;
> +
> + /*
> + * Schemas can be preceded by a list of tables. When they are, the
> + * "TABLES IN SCHEMA" stays inline as a continuation of the existing
> + * FOR clause; otherwise it starts the FOR clause on its own line in
> + * pretty mode.
> + */
> + if (pub_incl_relids == NIL)
> + append_ddl_option(&buf, pretty, 4, "FOR TABLES IN SCHEMA");
> + else
> + appendStringInfoString(&buf, ", TABLES IN SCHEMA");

I think it's weird to mix the concern of pretty mode with the concern of
whether there are tables or not. I think this should just do
append_ddl_option, and if it results in one newline too many, then
that's too bad.

> + appendStringInfoChar(&buf, ';');
> + statements = lappend(statements, pstrdup(buf.data));
> + pfree(buf.data);
> +
> + /* OWNER */
> + if (!no_owner)
> + {
> + HeapTuple tup;
> + Form_pg_publication pubform;
> + char *owner;
> +
> + tup = SearchSysCache1(PUBLICATIONOID, ObjectIdGetDatum(puboid));
> + if (!HeapTupleIsValid(tup))
> + elog(ERROR, "cache lookup failed for publication %u", puboid);
> + pubform = (Form_pg_publication) GETSTRUCT(tup);
> + owner = GetUserNameFromId(pubform->pubowner, false);
> + ReleaseSysCache(tup);
> +
> + initStringInfo(&buf);

It's ugly to do pfree(buf.data) and immediately follow with
initStringInfo(&buf). I would remove the pfree() and use
resetStringInfo() instead. Also, the comment here is not very nice.

Why do a SearchSysCacheExists1() at the top of the function, and later
do a real syscache fetch at the bottom? I think it would be smarter to
do the SearchSysCache1() at the top and keep the tuple until you need it
at the bottom instead. Which also suggests to me that doing
GetPublication (which does its own SearchSysCache!) is kinda lame, since
it doesn't do anything special. You could just do the
SearchSysCache1(), then process everything directly from the
Form_pg_publication struct without going through the Publication
abstraction (which, given what it does, which is to say nothing at all,
looks a pretty embarrasing one to me. We should fix that sometime, maybe.)

> + appendStringInfo(&buf, "ALTER PUBLICATION %s OWNER TO %s;",
> + quote_identifier(pub->name), quote_identifier(owner));
> + statements = lappend(statements, pstrdup(buf.data));
> + pfree(buf.data);
> + pfree(owner);

Pointless pfrees again.

> @@ -143,6 +142,10 @@ test: event_trigger_login
> # this test also uses event triggers, so likewise run it by itself
> test: fast_default
>
> +# run retail DDL tests last to avoid object name collisions and
> +# interference with previous tests.
> +test: publication_ddl

Hmm, see commit c529ee38b9eb.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Entristecido, Wutra (canción de Las Barreras)
echa a Freyr a rodar
y a nosotros al mar"

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Sami Imseih 2026-06-09 21:31:21 Re: [PATCH v4] pg_stat_statements: Add last_execution_start column
Previous Message Nathan Bossart 2026-06-09 20:42:35 Re: Remove the refint contrib module (for v20)